@jerrychenhf jerrychenhf commented Sep 17, 2025

Purpose

Supporting general chunked prefill in the current code base is a complex problem because prefill and decode requests are mixed, and making it performant is a further challenge. The problem becomes much simpler in the P/D case if the target is only to support chunked prefill on the prefill instance. There, all chunked requests are prefill requests, so we can handle them with the same concept as prefix caching, with minimal changes.

Most of the changes are on the KV cache transfer side. With chunked prefill, both the time at which the KV cache is sent and the way data is fetched from the cache change:

  1. We send the KV cache only at the last chunk.
  2. We need to send the KV cache for the whole sequence, not only the KV for the current query.
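
The two rules above can be illustrated with a minimal sketch. This is not the code in this PR; `ChunkState`, `is_last_chunk`, `kv_range_to_send`, and `run_chunked_prefill` are hypothetical names, and the sketch assumes each request tracks how many prompt tokens were already computed by earlier chunks, much like a prefix-cache hit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkState:
    """Hypothetical per-request bookkeeping for one prefill chunk."""
    prompt_len: int     # total number of prompt tokens
    num_computed: int   # tokens whose KV is already cached by earlier chunks
    num_scheduled: int  # tokens in the current chunk (the current query)


def is_last_chunk(state: ChunkState) -> bool:
    # The chunk is the last one when it completes the prompt.
    return state.num_computed + state.num_scheduled >= state.prompt_len


def kv_range_to_send(state: ChunkState) -> Optional[range]:
    """Return the token positions whose KV should be transferred, or None."""
    if not is_last_chunk(state):
        # Rule 1: intermediate chunks send nothing.
        return None
    # Rule 2: send KV for the whole sequence, not just the current query
    # (which would be range(num_computed, prompt_len)).
    return range(0, state.prompt_len)


def run_chunked_prefill(prompt_len: int, chunk_size: int) -> List[Optional[range]]:
    """Simulate chunking one prompt and record what each chunk would send."""
    sent: List[Optional[range]] = []
    computed = 0
    while computed < prompt_len:
        scheduled = min(chunk_size, prompt_len - computed)
        sent.append(kv_range_to_send(ChunkState(prompt_len, computed, scheduled)))
        computed += scheduled
    return sent
```

For example, a 10-token prompt with a chunk size of 4 is split into three chunks; only the third triggers a transfer, and that transfer covers positions 0 through 9.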
