AmaZzzingLHQ commented:
File Modified: engine/block_manager.py - Adjusted the timing of seq.num_cached_tokens increment in the allocate method

Reason for Modification:
In the original logic, num_cached_tokens was incremented whenever cache_miss was False (hash hit and token match), without accounting for stale entries in the hash_to_block_id map. A block ID can still be present in the hash map after the block has been released (its reference count dropped to 0, so it is no longer in used_block_ids). In that case _allocate_block reinitializes the block, which is a fresh allocation rather than cache reuse, yet the tokens were still counted as cached.

Fix Logic:
Moved seq.num_cached_tokens += self.block_size inside the if block_id in self.used_block_ids: branch. The counter is now incremented only when the block is still live and actually reused, matching its intended semantics of counting truly reused cached tokens.

Impact:
Fixes the accounting error so num_cached_tokens accurately reflects real cache reuse, which helps downstream performance analysis. The core KV-cache allocation/reuse logic is unchanged; no functional risk.
