Fix: Correct num_cached_tokens counting logic in BlockManager.allocate #110
File Modified:
`engine/block_manager.py` — adjusted the timing of the `seq.num_cached_tokens` increment in the `allocate` method.

Reason for Modification:
In the original logic, `num_cached_tokens` was incremented whenever `cache_miss` was `False` (hash hit and token match), but this failed to account for stale entries in the `hash_to_block_id` map. Specifically, a block ID can still exist in the hash map even though the block has already been released (it is no longer in `used_block_ids`) because its reference count dropped to 0. In that case, `_allocate_block` is triggered to reinitialize the block; this is essentially a new allocation rather than cache reuse, yet it was incorrectly counted as cached tokens.

Fix Logic:
Moved `seq.num_cached_tokens += self.block_size` into the `if block_id in self.used_block_ids:` branch. This ensures `num_cached_tokens` is only incremented when the block is actually in use (not released) and is genuinely reused, matching its intended semantics of counting truly reused cached tokens.

Impact:
Fixes the statistical error in cache utilization, so `num_cached_tokens` accurately reflects real cache reuse (useful for subsequent performance analysis). Does not affect the core logic of KV-cache allocation/reuse; no functional risk.
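To make the stale-entry scenario concrete, here is a minimal, self-contained sketch of the corrected `allocate` flow. Only the identifiers named in this PR (`num_cached_tokens`, `block_size`, `cache_miss`, `hash_to_block_id`, `used_block_ids`, `_allocate_block`) come from the description; the `Block` and `Seq` classes, the `full_blocks`/`deallocate` helpers, and the use of Python's built-in `hash` as a placeholder block hash are simplifying assumptions, not the engine's actual code.

```python
from collections import deque

class Block:
    """A KV-cache block tracked by reference count and content hash."""
    def __init__(self, block_id):
        self.block_id = block_id
        self.ref_count = 0
        self.hash = -1
        self.token_ids = []

class Seq:
    """Hypothetical stand-in for the engine's sequence object."""
    def __init__(self, token_ids, block_size):
        self.token_ids = token_ids
        self.block_size = block_size
        self.num_cached_tokens = 0
        self.block_table = []

    def full_blocks(self):
        # Yield only completely filled blocks; partial blocks are not hashed.
        for i in range(0, len(self.token_ids) - self.block_size + 1, self.block_size):
            yield self.token_ids[i:i + self.block_size]

class BlockManager:
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.blocks = [Block(i) for i in range(num_blocks)]
        self.hash_to_block_id = {}
        self.free_block_ids = deque(range(num_blocks))
        self.used_block_ids = set()

    def _allocate_block(self, block_id):
        block = self.blocks[block_id]
        block.ref_count = 1
        self.free_block_ids.remove(block_id)
        self.used_block_ids.add(block_id)
        return block

    def allocate(self, seq):
        h = -1
        for token_ids in seq.full_blocks():
            h = hash((h, tuple(token_ids)))  # placeholder for the real block hash
            block_id = self.hash_to_block_id.get(h, -1)
            cache_miss = block_id == -1 or self.blocks[block_id].token_ids != token_ids
            if cache_miss:
                block_id = self.free_block_ids[0]
                block = self._allocate_block(block_id)
            elif block_id in self.used_block_ids:
                # Block is still live: genuine cache reuse. The fix moves the
                # increment here, so only real reuse is counted.
                seq.num_cached_tokens += self.block_size
                block = self.blocks[block_id]
                block.ref_count += 1
            else:
                # Stale hash entry: the block was already released, so this is
                # a fresh allocation and must NOT count as cached tokens.
                block = self._allocate_block(block_id)
            block.hash, block.token_ids = h, list(token_ids)
            self.hash_to_block_id[h] = block_id
            seq.block_table.append(block_id)

    def deallocate(self, seq):
        for block_id in reversed(seq.block_table):
            block = self.blocks[block_id]
            block.ref_count -= 1
            if block.ref_count == 0:
                # The hash_to_block_id entry is intentionally left behind,
                # which is exactly how a stale entry arises.
                self.used_block_ids.discard(block_id)
                self.free_block_ids.append(block_id)
        seq.block_table.clear()
```

Under this sketch, allocating a second live sequence with identical tokens counts as cache reuse, while re-allocating the same tokens after all referencing sequences have been deallocated hits a stale hash entry and correctly counts zero cached tokens.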