Prefix Aware Scorer #801

Open
oglok opened this issue May 8, 2025 · 5 comments
Labels
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

oglok commented May 8, 2025

What would you like to be added:

I would like to propose a new prefix-aware scorer. It assigns a numerical score to each candidate pod in an inference pool based on prefix matching, using historical prompt patterns to route requests to pods that have previously handled similar prompt segments.

The scorer keeps track of prefixes in an in-memory store that uses a fast hashing algorithm (xxHash) and an LRU (Least Recently Used) eviction policy to avoid uncontrolled memory consumption.
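The idea can be sketched roughly as follows. This is only an illustration under several assumptions, not the proposed implementation: the names `PrefixStore`, `Record`, `Score`, `prefixHashes` and the `blockSize` value are hypothetical, prompts are split into fixed-size character blocks, and `hash/fnv` from the standard library stands in for xxHash so the example has no external dependencies.

```go
package main

import (
	"container/list"
	"fmt"
	"hash/fnv"
)

// blockSize is a hypothetical number of characters per prefix block.
const blockSize = 4

// prefixHashes splits the prompt into fixed-size blocks and returns one
// chained hash per block, so hash i identifies the whole prefix up to block i.
func prefixHashes(prompt string) []uint64 {
	var hashes []uint64
	var prev uint64
	for start := 0; start+blockSize <= len(prompt); start += blockSize {
		h := fnv.New64a() // assumption: stand-in for xxHash, which is not in the standard library
		fmt.Fprintf(h, "%d:%s", prev, prompt[start:start+blockSize])
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

// entry records which pods have served a given prefix hash.
type entry struct {
	hash uint64
	pods map[string]bool
}

// PrefixStore maps prefix hashes to pods, bounded by an LRU policy.
type PrefixStore struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[uint64]*list.Element // hash -> list element holding *entry
}

func NewPrefixStore(capacity int) *PrefixStore {
	return &PrefixStore{capacity: capacity, order: list.New(), items: map[uint64]*list.Element{}}
}

// Record notes that pod handled prompt, evicting the least recently used
// prefix hash whenever the store exceeds its capacity.
func (s *PrefixStore) Record(prompt, pod string) {
	for _, h := range prefixHashes(prompt) {
		if el, ok := s.items[h]; ok {
			el.Value.(*entry).pods[pod] = true
			s.order.MoveToFront(el)
			continue
		}
		s.items[h] = s.order.PushFront(&entry{hash: h, pods: map[string]bool{pod: true}})
		if s.order.Len() > s.capacity {
			oldest := s.order.Back()
			delete(s.items, oldest.Value.(*entry).hash)
			s.order.Remove(oldest)
		}
	}
}

// Score returns the fraction of the prompt's blocks, counted from the start,
// that pod has previously served: 0 means no match, 1 means a full match.
func (s *PrefixStore) Score(prompt, pod string) float64 {
	hashes := prefixHashes(prompt)
	if len(hashes) == 0 {
		return 0
	}
	matched := 0
	for _, h := range hashes {
		el, ok := s.items[h]
		if !ok || !el.Value.(*entry).pods[pod] {
			break // stop at the first block this pod has not seen
		}
		s.order.MoveToFront(el)
		matched++
	}
	return float64(matched) / float64(len(hashes))
}

func main() {
	store := NewPrefixStore(1024)
	store.Record("You are a helpful assistant.", "pod-a")
	fmt.Println(store.Score("You are a helpful bot.", "pod-a")) // shares leading blocks
	fmt.Println(store.Score("You are a helpful bot.", "pod-b")) // never served this prefix
}
```

Chaining each block hash with the previous one means a hit on block i implies the whole prefix up to that block matched, which is why scoring can stop at the first miss. A real scorer would also need locking for concurrent requests, which this sketch omits.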

Despite its similarities to proposal #602 / #768, this scorer is a self-contained plugin that can be enabled on its own or used in conjunction with other scorers, and it has no impact on the internal structure of the EPP/scheduler.

Why is this needed:

Because it is lightweight and self-contained, this scorer can improve cache hit rates and efficiency without depending on the availability of a distributed KV-cache index. It does not guarantee that the pod has the exact cached context for the current request, so it should be considered a best-effort, opportunistic heuristic.

oglok added the needs-triage label May 8, 2025
Contributor

liu-cong commented May 8, 2025

Can you elaborate on how this is different from the implementation in #768?

Author

oglok commented May 12, 2025

Hey @liu-cong,

I see that your PR was merged on Friday, so I'm not sure it still makes sense to discuss how my scorer proposal differs in implementation. Maybe I missed the goal of your proposal, but looking at the file structure, it didn't make sense to have a scorer just under the scheduling/plugins folder, and not under scorers. How can a user identify that this is a scorer?

Contributor

liu-cong commented May 12, 2025

> it didn't make sense to have a scorer just under the scheduling/plugins folder

Just to clarify, the scheduler framework supports multiple extension points (PreSchedule, Filter, Score, PostSchedule, etc.). A plugin implementation can implement multiple extension points. The prefix plugin implements PreSchedule, Score and PostSchedule.

Today the scorer package holds simple plugins that only implement the scorer interface; the same goes for picker and filter. However, this approach doesn't work if a plugin implements multiple extension interfaces.
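To make the point concrete, here is a rough sketch of one plugin type satisfying three extension-point interfaces at once. The interface names mirror the extension points mentioned above, but the method signatures, the `Request` and `Pod` types, and the toy exact-match state are all assumptions for illustration, not the framework's actual API.

```go
package main

import "fmt"

// Hypothetical request and pod types; the real framework types differ.
type Request struct{ Prompt string }
type Pod struct{ Name string }

// Hypothetical extension-point interfaces, one per scheduling phase.
type PreSchedule interface{ PreSchedule(req *Request) }
type Scorer interface {
	Score(req *Request, pod Pod) float64
}
type PostSchedule interface {
	PostSchedule(req *Request, picked Pod)
}

// PrefixPlugin takes part in three phases, so it does not fit a package
// reserved for plugins that only implement the scorer interface.
type PrefixPlugin struct {
	seen    map[string]string // toy state: prompt -> pod that last served it
	current string            // prompt captured during PreSchedule
}

func NewPrefixPlugin() *PrefixPlugin {
	return &PrefixPlugin{seen: map[string]string{}}
}

// PreSchedule prepares per-request state before scoring starts.
func (p *PrefixPlugin) PreSchedule(req *Request) { p.current = req.Prompt }

// Score prefers the pod that last served the same prompt (toy exact match).
func (p *PrefixPlugin) Score(req *Request, pod Pod) float64 {
	if p.seen[p.current] == pod.Name {
		return 1
	}
	return 0
}

// PostSchedule records the final pick so later requests can be matched.
func (p *PrefixPlugin) PostSchedule(req *Request, picked Pod) {
	p.seen[req.Prompt] = picked.Name
}

// Compile-time checks that the one type satisfies all three interfaces.
var (
	_ PreSchedule  = (*PrefixPlugin)(nil)
	_ Scorer       = (*PrefixPlugin)(nil)
	_ PostSchedule = (*PrefixPlugin)(nil)
)

func main() {
	plugin := NewPrefixPlugin()
	req := &Request{Prompt: "hello"}
	plugin.PreSchedule(req)
	plugin.PostSchedule(req, Pod{Name: "pod-a"})
	plugin.PreSchedule(req)
	fmt.Println(plugin.Score(req, Pod{Name: "pod-a"}), plugin.Score(req, Pod{Name: "pod-b"}))
}
```

Because the three methods share state on one struct, splitting them across separate scorer, pre-schedule, and post-schedule packages would be awkward, which is the motivation for one package per plugin.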

To keep things clean, each plugin can have its own package, named after the purpose of the plugin, e.g., prefix_cache, queue, lora_affinity. This is how kube-scheduler manages its plugins. I'd like to hear @nirrozenbaum's opinion on this proposal.

@nirrozenbaum
Contributor

> To be clean, we can have each plugin have their own package, and name the package based on the purpose of the plugin, e.g., prefix_cache, queue, lora_affinity. This is how kube-scheduler manages its plugins. I'd like to hear @nirrozenbaum 's opinion on this proposal.

big +1.
