Prefix Aware Scorer #801
Comments
Can you elaborate on how this is different from the implementation in #768?
Hey @liu-cong, I see that your PR was merged on Friday, so I'm not sure it still makes sense to discuss how my scorer proposal differs in implementation. Maybe I missed the goal of your proposal, but looking at the file structure, it didn't make sense to me to have a scorer directly under the scheduling/plugins folder rather than under scorers. How can a user identify that this is a scorer?
Just to clarify, the scheduler framework supports multiple extension points (PreSchedule, Filter, Score, PostSchedule, etc.). A plugin implementation can implement multiple extension points. The prefix plugin implements Today the To be clean, we can give each plugin its own package, and name the package based on the purpose of the plugin, e.g.,
PR #768 implements this proposal: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0602-prefix-cache-aware-routing-proposal
Big +1.
What would you like to be added:
I would like to propose a new prefix-aware scorer. This scorer assigns a numerical value to each target pod in an inference pool based on prefix matching. It leverages historical prompt patterns to route requests to pods that have previously handled similar prompt segments.
The scorer keeps track of prefixes in an in-memory store that uses a fast hashing algorithm (xxHash) and an LRU (Least Recently Used) eviction policy to avoid uncontrolled memory consumption.
Despite its similarities with proposal #602 / #768, this scorer is a self-contained plugin that can be enabled on its own or used in conjunction with other scorers, and it has no impact on the internal structure of the EPP/scheduler.
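To make the idea concrete, here is a minimal sketch of how such a store and score could look. It is not the implementation from this proposal or from #768. All names (`prefixStore`, `Record`, `Score`, `blockSize`) are hypothetical, the block size is an arbitrary assumption, and FNV-1a from the Go standard library stands in for xxHash so the example has no external dependencies. The sketch is also not safe for concurrent use; a real plugin would need locking.

```go
package main

import (
	"container/list"
	"hash/fnv"
)

// blockSize is the number of prompt bytes hashed per prefix block (assumed value).
const blockSize = 8

// entry records which pods have served the prefix identified by hash.
type entry struct {
	hash uint64
	pods map[string]struct{}
}

// prefixStore maps prefix hashes to pods and evicts the least recently used
// hash once capacity is exceeded. Hypothetical type, for illustration only.
type prefixStore struct {
	capacity int
	order    *list.List               // front = most recently used
	entries  map[uint64]*list.Element // hash -> element holding *entry
}

func newPrefixStore(capacity int) *prefixStore {
	return &prefixStore{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[uint64]*list.Element),
	}
}

// hashBlocks returns one chained hash per full block, so hash i identifies
// the whole prefix prompt[:(i+1)*blockSize]. FNV-1a stands in for xxHash.
func hashBlocks(prompt string) []uint64 {
	var hashes []uint64
	h := fnv.New64a()
	for end := blockSize; end <= len(prompt); end += blockSize {
		h.Write([]byte(prompt[end-blockSize : end]))
		hashes = append(hashes, h.Sum64())
	}
	return hashes
}

// Record notes that pod has handled every block-aligned prefix of prompt.
func (s *prefixStore) Record(prompt, pod string) {
	for _, h := range hashBlocks(prompt) {
		if el, ok := s.entries[h]; ok {
			el.Value.(*entry).pods[pod] = struct{}{}
			s.order.MoveToFront(el)
			continue
		}
		s.entries[h] = s.order.PushFront(&entry{hash: h, pods: map[string]struct{}{pod: {}}})
		if s.order.Len() > s.capacity {
			oldest := s.order.Back()
			s.order.Remove(oldest)
			delete(s.entries, oldest.Value.(*entry).hash)
		}
	}
}

// Score returns the fraction of prompt's leading blocks that pod has seen,
// stopping at the first miss. The result is in [0, 1].
func (s *prefixStore) Score(prompt, pod string) float64 {
	hashes := hashBlocks(prompt)
	if len(hashes) == 0 {
		return 0
	}
	matched := 0
	for _, h := range hashes {
		el, ok := s.entries[h]
		if !ok {
			break
		}
		if _, seen := el.Value.(*entry).pods[pod]; !seen {
			break
		}
		s.order.MoveToFront(el)
		matched++
	}
	return float64(matched) / float64(len(hashes))
}
```

In this sketch, a pod that previously served a prompt sharing the first two of three blocks with the incoming request scores about 0.67, while a pod with no recorded history scores 0. Because the score is only a hint derived from past routing, it does not prove that the pod still holds the corresponding KV-cache entries.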
Why is this needed:
Because it is lightweight and self-contained, this scorer can improve cache hit rates and efficiency without depending on the availability of a distributed KV-cache index. It does not guarantee that the pod holds the exact cached context for the current request, so it should be considered a best-effort, opportunistic, heuristic-based approach.