Prefix Aware Scorer #801

Open
@oglok

What would you like to be added:

I would like to propose a new prefix-aware scorer. The scorer assigns a numerical score to each target pod in an inference pool based on prefix matching: it uses historical prompt patterns to route requests to pods that have previously handled similar prompt segments.

The scorer tracks prefixes in an in-memory store that uses a fast hashing algorithm (xxHash) and an LRU (Least Recently Used) eviction policy to keep memory consumption bounded.
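To make the idea concrete, here is a minimal sketch of such a store. It is not the proposed implementation, and everything in it is an assumption made for illustration: the names `PrefixStore`, `block_hashes`, `record`, and `score`, the fixed-size character blocks, the chained hashing, and the scoring formula (fraction of leading blocks a pod has seen). `hashlib.blake2b` stands in for xxHash only because xxHash is not in the Python standard library.

```python
import hashlib
from collections import OrderedDict

BLOCK_SIZE = 8  # characters per prefix block (illustrative value)


def block_hashes(prompt, block_size=BLOCK_SIZE):
    """Hash each full block of the prompt, chaining in the previous hash
    so that every hash identifies the whole prefix up to that block."""
    hashes = []
    prev = b""
    for start in range(0, len(prompt) - block_size + 1, block_size):
        block = prompt[start:start + block_size].encode("utf-8")
        # Stand-in for xxHash: any fast, stable 64-bit hash would do here.
        prev = hashlib.blake2b(prev + block, digest_size=8).digest()
        hashes.append(prev)
    return hashes


class PrefixStore:
    """Maps prefix hashes to the pods that served them, with LRU eviction."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        # prefix hash -> set of pod names; the oldest entry comes first.
        self._entries = OrderedDict()

    def record(self, prompt, pod):
        """Remember that `pod` handled `prompt`."""
        # Insert the longest prefix first so the shortest prefixes end up
        # most recently used; eviction then trims a chain from its tail.
        for h in reversed(block_hashes(prompt)):
            self._entries.setdefault(h, set()).add(pod)
            self._entries.move_to_end(h)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop least recently used

    def score(self, prompt, pods):
        """Return, per pod, the fraction of leading blocks it has seen."""
        hashes = block_hashes(prompt)
        scores = {pod: 0.0 for pod in pods}
        if not hashes:
            return scores
        for pod in pods:
            matched = 0
            for h in hashes:
                if pod not in self._entries.get(h, ()):
                    break  # the prefix chain is broken at this block
                matched += 1
            scores[pod] = matched / len(hashes)
        return scores


store = PrefixStore(capacity=4)
store.record("aaaaaaaabbbbbbbbcccccccc", "pod-a")
# The new prompt shares its first two of three blocks with the recorded one.
scores = store.score("aaaaaaaabbbbbbbbdddddddd", ["pod-a", "pod-b"])
```

In this sketch only `record` refreshes recency; whether lookups should also count as "use" for LRU purposes is a design choice the proposal leaves open.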

Despite its similarities with proposals #602 and #768, this scorer is a self-contained plugin that can be enabled on its own or used in conjunction with other scorers, and it has no impact on the internal structure of the EPP/scheduler.

Why is this needed:

This scorer can improve cache hit rates and efficiency without depending on a distributed KV-cache index, which keeps it lightweight and self-contained. It does not guarantee that the chosen pod still holds the exact cached context for the current request, so it should be considered a best-effort, opportunistic heuristic.
