Prefix Aware Scorer #801

**What would you like to be added**:

I would like to propose a new prefix-aware scorer. The scorer assigns a numerical score to each candidate pod in an inference pool based on prefix matching. It leverages historical prompt patterns to route requests to pods that have previously handled similar prompt segments.
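A minimal sketch of how prefix-based scoring could work. It assumes prompts are split into fixed-size byte blocks whose hashes are chained, so that each hash identifies the whole prefix up to that block, and that a pod's score is the fraction of blocks in the longest prefix it is recorded as having served. `blockSize`, `prefixHashes`, `scorePods`, and the map-based index are hypothetical names for illustration, not the plugin's actual API, and FNV-1a from Go's standard library stands in for xxHash to avoid an external dependency.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// blockSize is a hypothetical chunk length (in bytes) used to split prompts.
const blockSize = 4

// prefixHashes splits the prompt into full blocks of blockSize bytes (a
// trailing partial block is ignored) and returns one hash per block. Each
// hash is chained with the previous one, so it identifies the entire prefix.
func prefixHashes(prompt string) []uint64 {
	var hashes []uint64
	var prev uint64
	for start := 0; start+blockSize <= len(prompt); start += blockSize {
		h := fnv.New64a() // stand-in for xxHash
		var buf [8]byte
		binary.LittleEndian.PutUint64(buf[:], prev)
		h.Write(buf[:])
		h.Write([]byte(prompt[start : start+blockSize]))
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

// scorePods returns, for each pod, the fraction of the prompt's blocks covered
// by the longest contiguous prefix that the index says the pod has served.
func scorePods(prompt string, pods []string, index map[uint64]map[string]bool) map[string]float64 {
	hashes := prefixHashes(prompt)
	scores := make(map[string]float64, len(pods))
	for _, pod := range pods {
		matched := 0
		for _, h := range hashes {
			if !index[h][pod] { // reading a nil inner map yields false
				break
			}
			matched++
		}
		if len(hashes) > 0 {
			scores[pod] = float64(matched) / float64(len(hashes))
		} else {
			scores[pod] = 0
		}
	}
	return scores
}

func main() {
	index := map[uint64]map[string]bool{}
	// Record that pod-a previously served this prompt.
	for _, h := range prefixHashes("You are a helpful bot. Hi") {
		if index[h] == nil {
			index[h] = map[string]bool{}
		}
		index[h]["pod-a"] = true
	}
	scores := scorePods("You are a helpful bot. Bye", []string{"pod-a", "pod-b"}, index)
	fmt.Printf("pod-a=%.2f pod-b=%.2f\n", scores["pod-a"], scores["pod-b"]) // pod-a=0.83 pod-b=0.00
}
```

Counting only the longest contiguous match, rather than any matching block, reflects the fact that a cached prefix is presumably reusable only from the start of the prompt; here pod-a matches 5 of 6 blocks because the prompts diverge in the last block.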

The scorer tracks prefixes in an in-memory store that uses a fast hashing algorithm (xxHash) and an LRU (Least Recently Used) eviction policy to prevent unbounded memory consumption.
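The store described above could be sketched as follows, assuming it maps each prefix hash to the set of pods that served it and that its capacity is counted in prefixes. `PrefixStore`, `Add`, `Has`, `Len`, and `capacity` are hypothetical names, and FNV-1a (`hash/fnv`) replaces xxHash only so the example has no external dependencies.

```go
package main

import (
	"container/list"
	"fmt"
	"hash/fnv"
)

// hashPrefix hashes a prompt prefix. FNV-1a stands in for xxHash here.
func hashPrefix(prefix string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(prefix))
	return h.Sum64()
}

// entry is one tracked prefix hash and the set of pods that served it.
type entry struct {
	key  uint64
	pods map[string]struct{}
}

// PrefixStore is a hypothetical in-memory prefix index bounded by an LRU
// policy: once it holds more than capacity prefixes, the least recently
// used prefix is evicted.
type PrefixStore struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[uint64]*list.Element
}

func NewPrefixStore(capacity int) *PrefixStore {
	return &PrefixStore{capacity: capacity, order: list.New(), items: make(map[uint64]*list.Element)}
}

// Add records that pod served prefix and marks the prefix as most recently used.
func (s *PrefixStore) Add(prefix, pod string) {
	key := hashPrefix(prefix)
	if el, ok := s.items[key]; ok {
		el.Value.(*entry).pods[pod] = struct{}{}
		s.order.MoveToFront(el)
		return
	}
	el := s.order.PushFront(&entry{key: key, pods: map[string]struct{}{pod: {}}})
	s.items[key] = el
	if s.order.Len() > s.capacity {
		oldest := s.order.Back()
		s.order.Remove(oldest)
		delete(s.items, oldest.Value.(*entry).key)
	}
}

// Has reports whether pod served prefix; any lookup of a tracked prefix
// refreshes its LRU position.
func (s *PrefixStore) Has(prefix, pod string) bool {
	el, ok := s.items[hashPrefix(prefix)]
	if !ok {
		return false
	}
	s.order.MoveToFront(el)
	_, served := el.Value.(*entry).pods[pod]
	return served
}

// Len returns the number of prefixes currently tracked.
func (s *PrefixStore) Len() int { return s.order.Len() }

func main() {
	store := NewPrefixStore(2)
	store.Add("system prompt A", "pod-a")
	store.Add("system prompt B", "pod-b")
	store.Has("system prompt A", "pod-a") // refreshes A
	store.Add("system prompt C", "pod-c") // evicts B, the least recently used
	fmt.Println(store.Has("system prompt A", "pod-a"), store.Has("system prompt B", "pod-b"), store.Len()) // true false 2
}
```

Bounding the store by entry count keeps memory use predictable, and evicting the least recently used prefix discards the patterns that have gone longest without being requested.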

Although it is similar to proposals #602 and #768, this scorer is a self-contained plugin that can be enabled on its own or used in conjunction with other scorers, and it has no impact on the internal structure of the EPP/scheduler.

**Why is this needed**:

This scorer can improve cache hit rates and efficiency without depending on the availability of a distributed KV-cache index, since it is lightweight and self-contained. It does not guarantee that the selected pod has the exact cached context for the current request, so it should be considered a best-effort, opportunistic, heuristic-based approach.
