Describe the feature
Problem
Transcription endpoint filtering is broken in K8s service discovery mode because the Helm chart and the service discovery code disagree on pod labeling.
Root cause
- Transcription filtering (`src/vllm_router/services/request_service/request.py:594`) expects `ep.model_label == "transcription"`.
- Service discovery (`src/vllm_router/service_discovery.py:619`) gets `model_label` from the pod label `"model"`.
- The Helm chart (`helm/templates/deployment-vllm-multi.yaml:42`) automatically sets `model: {{ $modelSpec.name }}` (e.g., `"whisper-small"`).
Result
Transcription requests always fail with "No transcription backend available" because no endpoint ever has `model_label == "transcription"`; they all carry the actual model name instead.
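To make the failure concrete, here is a self-contained sketch of the filter behavior (the `EndpointInfo` shape is simplified, not copied from the repo):

```python
from dataclasses import dataclass

@dataclass
class EndpointInfo:
    url: str
    model_label: str  # filled from the pod's "model" label by service discovery

# Endpoints as the Helm chart currently labels them:
endpoints = [EndpointInfo("http://whisper-0:8000", "whisper-small")]

# The check at request.py:594 (paraphrased) never matches:
transcription_eps = [ep for ep in endpoints if ep.model_label == "transcription"]
assert transcription_eps == []  # -> "No transcription backend available"
```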
Suggested solution
I see two ways to solve this problem.
Option 1
Add a model label override to the model specification in the Helm chart.
I'm not sure about this one. Is this label used for any other purpose?
Option 2
Add a `task` parameter to the model specification in Helm values, then:
- Add a `task: {{ $modelSpec.task }}` label on the engine pod in the Helm chart.
- Update service discovery to read the `task` label instead of, or alongside, the `model` label.
- Filter transcription endpoints using `ep.task == "transcription"` or similar logic (see the sketch below).
Why do you need this feature?
I’m running a Kubernetes cluster that hosts multiple distinct models behind a single router.
This router is then shared between several AWS accounts via a single ServiceEndpoint.
The list of deployed models and their counts change frequently for R&D reasons.
For this reason, K8s service discovery is particularly useful here.
Additional context
No response