feature: Add support for transcription routing in Kubernetes service discovery mode. #710

@BriceMichalski

Describe the feature

Problem

The transcription endpoint filtering logic is broken in K8S service discovery mode due to inconsistent labeling between the Helm chart and service discovery.

Root cause

  1. Transcription filtering (src/vllm_router/services/request_service/request.py:594) expects ep.model_label == "transcription"
  2. Service discovery (src/vllm_router/service_discovery.py:619) gets model_label from pod label "model"
  3. Helm chart (helm/templates/deployment-vllm-multi.yaml:42) automatically sets model: {{ $modelSpec.name }} (e.g., "whisper-small")

Result

Transcription requests always fail with "No transcription backend available" because no endpoints ever have model_label == "transcription" - they all have the actual model name instead.
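
The mismatch can be reproduced in isolation. The snippet below is a minimal sketch, not the router's actual code: the Endpoint class, endpoints_from_pod_labels, and transcription_endpoints are hypothetical stand-ins for the logic referenced above, and the pod URL is made up.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    # Hypothetical stand-in for the router's endpoint record.
    url: str
    model_label: str


def endpoints_from_pod_labels(pods):
    """Mimic service discovery: model_label is copied from the pod label "model"."""
    return [
        Endpoint(url=pod["url"], model_label=pod["labels"].get("model", ""))
        for pod in pods
    ]


def transcription_endpoints(endpoints):
    """Mimic the filter that expects model_label == "transcription"."""
    return [ep for ep in endpoints if ep.model_label == "transcription"]


# The Helm chart sets model: {{ $modelSpec.name }}, so the label is the model name.
pods = [{"url": "http://10.0.0.1:8000", "labels": {"model": "whisper-small"}}]

endpoints = endpoints_from_pod_labels(pods)
matches = transcription_endpoints(endpoints)
print(matches)  # [] -> the router reports "No transcription backend available"
```

Because the label always carries the model name, the filter can only match if a model is literally named "transcription".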

Suggested solution

I see two ways to solve this problem.

Option 1

Add a model label override to the model specification in the Helm chart.
I'm not sure about this one: is this label used for any other purpose?

Option 2

Add a task parameter to the model specification in Helm values, then:

  1. Add a task: {{ $modelSpec.task }} label to the engine pods in the Helm chart
  2. Update service discovery to read the task label instead of, or alongside, the model label
  3. Filter transcription endpoints using ep.task == "transcription" or similar logic
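
The steps of Option 2 could look roughly like this on the router side. This is a sketch under assumptions: the task field, endpoint_from_pod, and is_transcription are hypothetical names, and falling back to the old model_label check is one possible way to keep existing deployments working, not a confirmed design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Endpoint:
    # Hypothetical endpoint record; "task" is the proposed new field.
    url: str
    model_label: str
    task: Optional[str] = None


def endpoint_from_pod(url: str, labels: Dict[str, str]) -> Endpoint:
    """Read both the existing "model" label and the proposed "task" label."""
    return Endpoint(
        url=url,
        model_label=labels.get("model", ""),
        task=labels.get("task"),  # None when the pod has no task label
    )


def is_transcription(ep: Endpoint) -> bool:
    """Prefer the task label; fall back to the legacy model_label check."""
    return ep.task == "transcription" or ep.model_label == "transcription"


def transcription_endpoints(endpoints: List[Endpoint]) -> List[Endpoint]:
    return [ep for ep in endpoints if is_transcription(ep)]


whisper = endpoint_from_pod(
    "http://10.0.0.1:8000", {"model": "whisper-small", "task": "transcription"}
)
chat = endpoint_from_pod("http://10.0.0.2:8000", {"model": "llama-3"})

selected = transcription_endpoints([whisper, chat])
print([ep.url for ep in selected])  # ['http://10.0.0.1:8000']
```

Keeping model_label untouched means anything else that relies on the "model" pod label is unaffected, which sidesteps the open question in Option 1.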

Why do you need this feature?

I’m running a Kubernetes cluster that hosts multiple distinct models behind a single router.
This router is shared between several AWS accounts via a single ServiceEndpoint.
The set of models, and how many there are, changes frequently for R&D reasons.
That is why Kubernetes service discovery is so useful here.

Additional context

No response
