44 changes: 44 additions & 0 deletions .github/workflows/build-custom-router.yml
@@ -0,0 +1,44 @@
name: Build Custom Router Image

on:
  push:
    branches:
      - fix-max-model-len
      - main
  workflow_dispatch:

jobs:
  build:
    permissions:
      contents: read
      packages: write
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # Login to GitHub Container Registry (GHCR)
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: docker/Dockerfile
          push: true
          tags: |
            ghcr.io/${{ github.repository }}/router:latest
            ghcr.io/${{ github.repository }}/router:max-model-len-fix
            ghcr.io/${{ github.repository }}/router:${{ github.sha }}
          cache-from: type=registry,ref=ghcr.io/${{ github.repository }}/router:buildcache
          cache-to: type=registry,ref=ghcr.io/${{ github.repository }}/router:buildcache,mode=max
1 change: 1 addition & 0 deletions src/vllm_router/protocols.py
@@ -49,6 +49,7 @@ class ModelCard(OpenAIBaseModel):
    owned_by: str = "vllm"
    root: Optional[str] = None
    parent: Optional[str] = None
    max_model_len: Optional[int] = None


class ModelList(OpenAIBaseModel):
1 change: 1 addition & 0 deletions src/vllm_router/routers/main_router.py
@@ -152,6 +152,7 @@ async def show_models():
                created=model_info.created,
                owned_by=model_info.owned_by,
                parent=model_info.parent,
                max_model_len=model_info.max_model_len,
Contributor review comment (severity: medium):

This change correctly passes the max_model_len to the ModelCard. To ensure this functionality is robust and to prevent future regressions, it would be beneficial to add a unit test for the /v1/models endpoint. The test should verify that when a model's ModelInfo includes a max_model_len, this value is correctly included in the API response.
            )
            model_cards.append(model_card)
            existing_models.add(model_id)
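
Following up on the reviewer's suggestion, a minimal endpoint test could look roughly like the sketch below. It is illustrative only: it assumes the router app is FastAPI-based, that show_models() obtains per-backend model metadata through a get_service_discovery()-style accessor, and that the app object lives in a vllm_router.app module; those patch targets, attribute names, and fixture values are assumptions, not the project's actual API.

# Hedged sketch of a /v1/models unit test; module paths, the discovery accessor,
# and the shape of its return value are assumptions and may need adjusting.
from unittest.mock import MagicMock, patch

from fastapi.testclient import TestClient


def test_show_models_includes_max_model_len():
    # Fake model metadata as the discovery layer might report it.
    fake_model = MagicMock()
    fake_model.created = 1700000000
    fake_model.owned_by = "vllm"
    fake_model.parent = None
    fake_model.max_model_len = 8192

    fake_endpoint = MagicMock()
    fake_endpoint.model_info = {"facebook/opt-125m": fake_model}

    fake_discovery = MagicMock()
    fake_discovery.get_endpoint_info.return_value = [fake_endpoint]

    # Patch target is a guess at where main_router resolves service discovery.
    with patch(
        "vllm_router.routers.main_router.get_service_discovery",
        return_value=fake_discovery,
    ):
        from vllm_router.app import app  # assumed location of the FastAPI app

        client = TestClient(app)
        response = client.get("/v1/models")

    assert response.status_code == 200
    card = next(m for m in response.json()["data"] if m["id"] == "facebook/opt-125m")
    assert card["max_model_len"] == 8192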
3 changes: 3 additions & 0 deletions src/vllm_router/service_discovery.py
@@ -50,6 +50,7 @@ class ModelInfo:
    root: Optional[str] = None
    parent: Optional[str] = None
    is_adapter: bool = False
    max_model_len: Optional[int] = None
Contributor review comment (severity: medium):

While max_model_len is correctly added to ModelInfo and populated for Kubernetes-based service discovery, StaticServiceDiscovery currently has no mechanism to configure this value; it will always default to None. To ensure feature parity across discovery methods, consider enhancing StaticServiceDiscovery to allow static configuration of max_model_len for each model. This could be done by adding a max_model_lens list to its constructor, similar to how urls and models are handled.


    @classmethod
    def from_dict(cls, data: Dict) -> "ModelInfo":
@@ -62,6 +63,7 @@ def from_dict(cls, data: Dict) -> "ModelInfo":
            root=data.get("root", None),
            parent=data.get("parent", None),
            is_adapter=data.get("parent") is not None,
            max_model_len=data.get("max_model_len", None),
        )

    def to_dict(self) -> Dict:
@@ -74,6 +76,7 @@ def to_dict(self) -> Dict:
            "root": self.root,
            "parent": self.parent,
            "is_adapter": self.is_adapter,
            "max_model_len": self.max_model_len,
        }


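
To make the second review comment more concrete, here is a rough sketch of how a max_model_lens list could be threaded through a static discovery class alongside urls and models. The class and method names below are illustrative only (the real StaticServiceDiscovery constructor takes more arguments and exposes a different interface); the one detail taken from this diff is that ModelInfo.from_dict appears to tolerate missing keys, so unspecified fields keep their defaults.

# Illustrative sketch only: the real StaticServiceDiscovery has a richer
# constructor and interface; this shows just the max_model_lens plumbing.
from typing import List, Optional

from vllm_router.service_discovery import ModelInfo


class StaticServiceDiscoverySketch:
    def __init__(
        self,
        urls: List[str],
        models: List[str],
        max_model_lens: Optional[List[Optional[int]]] = None,
    ):
        if max_model_lens is not None and len(max_model_lens) != len(models):
            raise ValueError("max_model_lens must have one entry per model")
        self.urls = urls
        self.models = models
        # One (possibly None) context length per statically configured model.
        self.max_model_lens = max_model_lens or [None] * len(models)

    def model_infos(self) -> List[ModelInfo]:
        # ModelInfo.from_dict appears to default missing keys, so only the
        # statically known fields are supplied here.
        return [
            ModelInfo.from_dict({"id": model, "max_model_len": max_len})
            for model, max_len in zip(self.models, self.max_model_lens)
        ]

On the CLI side this would pair naturally with a flag mirroring however urls and models are currently passed in, but the flag name and parsing are left out here since they are not shown in this diff.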