
@sebastiaanvduijn

[Router][Feat] Add support for /v1/responses endpoint

This PR adds initial support for the OpenAI-compatible /v1/responses endpoint in vllm_router. The implementation mirrors the existing /v1/chat/completions route structure and delegates all requests to route_general_request, ensuring consistent routing, logging, service discovery, and observability.

Key Changes

  • Modified: routers/main_router.py
    • Added route_v1_responses for POST /v1/responses.
    • Uses route_general_request to forward payloads to the backend (a minimal sketch follows this list).
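
For reference, here is a minimal sketch of the new route as it sits in routers/main_router.py. The decorator, signature, and route_general_request call match the diff quoted in the review below; the imports and the main_router/route_general_request context are assumed surroundings, not part of this PR's diff:

```python
from fastapi import APIRouter, BackgroundTasks, Request

# Assumed context: vllm_router already defines the router instance and the
# route_general_request helper, which handles service discovery, logging,
# and proxying to a backend.
main_router = APIRouter()


@main_router.post("/v1/responses")
async def route_v1_responses(request: Request, background_tasks: BackgroundTasks):
    """Handles OpenAI-compatible /v1/responses requests."""
    return await route_general_request(request, "/v1/responses", background_tasks)
```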

Motivation

The OpenAI Responses API is becoming a standard. Adding /v1/responses improves interoperability and allows clients that depend on the new endpoint to work out of the box with vLLM’s router.

Testing

  • Verified the new route is registered and accessible.
  • Confirmed request forwarding matches existing endpoints such as /v1/chat/completions (a client-side smoke test is sketched below).
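
A quick client-side smoke test can reproduce the second check against a running router. This is a hedged example: the router URL, model name, and prompt are placeholders, not values from this PR:

```python
import requests

ROUTER_URL = "http://localhost:8000"  # placeholder: your router's address

payload = {
    "model": "my-served-model",  # placeholder: a model served by your backends
    "input": "Tell me a three sentence bedtime story about a unicorn.",
}

# The router should forward this body unchanged to a vLLM backend that
# implements /v1/responses and relay the backend's reply.
resp = requests.post(f"{ROUTER_URL}/v1/responses", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```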

Fixes #690

@gemini-code-assist bot left a comment

Summary of Changes

Hello @sebastiaanvduijn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new OpenAI-compatible /v1/responses endpoint to the vllm_router. This enhancement aims to improve interoperability with clients that rely on this emerging standard, ensuring seamless integration and consistent request handling through the existing routing infrastructure.

Highlights

  • New API Endpoint: Introduced the /v1/responses endpoint, mirroring the existing /v1/chat/completions structure and delegating requests to route_general_request for consistent handling.

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@YuhanLiu11

@sebastiaanvduijn can you fix the DCO error?

@sebastiaanvduijn

> @sebastiaanvduijn can you fix the DCO error?

Should be done now, sorry!

@sebastiaanvduijn

/gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new /v1/responses endpoint, mirroring the structure of existing endpoints like /v1/chat/completions. The change is straightforward and correct. I've provided a couple of suggestions to improve code style and maintainability, specifically regarding excessive blank lines, adding a docstring for the new route, and ensuring the file ends with a newline.

Comment on lines +250 to +251

```python
@main_router.post("/v1/responses")
async def route_v1_responses(request: Request, background_tasks: BackgroundTasks):
    return await route_general_request(request, "/v1/responses", background_tasks)
```
Severity: medium

This new route is missing a docstring. For consistency with other endpoints like /v1/audio/transcriptions, please add one to explain its purpose. Additionally, the file is missing a newline at the end, which is a standard convention that should be followed.

Suggested change:

```python
@main_router.post("/v1/responses")
async def route_v1_responses(request: Request, background_tasks: BackgroundTasks):
    """Handles OpenAI-compatible /v1/responses requests."""
    return await route_general_request(request, "/v1/responses", background_tasks)
```

@sebastiaanvduijn force-pushed the main branch 4 times, most recently from 209ea5c to 9007cfb on September 10, 2025 09:47
@max-wittig

Not sure about this one. Responses is supposed to handle more than just chat completions. Maybe it would be better placed inside vllm itself.

@sebastiaanvduijn

> Not sure about this one. Responses is supposed to handle more than just chat completions. Maybe it would be better placed inside vllm itself.

It is already available in vLLM; we just need this endpoint in the router so the request is routed correctly, since the path is currently unknown to the router.

@max-wittig commented Sep 19, 2025

In addition, the API is different.

Responses:

```json
{
  "model": "gpt-4.1",
  "input": "Tell me a three sentence bedtime story about a unicorn."
}
```

Chat completions:

```json
{
  "model": "mistral-nemo-instruct-2407",
  "messages": [
    {
      "role": "system",
      "content": "Say Hello"
    }
  ],
  "temperature": 0.0
}
```

See also: https://platform.openai.com/docs/guides/migrate-to-responses

@bufferoverflow

btw, the first part is in vllm-project/vllm#20504

@sebastiaanvduijn

Sorry, I might be misunderstanding this, but my PR is about making the Responses API available in the router so it can forward the request to vLLM. So I don't understand the suggestion of leaving this inside vLLM; it's already there, just not routable. My code simply picks up the request and forwards it to the vLLM backend.

@peekxc commented Nov 11, 2025

Which of the features listed on this migration guide are meant to be covered?

@sebastiaanvduijn

> Which of the features listed on this migration guide are meant to be covered?

This endpoint addition just forwards the request to the backend vLLM instances, so I would say it covers whatever is implemented on the vLLM side (a conceptual sketch of that pass-through follows).
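
To illustrate the pass-through point, here is a conceptual sketch of what the forwarding amounts to for this endpoint. This is not the actual route_general_request implementation: backend selection, header handling, and streaming are simplified, and the backend URL is a placeholder:

```python
import httpx
from fastapi import Request
from fastapi.responses import Response


async def forward_to_backend(request: Request, endpoint: str) -> Response:
    """Conceptual pass-through: the router adds no Responses-API logic of its own."""
    backend_url = "http://vllm-backend:8000"  # placeholder; the real code picks a backend
    body = await request.body()
    async with httpx.AsyncClient(timeout=300.0) as client:
        upstream = await client.post(
            f"{backend_url}{endpoint}",
            content=body,
            headers={"Content-Type": request.headers.get("content-type", "application/json")},
        )
    # Whatever vLLM implements for /v1/responses comes back unchanged.
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type"),
    )
```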

Signed-off-by: Sebastiaan van Duijn <[email protected]>
@sebastiaanvduijn changed the title from "Introduction of /v1/responses endpoint" to "[Router] Introduction of /v1/responses endpoint" on Nov 13, 2025
@sebastiaanvduijn

@YuhanLiu11 Fixed all the CI/CD pipeline issues and removed the previous commits for a clean merge.

@sebastiaanvduijn

Hi @YuhanLiu11, @Shaoting-Feng, @ApostaC, is there anything blocking approval of this PR? I do believe this would help a lot, as more and more projects are moving to the Responses API, and in the production stack this is not working because the router does not support these requests.

Thanks a lot for the feedback!

Successfully merging this pull request may close: feature: Make /v1/responses available in the router to support GPT OSS