[Router] Introduction of /v1/responses endpoint #691
Conversation
Summary of Changes
Hello @sebastiaanvduijn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new OpenAI-compatible /v1/responses endpoint to the vllm_router. This enhancement aims to improve interoperability with clients that rely on this emerging standard, ensuring seamless integration and consistent request handling through the existing routing infrastructure.
Highlights
- New API Endpoint: Introduced the /v1/responses endpoint, mirroring the existing /v1/chat/completions structure and delegating requests to route_general_request for consistent handling.
Warning: Gemini encountered an error creating the review. You can try again by commenting `/gemini review`.
@sebastiaanvduijn can you fix the DCO error?
Force-pushed from 406c777 to 510bb95.
Should be done now, sorry!

/gemini review
Code Review
This pull request introduces a new /v1/responses endpoint, mirroring the structure of existing endpoints like /v1/chat/completions. The change is straightforward and correct. I've provided a couple of suggestions to improve code style and maintainability, specifically regarding excessive blank lines, adding a docstring for the new route, and ensuring the file ends with a newline.
```python
@main_router.post("/v1/responses")
async def route_v1_responses(request: Request, background_tasks: BackgroundTasks):
    return await route_general_request(request, "/v1/responses", background_tasks)
```
This new route is missing a docstring. For consistency with other endpoints like /v1/audio/transcriptions, please add one to explain its purpose. Additionally, the file is missing a newline at the end, which is a standard convention that should be followed.
Suggested change (adds the docstring):

```python
@main_router.post("/v1/responses")
async def route_v1_responses(request: Request, background_tasks: BackgroundTasks):
    """Handles OpenAI-compatible /v1/responses requests."""
    return await route_general_request(request, "/v1/responses", background_tasks)
```
Force-pushed from 209ea5c to 9007cfb.
Not sure about this one. The Responses API is supposed to handle more than just chat completions. Maybe it would be better placed inside vLLM itself.
It is already available in vLLM; we just need this endpoint so the router can route it correctly, as the endpoint is currently unknown to the router.
In addition, the API is different.

Responses:

```json
{
  "model": "gpt-4.1",
  "input": "Tell me a three sentence bedtime story about a unicorn."
}
```

Chat Completions:

```json
{
  "model": "mistral-nemo-instruct-2407",
  "messages": [
    {
      "role": "system",
      "content": "Say Hello"
    }
  ],
  "temperature": 0.0
}
```

See also: https://platform.openai.com/docs/guides/migrate-to-responses
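To make the difference concrete, here is a rough sketch of how a client might call each API through the router once `/v1/responses` is routed. This is illustrative only: the router URL, API key, and use of the official `openai` Python SDK are assumptions and not part of this PR; the model names are taken from the JSON examples above.

```python
# Illustrative sketch only: base_url and api_key are placeholders for a
# locally running router; the model names mirror the JSON examples above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30080/v1", api_key="dummy")

# Responses API: a single flat `input` field.
resp = client.responses.create(
    model="gpt-4.1",
    input="Tell me a three sentence bedtime story about a unicorn.",
)
print(resp.output_text)

# Chat Completions API: a list of role/content messages.
chat = client.chat.completions.create(
    model="mistral-nemo-instruct-2407",
    messages=[{"role": "system", "content": "Say Hello"}],
    temperature=0.0,
)
print(chat.choices[0].message.content)
```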
BTW, the first part is in vllm-project/vllm#20504.
Sorry, I might be misunderstanding this, but my PR is about making the Responses API available in the router so it can forward the request to vLLM. So I don't understand the suggestion of leaving this inside vLLM: it's already there, just not available to route. My code simply picks up the request and forwards it to the vLLM backend.
Which of the features listed in this migration guide are meant to be covered?
This endpoint addition just forwards the request to the backend vLLM instances, so I would say it covers whatever is implemented on the vLLM side.
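As a concrete illustration of that "just forwarding" behaviour, a minimal smoke check against a running router could look like the sketch below. The router address and model name are placeholders, not values from this PR.

```python
# Minimal smoke check: POST a Responses-style payload to the router and confirm
# it is forwarded to a backend instead of failing as an unknown endpoint.
# The URL and model name are placeholders.
import requests

r = requests.post(
    "http://localhost:30080/v1/responses",
    json={"model": "my-model", "input": "Say hello"},
    timeout=30,
)
print(r.status_code)
print(r.json())
```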
Force-pushed from d77c207 to f2d3d71.
Signed-off-by: Sebastiaan van Duijn <[email protected]>
@YuhanLiu11 Fixed all the CI/CD pipeline issues and removed previous commits for a clean merge.
Hi @YuhanLiu11 @Shaoting-Feng @ApostaC, is there anything blocking approval of this PR? I believe this would help a lot, as more and more projects are moving to the Responses API, and in the production stack this currently does not work because the router does not support these requests. Thanks a lot for the feedback!
[Router][Feat] Add support for /v1/responses endpoint
This PR adds initial support for the OpenAI-compatible `/v1/responses` endpoint in `vllm_router`. The implementation mirrors the existing `/v1/chat/completions` route structure and delegates all requests to `route_general_request`, ensuring consistent routing, logging, service discovery, and observability.

Key Changes
- `routers/main_router.py`: added `route_v1_responses` for `POST /v1/responses`, delegating to `route_general_request` to forward payloads to the backend.

Motivation
The OpenAI Responses API is becoming a standard. Adding `/v1/responses` improves interoperability and allows clients that depend on the new endpoint to work out of the box with vLLM's router.

Testing
- Requests to `/v1/responses` are routed the same way as the existing endpoints (e.g. `/v1/chat/completions`); a test sketch follows this description.

FIX #690
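For completeness, one way to unit-test the new route without a backend is sketched below. It assumes the route lives in a module importable as `vllm_router.routers.main_router` (inferred from `routers/main_router.py` above) and that `route_general_request` is resolved as a module-level name there; adjust the import path if the package layout differs.

```python
# Hedged test sketch: patch route_general_request so no backend is needed and
# assert that POST /v1/responses reaches the handler added in this PR.
from fastapi import FastAPI
from fastapi.responses import JSONResponse
from fastapi.testclient import TestClient

from vllm_router.routers import main_router as mr  # assumed module path


async def fake_route_general_request(request, endpoint, background_tasks):
    # Stand-in for the real routing logic: just report which endpoint was hit.
    return JSONResponse({"routed_to": endpoint})


def test_v1_responses_is_routed(monkeypatch):
    monkeypatch.setattr(mr, "route_general_request", fake_route_general_request)

    app = FastAPI()
    app.include_router(mr.main_router)
    client = TestClient(app)

    r = client.post("/v1/responses", json={"model": "some-model", "input": "hi"})
    assert r.status_code == 200
    assert r.json() == {"routed_to": "/v1/responses"}
```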