EPP cannot serve /chat/completions API #790

Closed
delavet opened this issue May 7, 2025 · 2 comments · Fixed by #798
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@delavet
Contributor

delavet commented May 7, 2025

What happened:
Previously, EPP could serve the /chat/completions API as well as the /completions API. This behavior seems to be broken in the latest version of EPP. When we test our inference service with the /chat/completions API using the latest EPP, we get the following error:

curl -X POST xxxx:8081/v1/chat/completions -H 'Content-Type: application/json' -d '{
    "model": "qwen",
    "temperature": 0,
    "messages": [
      {
        "role": "user",
        "content": "who are you"
      }
    ]
}' -v
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying xxxx:8081...
* Connected to xxxx (xxxx) port 8081
> POST /v1/chat/completions HTTP/1.1
> Host: xxxx:8081
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 145
> 
* upload completely sent off: 145 bytes
< HTTP/1.1 400 Bad Request
< date: Wed, 07 May 2025 04:01:09 GMT
< connection: close
< content-length: 0
< 
* Closing connection

I believe this 400 status is caused by EPP, because I found the following error in the EPP logs:

2025-05-07T04:01:09Z	ERROR	handlers/server.go:202	Error handling body	{"x-request-id": "24162a7f-7c72-4608-806e-06c5f4a60688", "error": "inference gateway: BadRequest - prompt not found in request"}
sigs.k8s.io/gateway-api-inference-extension/pkg/epp/handlers.(*StreamingServer).Process
	sigs.k8s.io/gateway-api-inference-extension/pkg/epp/handlers/server.go:202
sigs.k8s.io/gateway-api-inference-extension/vendor/github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3._ExternalProcessor_Process_Handler
	sigs.k8s.io/gateway-api-inference-extension/vendor/github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3/external_processor_grpc.pb.go:106
sigs.k8s.io/gateway-api-inference-extension/vendor/google.golang.org/grpc.(*Server).processStreamingRPC
	sigs.k8s.io/gateway-api-inference-extension/vendor/google.golang.org/grpc/server.go:1695
sigs.k8s.io/gateway-api-inference-extension/vendor/google.golang.org/grpc.(*Server).handleStream
	sigs.k8s.io/gateway-api-inference-extension/vendor/google.golang.org/grpc/server.go:1819
sigs.k8s.io/gateway-api-inference-extension/vendor/google.golang.org/grpc.(*Server).serveStreams.func2.1
	sigs.k8s.io/gateway-api-inference-extension/vendor/google.golang.org/grpc/server.go:1035

What you expected to happen:
I expect /chat/completions to be served just as well as /completions.
How to reproduce it (as minimally and precisely as possible):
Use the latest EPP and send a /chat/completions request like the curl command shown under "What happened".
Anything else we need to know?:
After searching, I found that this line of code causes the error:

prompt, ok := requestBodyMap["prompt"].(string)

I think this could easily be fixed to support the /chat/completions API by adding a utility function that extracts the prompt from the request body. For the /chat/completions API, the prompts are embedded in the messages field, so we just need to handle that case.
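
To make the idea concrete, here is a minimal sketch of such a utility function. It is an illustration only, not the actual EPP code and not necessarily what the eventual fix looks like: the name `extractPrompt`, the error messages, and the choice to join messages as `role: content` lines are all assumptions. It also assumes every `content` value is a plain string, which does not cover chat requests that send `content` as a list of parts.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// extractPrompt is a hypothetical helper (not part of EPP). It returns the
// prompt text from a decoded request body, accepting either the /completions
// shape ("prompt") or the /chat/completions shape ("messages").
func extractPrompt(body map[string]any) (string, error) {
	// /completions: the prompt is a top-level string.
	if prompt, ok := body["prompt"].(string); ok {
		return prompt, nil
	}

	// /chat/completions: the prompt text is spread over messages[].content.
	messages, ok := body["messages"].([]any)
	if !ok || len(messages) == 0 {
		return "", errors.New("prompt or messages not found in request")
	}

	parts := make([]string, 0, len(messages))
	for i, m := range messages {
		msg, ok := m.(map[string]any)
		if !ok {
			return "", fmt.Errorf("messages[%d] is not an object", i)
		}
		role, _ := msg["role"].(string) // a missing role becomes ""
		content, ok := msg["content"].(string)
		if !ok {
			// Assumption: only plain string content is handled in this sketch.
			return "", fmt.Errorf("messages[%d].content is not a string", i)
		}
		parts = append(parts, role+": "+content)
	}
	return strings.Join(parts, "\n"), nil
}

func main() {
	// Same request body as the failing curl call in this issue.
	raw := []byte(`{
	    "model": "qwen",
	    "temperature": 0,
	    "messages": [{"role": "user", "content": "who are you"}]
	}`)

	var body map[string]any
	if err := json.Unmarshal(raw, &body); err != nil {
		panic(err)
	}

	prompt, err := extractPrompt(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(prompt) // prints "user: who are you"
}
```

Checking `prompt` first keeps the /completions path unchanged, and falling back to `messages` only when `prompt` is absent means existing callers should see no difference.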

If the maintainers can confirm this issue, I can assign it to myself and provide a fix.
Environment:

  • Kubernetes version (use kubectl version):
  • Inference extension version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:
@delavet delavet added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 7, 2025
@nirrozenbaum
Contributor

@delavet this is definitely a bug, good catch!
It seems it was introduced last week in PR #757, which added the exact line you pointed to.
Please feel free to assign yourself and provide a fix.

Thanks!

@delavet
Contributor Author

delavet commented May 7, 2025

Thanks!
/assign
