ai-proxy: streaming not working in 3.9.0 #14425

Open

muscionig opened this issue Apr 18, 2025 · 0 comments

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

3.8.0, 3.9.0

Current Behavior

When using Kong with the ai-proxy plugin, streaming responses are "buffered" in Kong 3.9.0 but work correctly with the same configuration in 3.8.0. See below for a fully reproducible example.

Expected Behavior

When streaming is requested, Kong should forward each SSE event to the client as soon as it is received from the upstream.
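
One way to check this (a sketch, assuming the service and route configured in the reproduction steps below) is to timestamp each SSE line as it arrives. When streaming works, the timestamps spread out over the generation; when the response is buffered, they are all nearly identical:

curl -sN -X POST http://localhost:8000/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data-raw '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is deep learning?"}], "stream": true, "max_tokens": 100}' \
  | while IFS= read -r line; do
      # prefix every SSE line with the wall-clock time it was read
      printf '%s  %s\n' "$(date '+%H:%M:%S')" "$line"
    done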

Steps To Reproduce

You can use this docker-compose.yaml; you may need to comment/uncomment the kong migrations bootstrap/up piece (one-off alternatives are shown after the file):

name: local-kong
version: "3.4"

x-common-variables: &common-variables
  KONG_PG_PORT: 5432
  KONG_PG_USER: kong
  KONG_PG_PASSWORD: kong
  KONG_PG_DATABASE: kong
  POSTGRES_USER: kong
  POSTGRES_PASSWORD: kong
  POSTGRES_DB: kong
  KONG_HTTP_PROXY_PORT: 8000
  KONG_HTTP_ADMIN_PORT: 8001


services:
  kong-db:
    networks:
      - local-kong-net
    image: postgres:17.2-alpine
    ports:
      - 5432:5432
    environment:
      <<: *common-variables

  kong:
    platform: "linux/amd64"
    networks:
      - local-kong-net
    image: kong:3.8.0
    ports:
      - 8000:8000
      - 8443:8443
      - 8001:8001
      - 8002:8002
      - 8444:8444
    environment:
      <<: *common-variables
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-db
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
      KONG_ADMIN_GUI_URL: http://localhost:8002
      KONG_PROXY_LISTEN: 0.0.0.0:8000
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_PLUGINS: bundled
      KONG_LOG_LEVEL: info
    depends_on:
      - kong-db
    # command: ["kong", "migrations", "bootstrap"]

networks:
  local-kong-net:
    driver: bridge
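
Instead of editing the commented command: line, the migrations can also be run as one-off containers against the same compose file (a sketch, assuming the service names defined above; the kong service inherits the KONG_PG_* settings from its environment):

# make sure the database is up first
docker compose up -d kong-db

# initial bootstrap on a fresh database
docker compose run --rm kong kong migrations bootstrap

# after switching the image to 3.9.0
docker compose run --rm kong kong migrations up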

Steps:

  1. run docker compose up
  2. apply the following configuration:
    curl -X POST http://localhost:8001/services \
    --data "name=openai" \
    --data "url=https://api.openai.com"
    
    curl -X POST http://localhost:8001/routes \
    --data "service.id=$(curl -s http://localhost:8001/services/openai | jq -r '.id')" \
    --data name=openai \
    --data "paths[]=/openai"
    
    curl -X POST http://localhost:8001/services/openai/plugins \
     --header 'Content-Type: application/json' \
     --header 'accept: application/json' \
     --data '{
    "name": "ai-proxy",
    "instance_name": "openai",
    "config": {
      "route_type": "llm/v1/chat",
      "model": {
        "provider": "openai"
      },
      "auth": {
            "header_value": "Bearer <OPENAI_KEY>",
            "header_name": "Authorization"
        }
    }
     }'
  3. verify streaming works:
     curl -X POST http://localhost:8000/openai/v1/chat/completions \
      -H 'Content-Type: application/json' \
      --data-raw '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is deep learning?"}], "temperature": 0.7, "stream": true, "max_tokens": 100}'
    
  4. change the kong image to 3.9.0, run the migrations, and then docker compose up
  5. re-run the request from step 3; kong will return all SSE events at once and will also log the following (note that 3.8.0 does not emit this warning):
    kong-1     | 2025/04/18 17:52:53 [warn] 1405#0: *5218 an upstream response is buffered to a temporary file /usr/local/kong/proxy_temp/4/00/0000000004 while reading upstream, client: 172.19.0.1
    
  6. disable the ai-proxy plugin and add the OpenAI key as a bearer token: -H "Authorization: Bearer $OPEN_AI_KEY"
  7. re-run the request from step 3; streaming will work (the timing check after these steps quantifies the difference).
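
A quick way to quantify the buffering (a sketch; %{time_starttransfer} is curl's time-to-first-byte variable) is to discard the body and compare time-to-first-byte against total time. When streaming works, the first byte arrives well before the request completes; when the response is buffered, the two values are almost equal:

curl -sN -o /dev/null \
  -w 'first byte: %{time_starttransfer}s  total: %{time_total}s\n' \
  -X POST http://localhost:8000/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data-raw '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is deep learning?"}], "stream": true, "max_tokens": 100}'

Running this against 3.8.0 and 3.9.0 with the same plugin configuration makes the regression visible in numbers.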

Anything else?

While the steps above use OpenAI, I verified the same behavior with multiple providers (including Bedrock and self-hosted models).
