Skip to content

Conversation

@luis5tb
Copy link
Contributor

@luis5tb luis5tb commented Nov 4, 2025

Description

Add v2 endpoint to interact with LlamaStack using Responses API (streaming) instead of Agent API
It is adding the support in a v2 endpoint to ensure backward compatibility

This is a follow up of the work done in #753

What is missing and to be done in follow up PRs:

  • Add support for shields once llamastack supports it for responses API
  • Add support for parsing and returning referenced document on the response

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added a v2 streaming query endpoint with richer real-time events, transcript management, and improved error handling.
    • Centralized tool-preparation for external tools (RAG and MCP) and improved per-server header merging with override support.
    • Introduced a streaming context object to simplify and stabilize streaming flows.
  • Tests

    • Added unit tests covering v2 streaming behavior, tool building, and header-merging scenarios.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 4, 2025

Walkthrough

Refactors streaming endpoints into a factory-based, dependency-injected flow, adds a Responses API v2 streaming endpoint, centralizes tool preparation into prepare_tools_for_responses_api(), and introduces ResponseGeneratorContext to carry streaming state and dependencies.

Changes

Cohort / File(s) Summary
Query v2 & tool preparation
src/app/endpoints/query_v2.py
Adds prepare_tools_for_responses_api() to assemble RAG and MCP toolgroups; updates get_mcp_tools() to merge per-server MCP headers with token-based Authorization; imports AppConfig.
Streaming endpoint base & Agent flow
src/app/endpoints/streaming_query.py
Introduces streaming_query_endpoint_handler_base() (injectable handler), create_agent_response_generator() (Agent streaming generator factory), and updates streaming_query_endpoint_handler() to delegate to base with factories; exposes ResponseGeneratorContext.
Responses API v2 streaming
src/app/endpoints/streaming_query_v2.py
New module providing streaming_query_endpoint_handler_v2(), v2 retrieve_response() (prepares tools, validates attachments, calls Llama Stack with stream=True), and create_responses_response_generator() (SSE generator aggregating tokens, tool calls, turn and end events).
Context model
src/models/context.py
Adds ResponseGeneratorContext dataclass (conversation_id, user_id, skip_userid_check, model_id, provider_id, llama_stack_model_id, query_request, started_at, client, metadata_map).
Router registration
src/app/routers.py
Imports and registers streaming_query_v2 router under /v2 prefix alongside other v2 endpoints.
Unit tests — query v2 headers
tests/unit/app/endpoints/test_query_v2.py
Adds test_get_mcp_tools_with_mcp_headers() to verify per-server header merging and Authorization precedence.
Unit tests — v2 streaming
tests/unit/app/endpoints/test_streaming_query_v2.py
New test module with fixtures and async tests covering retrieve_response tool assembly, no_tools behavior, successful SSE streaming flow, persistence calls, and APIConnectionError handling.
Router tests update
tests/unit/app/test_routers.py
Updates router count and asserts presence/prefix of the new /v2 streaming router.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant QueryV2 as query_v2.retrieve_response
    participant ToolPrep as prepare_tools_for_responses_api
    participant RAG as VectorStore
    participant MCP as MCP Servers
    participant LlamaAPI as Llama Stack API

    Client->>QueryV2: (query_request, token, mcp_headers)
    QueryV2->>ToolPrep: prepare_tools_for_responses_api(...)
    ToolPrep->>RAG: lookup vector_store IDs (RAG tools)
    ToolPrep->>MCP: get_mcp_tools(...) (merged headers)
    MCP-->>ToolPrep: MCP tool definitions
    ToolPrep-->>QueryV2: toolgroups or None
    QueryV2->>LlamaAPI: stream(tools=[...], stream=True)
    LlamaAPI-->>QueryV2: streaming chunks (text, tool_calls, completion)
    QueryV2-->>Client: SSE events (start, token, tool_call, turn_complete, end)
Loading
sequenceDiagram
    participant FastAPI as FastAPI Request
    participant Handler as streaming_query_endpoint_handler_v2
    participant Base as streaming_query_endpoint_handler_base
    participant Generator as create_responses_response_generator
    participant LlamaAPI as Llama Stack API
    participant Storage as Persistence / Cache

    FastAPI->>Handler: (request, query_request, auth, mcp_headers)
    Handler->>Base: delegate with v2 retrieve_response + generator factory
    Base->>Handler (retrieve_response): build tools, call LlamaAPI (stream=True)
    Handler (retrieve_response)->>LlamaAPI: returns AsyncIterator(stream)
    Base->>Generator: instantiate with ResponseGeneratorContext
    loop Streaming processing
      Generator->>Generator: parse chunks -> yield SSE events
      Generator->>Storage: store_transcript, update_cache, topic_summary
    end
    Base-->>FastAPI: StreamingResponse (SSE)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing careful review:
    • streaming_query.py and streaming_query_v2.py: concurrency and lifecycle of context, generator cancellation, error propagation, and correct usage of injected factories.
    • prepare_tools_for_responses_api() and get_mcp_tools(): header merge precedence, token handling, and None-vs-empty tools semantics.
    • Tests in tests/unit/app/endpoints/test_streaming_query_v2.py: ensure mocks faithfully emulate streaming behavior and persistence calls.

Possibly related PRs

Suggested reviewers

  • tisnik
  • manstis
  • umago

Poem

🐇 I hopped through handlers, seams undone,
I carried context, one by one,
V2 streams now hum and play,
Tools aligned and headers stay,
A rabbit's cheer — the streams run on!

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: introducing a base implementation of streaming Responses API support, which is the primary feature across all modified files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc74af7 and a1b6f9c.

📒 Files selected for processing (3)
  • src/app/endpoints/streaming_query.py (6 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/models/context.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/models/context.py
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
🧬 Code graph analysis (2)
src/app/endpoints/streaming_query.py (6)
src/models/context.py (1)
  • ResponseGeneratorContext (12-48)
src/app/endpoints/streaming_query_v2.py (1)
  • response_generator (119-347)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • append_tool_calls_from_llama (96-117)
src/utils/endpoints.py (4)
  • get_system_prompt (126-190)
  • create_rag_chunks_dict (383-396)
  • create_referenced_documents_with_metadata (563-577)
  • store_conversation_into_cache (231-251)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • get_topic_summary (184-214)
  • persist_user_conversation_details (107-139)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/app/endpoints/streaming_query_v2.py (11)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • validate_attachments_metadata (801-830)
src/app/endpoints/query_v2.py (4)
  • extract_token_usage_from_responses_api (451-541)
  • get_topic_summary (235-274)
  • prepare_tools_for_responses_api (617-668)
  • retrieve_response (304-427)
src/app/endpoints/streaming_query.py (6)
  • format_stream_data (126-137)
  • stream_end_event (164-220)
  • stream_start_event (140-161)
  • streaming_query_endpoint_handler_base (851-983)
  • response_generator (716-846)
  • retrieve_response (1018-1139)
src/models/cache_entry.py (1)
  • CacheEntry (7-24)
src/models/database/conversations.py (1)
  • UserConversation (11-38)
src/utils/endpoints.py (3)
  • create_referenced_documents_with_metadata (563-577)
  • get_system_prompt (126-190)
  • store_conversation_into_cache (231-251)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@luis5tb luis5tb mentioned this pull request Nov 4, 2025
18 tasks
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b52ea28 and e95bc2b.

📒 Files selected for processing (8)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (5 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/unit/app/test_routers.py
  • src/app/routers.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (7)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (37-52)
tests/unit/app/endpoints/test_streaming_query_v2.py (5)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/config.py (3)
  • config (140-146)
  • Action (329-375)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/streaming_query.py (1)
  • retrieve_response (1024-1145)
src/app/endpoints/streaming_query_v2.py (2)
  • retrieve_response (409-491)
  • streaming_query_endpoint_handler_v2 (379-406)
src/configuration.py (1)
  • mcp_servers (101-105)
src/app/endpoints/streaming_query.py (7)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/app/endpoints/streaming_query_v2.py (2)
  • response_generator (138-372)
  • retrieve_response (409-491)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • append_tool_calls_from_llama (96-117)
src/utils/endpoints.py (4)
  • get_system_prompt (126-190)
  • create_rag_chunks_dict (383-396)
  • create_referenced_documents_with_metadata (563-577)
  • store_conversation_into_cache (231-251)
src/app/endpoints/query.py (4)
  • is_transcripts_enabled (98-104)
  • get_topic_summary (184-214)
  • persist_user_conversation_details (107-139)
  • retrieve_response (639-798)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • validate_attachments_metadata (801-830)
src/app/endpoints/query_v2.py (3)
  • extract_token_usage_from_responses_api (339-430)
  • get_mcp_tools (456-472)
  • get_rag_tools (433-453)
src/app/endpoints/streaming_query.py (6)
  • format_stream_data (125-136)
  • stream_end_event (163-219)
  • stream_start_event (139-160)
  • streaming_query_endpoint_handler_base (860-989)
  • response_generator (733-855)
  • retrieve_response (1024-1145)
src/models/cache_entry.py (1)
  • CacheEntry (7-24)
src/models/database/conversations.py (1)
  • UserConversation (11-38)
src/utils/endpoints.py (3)
  • create_referenced_documents_with_metadata (563-577)
  • get_system_prompt (126-190)
  • store_conversation_into_cache (231-251)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/app/endpoints/query_v2.py (10)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-52)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/configuration.py (2)
  • configuration (73-77)
  • mcp_servers (101-105)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/responses.py (4)
  • ForbiddenResponse (1120-1142)
  • QueryResponse (194-305)
  • ReferencedDocument (179-191)
  • UnauthorizedResponse (1094-1117)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (433-453)
  • get_mcp_tools (456-472)
  • retrieve_response (144-315)
  • query_endpoint_handler_v2 (119-141)
src/configuration.py (2)
  • mcp_servers (101-105)
  • llama_stack_configuration (87-91)
src/app/endpoints/query.py (2)
src/app/endpoints/streaming_query.py (1)
  • retrieve_response (1024-1145)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (144-315)
  • get_topic_summary (70-114)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (1)
src/app/endpoints/streaming_query.py (1)

699-857: Specify the concrete return type instead of Any.

Per our Python guidelines we need full type annotations, but create_agent_response_generator(...) -> Any hides the callable shape and weakens static checks. Please annotate it precisely (e.g. Callable[[AsyncIterator[AgentTurnResponseStreamChunk]], AsyncIterator[str]]) so downstream call sites stay type-safe.

-def create_agent_response_generator(  # pylint: disable=too-many-arguments,too-many-locals
+def create_agent_response_generator(  # pylint: disable=too-many-arguments,too-many-locals
     conversation_id: str,
     user_id: str,
     model_id: str,
     provider_id: str,
     query_request: QueryRequest,
     metadata_map: dict[str, dict[str, Any]],
     client: AsyncLlamaStackClient,
     llama_stack_model_id: str,
     started_at: str,
     _skip_userid_check: bool,
-) -> Any:
+) -> Callable[
+    [AsyncIterator[AgentTurnResponseStreamChunk]],
+    AsyncIterator[str],
+]:

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/app/endpoints/streaming_query_v2.py (2)

175-189: Stop persisting the conversation inside the "response.created" handler.

Persisting here means the later call to persist_user_conversation_details() at lines 365-371 increments message_count a second time, and it also makes existing_conversation non-null in the topic-summary block (lines 329-336), so get_topic_summary() never runs for brand-new conversations.

Apply this diff to drop the premature persist:

             if event_type == "response.created":
                 try:
                     conv_id = getattr(chunk, "response").id
                 except Exception:  # pylint: disable=broad-except
                     conv_id = ""
                 yield stream_start_event(conv_id)
-                if conv_id:
-                    persist_user_conversation_details(
-                        user_id=user_id,
-                        conversation_id=conv_id,
-                        model=model_id,
-                        provider_id=provider_id,
-                        topic_summary=None,
-                    )
                 continue

454-462: Streamed requests must include attachment content.

Unlike the non-streaming path in query_v2.py (lines 197-206), this streaming retrieve skips the attachment text entirely, so any uploaded logs/configs never reach the LLM. That's a functional regression for users relying on attachments.

Apply this diff to mirror the non-streaming input assembly:

+    # Prepare input for Responses API
+    # Convert attachments to text and concatenate with query
+    input_text = query_request.query
+    if query_request.attachments:
+        for attachment in query_request.attachments:
+            input_text += (
+                f"\n\n[Attachment: {attachment.attachment_type}]\n"
+                f"{attachment.content}"
+            )
+
     response = await client.responses.create(
-        input=query_request.query,
+        input=input_text,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, toolgroups)),
         stream=True,
         store=True,
     )
🧹 Nitpick comments (4)
src/app/endpoints/streaming_query_v2.py (2)

33-33: Use absolute import path for internal authentication module.

The coding guidelines specify that internal modules in src/**/*.py should use absolute imports (e.g., from auth import get_auth_dependency), but this uses a relative-style import.

As per coding guidelines

Apply this diff to align with the import convention:

-from authentication import get_auth_dependency
+from auth import get_auth_dependency

53-53: Use __name__ for the logger instead of a hardcoded string.

Coding guidelines require logger = logging.getLogger(__name__) for module logging, which provides better traceability.

As per coding guidelines

Apply this diff:

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/app/endpoints/query_v2.py (2)

17-17: Use absolute import path for internal authentication module.

The coding guidelines specify that internal modules in src/**/*.py should use absolute imports (e.g., from auth import get_auth_dependency).

As per coding guidelines

Apply this diff:

-from authentication import get_auth_dependency
+from auth import get_auth_dependency

38-38: Use __name__ for the logger instead of a hardcoded string.

Coding guidelines require logger = logging.getLogger(__name__) for better traceability.

As per coding guidelines

Apply this diff:

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e95bc2b and 9e02999.

📒 Files selected for processing (2)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
🧬 Code graph analysis (2)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • validate_attachments_metadata (801-830)
src/app/endpoints/query_v2.py (4)
  • extract_token_usage_from_responses_api (317-408)
  • get_topic_summary (70-114)
  • prepare_tools_for_responses_api (453-502)
  • retrieve_response (144-293)
src/app/endpoints/streaming_query.py (6)
  • format_stream_data (125-136)
  • stream_end_event (163-219)
  • stream_start_event (139-160)
  • streaming_query_endpoint_handler_base (860-989)
  • response_generator (733-855)
  • retrieve_response (1024-1145)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/utils/endpoints.py (3)
  • create_referenced_documents_with_metadata (563-577)
  • get_system_prompt (126-190)
  • store_conversation_into_cache (231-251)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/app/endpoints/query_v2.py (7)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/models/responses.py (2)
  • QueryResponse (194-305)
  • ReferencedDocument (179-191)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
src/app/endpoints/streaming_query_v2.py (1)
  • retrieve_response (408-468)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (6)
src/app/endpoints/streaming_query_v2.py (1)

376-405: LGTM!

The endpoint handler correctly delegates to the base handler with the v2-specific retrieve and generator functions. The structure and type annotations are appropriate.

src/app/endpoints/query_v2.py (5)

70-114: LGTM!

The topic summary generation correctly uses the Responses API with proper error handling and text extraction logic.


117-141: LGTM!

The endpoint handler correctly delegates to the base handler with appropriate v2-specific functions.


144-293: LGTM!

The retrieve_response function correctly assembles input with attachments, processes the response output, and extracts tool calls. The implementation is solid.


296-314: Placeholder is intentional per PR objectives.

This function currently returns an empty list with a TODO, which aligns with the PR description stating that "support for parsing and returning referenced documents in responses" will be addressed in follow-up PRs.


317-408: LGTM!

The token usage extraction is comprehensive with proper error handling for various response formats and graceful degradation when usage data is unavailable.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unit/app/endpoints/test_streaming_query_v2.py (1)

189-189: Use the shared MOCK_AUTH constant for test authentication.

The coding guidelines specify using MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") for auth tuples in tests rather than inline values.

As per coding guidelines.

Apply this pattern:

+# At module level
+MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")

 # In test_streaming_query_endpoint_handler_v2_success_yields_events:
-        auth=("user123", "", True, "token-abc"),  # skip_userid_check=True
+        auth=("user123", "", True, "token-abc"),  # Can keep custom values if testing specific scenarios

 # In test_streaming_query_endpoint_handler_v2_api_connection_error:
-        auth=("user123", "", False, "tok"),
+        auth=MOCK_AUTH,

Also applies to: 244-244

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e02999 and dd1cee0.

📒 Files selected for processing (3)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query_v2.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • src/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_streaming_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_streaming_query_v2.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (10)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • validate_attachments_metadata (801-830)
src/app/endpoints/query_v2.py (4)
  • extract_token_usage_from_responses_api (317-408)
  • get_topic_summary (70-114)
  • prepare_tools_for_responses_api (453-511)
  • retrieve_response (144-293)
src/app/endpoints/streaming_query.py (6)
  • format_stream_data (125-136)
  • stream_end_event (163-219)
  • stream_start_event (139-160)
  • streaming_query_endpoint_handler_base (860-989)
  • response_generator (733-855)
  • retrieve_response (1024-1145)
src/models/cache_entry.py (1)
  • CacheEntry (7-24)
src/models/database/conversations.py (1)
  • UserConversation (11-38)
src/utils/endpoints.py (3)
  • create_referenced_documents_with_metadata (563-577)
  • get_system_prompt (126-190)
  • store_conversation_into_cache (231-251)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/config.py (3)
  • config (140-146)
  • Action (329-375)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/streaming_query_v2.py (2)
  • retrieve_response (400-470)
  • streaming_query_endpoint_handler_v2 (370-397)
src/configuration.py (1)
  • mcp_servers (101-105)
src/app/endpoints/query_v2.py (6)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
src/app/endpoints/streaming_query_v2.py (1)
  • retrieve_response (400-470)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
src/app/endpoints/streaming_query_v2.py (3)

103-365: Well-structured streaming response generator implementation.

The create_responses_response_generator factory pattern cleanly separates concerns and correctly handles:

  • In-band conversation ID extraction from response.created events
  • Token and tool call streaming with proper SSE formatting
  • Single persistence call after topic summary logic (avoiding double-increment)
  • Attachment content inclusion in input assembly
  • Token usage extraction and metrics updates

The previous review concerns about premature persistence and missing attachment content have been properly addressed.


368-397: Clean delegation to base handler.

The wrapper pattern provides good separation between v2-specific Responses API logic and shared streaming infrastructure.


400-470: Proper tool preparation and attachment handling for streaming.

The retrieve_response function correctly:

  • Validates attachments before processing
  • Prepares RAG and MCP tools with header merging via prepare_tools_for_responses_api
  • Concatenates attachment content into the input text
  • Returns an empty conversation ID for in-band extraction
src/app/endpoints/query_v2.py (7)

70-114: Topic summary generation adapted correctly for Responses API.

The function properly uses client.responses.create() with store=False to avoid polluting conversation history, and robustly extracts text from various content structures.


117-141: Clean delegation to shared base handler.

The v2 query endpoint correctly injects Responses API-specific functions into the base handler pattern.


144-293: Solid non-streaming retrieve implementation.

The function correctly:

  • Validates attachments upfront
  • Prepares tools with MCP header merging
  • Assembles input text with attachment content
  • Parses response output and tool calls from OpenAI format
  • Extracts token usage and referenced documents

296-314: Placeholder appropriately documents missing functionality.

The TODO clearly indicates that referenced document parsing from Responses API response structures (file_search results, citations) is deferred to follow-up work, as mentioned in the PR objectives.


317-408: Robust token usage extraction with defensive handling.

The implementation correctly:

  • Handles both dict and object response.usage formats
  • Gracefully degrades when usage data is absent (expected when llama stack uses chat_completions internally)
  • Updates Prometheus metrics with proper error handling
  • Increments call counter even when token counts are unavailable

411-450: Clean tool definition helpers.

The get_rag_tools and get_mcp_tools functions provide clear, focused transformations from configuration to Responses API tool format.


453-511: Tool preparation correctly merges per-request MCP headers.

The previous review concern about dropping mcp_headers has been properly addressed. Lines 494-500 merge the caller's per-request headers into each MCP tool definition, ensuring short-lived tokens and dynamic auth reach the MCP servers.

@luis5tb luis5tb force-pushed the responses_v2_streaming branch from dd1cee0 to 68dc335 Compare November 4, 2025 18:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)

860-990: Fix docstring mood and argument count to unblock lint.

The new base handler’s docstring starts with “Base handler…”, violating pydocstyle D401, and the signature has six positional parameters, exceeding pylint’s five-argument limit. Please rephrase the docstring to an imperative sentence (e.g., “Handle streaming query requests…”) and collapse the parameters (e.g., wrap injected callbacks/mcp headers into a struct) so pylint R0917 passes.

🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_query_v2.py (1)

385-434: Use the shared MOCK_AUTH constant.

Per the repo’s test guidelines, test auth tuples should reuse MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") instead of crafting custom values. Please switch the auth= argument to that constant (and update related expectations) so the suite follows the standard.

src/app/endpoints/streaming_query_v2.py (2)

53-53: Logger name should use __name__ for consistency.

The logger is initialized with a hardcoded string "app.endpoints.handlers" instead of __name__. Using __name__ ensures the logger reflects the actual module path and improves maintainability.

Apply this diff:

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

103-366: Consider refactoring to reduce complexity.

The function has 10 positional parameters (limit: 5), and the inner response_generator has 17 branches (limit: 12) and 75 statements (limit: 50). While the logic is correct, this complexity makes the code harder to maintain and test.

Consider:

  • Grouping related parameters into a configuration object or dataclass
  • Extracting event-handling logic into separate helper functions (e.g., _handle_text_delta, _handle_tool_call, _handle_completion)
  • Moving post-processing logic (transcript, cache, persistence) into a separate function
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd1cee0 and 68dc335.

📒 Files selected for processing (8)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (5 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/test_routers.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/routers.py
  • src/app/endpoints/query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/routers.py
  • src/app/endpoints/query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
  • src/app/routers.py
  • src/app/endpoints/query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/routers.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (6)
src/app/endpoints/query_v2.py (6)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
src/app/endpoints/streaming_query_v2.py (1)
  • retrieve_response (400-470)
src/app/endpoints/streaming_query.py (8)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/app/endpoints/streaming_query_v2.py (2)
  • response_generator (137-363)
  • retrieve_response (400-470)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • append_tool_calls_from_llama (96-117)
src/utils/endpoints.py (4)
  • get_system_prompt (126-190)
  • create_rag_chunks_dict (383-396)
  • create_referenced_documents_with_metadata (563-577)
  • store_conversation_into_cache (231-251)
src/utils/token_counter.py (2)
  • extract_token_usage_from_turn (44-94)
  • TokenCounter (18-41)
src/app/endpoints/query.py (4)
  • is_transcripts_enabled (98-104)
  • get_topic_summary (184-214)
  • persist_user_conversation_details (107-139)
  • retrieve_response (639-798)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/app/endpoints/query_v2.py (2)
  • get_topic_summary (235-274)
  • retrieve_response (304-427)
src/app/endpoints/streaming_query_v2.py (16)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • validate_attachments_metadata (801-830)
src/app/endpoints/query_v2.py (4)
  • extract_token_usage_from_responses_api (451-541)
  • get_topic_summary (235-274)
  • prepare_tools_for_responses_api (617-668)
  • retrieve_response (304-427)
src/app/endpoints/streaming_query.py (6)
  • format_stream_data (125-136)
  • stream_end_event (163-219)
  • stream_start_event (139-160)
  • streaming_query_endpoint_handler_base (860-989)
  • response_generator (733-855)
  • retrieve_response (1024-1145)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-52)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/cache_entry.py (1)
  • CacheEntry (7-24)
src/models/config.py (2)
  • config (140-146)
  • Action (329-375)
src/models/database/conversations.py (1)
  • UserConversation (11-38)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/responses.py (2)
  • ForbiddenResponse (1120-1142)
  • UnauthorizedResponse (1094-1117)
src/utils/endpoints.py (3)
  • create_referenced_documents_with_metadata (563-577)
  • get_system_prompt (126-190)
  • store_conversation_into_cache (231-251)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (552-572)
  • get_mcp_tools (575-614)
  • retrieve_response (304-427)
  • query_endpoint_handler_v2 (279-301)
src/configuration.py (2)
  • mcp_servers (101-105)
  • llama_stack_configuration (87-91)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (37-52)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (304-427)
  • get_topic_summary (235-274)
🪛 GitHub Actions: Pydocstyle
src/app/endpoints/streaming_query.py

[error] 868-868: pydocstyle D401: First line should be in imperative mood; try rephrasing (found 'Base') in public function 'streaming_query_endpoint_handler_base'.

🪛 GitHub Actions: Pyright
src/app/endpoints/streaming_query_v2.py

[error] 456-456: pyright: No overloads for 'create' match the provided arguments (reportCallIssue) in streaming_query_v2.py:456:22. Step: 'uv run pyright src'.


[error] 460-460: pyright: Argument of type 'str | None' cannot be assigned to parameter 'previous_response_id' of type 'str | NotGiven' in function 'create' in streaming_query_v2.py:460:30. Step: 'uv run pyright src'.

🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py

[error] 699-699: pylint: R0917 Too many positional arguments (9/5) (too-many-positional-arguments)


[error] 860-860: pylint: R0917 Too many positional arguments (6/5) (too-many-positional-arguments)

src/app/endpoints/streaming_query_v2.py

[error] 103-103: pylint: R0917 Too many positional arguments (9/5) (too-many-positional-arguments)


[error] 137-137: pylint: R0912 Too many branches (17/12) (too-many-branches)


[error] 137-137: pylint: R0915 Too many statements (75/50) (too-many-statements)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (1)
src/app/endpoints/streaming_query_v2.py (1)

368-397: LGTM!

The endpoint handler correctly follows the FastAPI pattern with proper authorization, dependency injection, and clean delegation to the base handler. The wrapper pattern enables code reuse while keeping v2-specific concerns isolated.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/app/endpoints/streaming_query.py (1)

700-702: Specify the return type more precisely.

The return type Any is overly broad. Consider using a more specific type annotation for better type safety:

from collections.abc import Callable
from typing import AsyncIterator

def create_agent_response_generator(
    context: ResponseGeneratorContext,
) -> Callable[[AsyncIterator[AgentTurnResponseStreamChunk]], AsyncIterator[str]]:

This clearly documents that the factory returns a callable taking an async iterator of chunks and yielding SSE strings.

src/app/endpoints/streaming_query_v2.py (1)

104-118: Specify the return type more precisely.

Like the v1 factory, the return type Any should be more specific for better type safety:

def create_responses_response_generator(
    context: ResponseGeneratorContext,
) -> Callable[[AsyncIterator[OpenAIResponseObjectStream]], AsyncIterator[str]]:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68dc335 and 5c289ed.

📒 Files selected for processing (3)
  • src/app/endpoints/streaming_query.py (6 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/models/context.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/models/context.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/models/context.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/context.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/context.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (12)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • validate_attachments_metadata (801-830)
src/app/endpoints/query_v2.py (4)
  • extract_token_usage_from_responses_api (451-541)
  • get_topic_summary (235-274)
  • prepare_tools_for_responses_api (617-668)
  • retrieve_response (304-427)
src/app/endpoints/streaming_query.py (6)
  • format_stream_data (126-137)
  • stream_end_event (164-220)
  • stream_start_event (140-161)
  • streaming_query_endpoint_handler_base (851-983)
  • response_generator (716-846)
  • retrieve_response (1018-1139)
src/models/cache_entry.py (1)
  • CacheEntry (7-24)
src/models/database/conversations.py (1)
  • UserConversation (11-38)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/utils/endpoints.py (3)
  • create_referenced_documents_with_metadata (563-577)
  • get_system_prompt (126-190)
  • store_conversation_into_cache (231-251)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
src/app/endpoints/streaming_query.py (6)
src/models/context.py (1)
  • ResponseGeneratorContext (12-48)
src/app/endpoints/streaming_query_v2.py (1)
  • response_generator (120-348)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • append_tool_calls_from_llama (96-117)
src/utils/endpoints.py (4)
  • get_system_prompt (126-190)
  • create_rag_chunks_dict (383-396)
  • create_referenced_documents_with_metadata (563-577)
  • store_conversation_into_cache (231-251)
src/utils/token_counter.py (2)
  • extract_token_usage_from_turn (44-94)
  • TokenCounter (18-41)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • get_topic_summary (184-214)
  • persist_user_conversation_details (107-139)
src/models/context.py (1)
src/models/requests.py (1)
  • QueryRequest (73-225)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: shellcheck
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (7)
src/models/context.py (1)

1-48: LGTM! Well-structured context dataclass.

The ResponseGeneratorContext dataclass effectively reduces function parameter counts from 9+ to 1, improving maintainability. The implementation follows all coding guidelines with proper module/class docstrings, complete type annotations, and logical field grouping.

src/app/endpoints/streaming_query.py (2)

851-983: LGTM! Clean dependency injection pattern.

The base handler effectively extracts common streaming logic while accepting API-specific functions via retrieve_response_func and create_response_generator_func. The ResponseGeneratorContext is properly constructed and passed to the generator factory, enabling v1/v2 code reuse.


986-1015: LGTM! Proper v1 endpoint wrapper.

The wrapper cleanly delegates to the base handler while providing Agent API-specific implementations (retrieve_response and create_agent_response_generator). This maintains backward compatibility for the v1 /streaming_query endpoint.

src/app/endpoints/streaming_query_v2.py (4)

1-56: LGTM! Proper module setup.

Module follows coding guidelines with descriptive docstring, proper logger initialization, and absolute imports.


273-349: LGTM! Post-streaming logic correctly implemented.

The finalization sequence properly:

  • Extracts token usage from the Responses API response object
  • Yields end event with metadata
  • Stores transcripts conditionally
  • Generates topic summaries only for new conversations
  • Persists cache and conversation details

The TODO comments at lines 296-297 about RAG chunks align with the PR description noting that referenced documents support will come in a follow-up.


455-457: LGTM! Proper handling of delayed conversation ID.

Returning an empty string for conversation_id is correct for the Responses API streaming flow, where the ID arrives in the first response.created chunk (lines 158-163). The comment clearly documents this behavior.


441-452: ****

The concern about passing None for tools is unfounded. Test assertions in the codebase (explicitly verify that tools=None is passed to the API and is accepted). The current code—unconditionally including "tools": toolgroups—is correct and tested. No conditional logic is needed.

Likely an incorrect or invalid review comment.

@luis5tb luis5tb force-pushed the responses_v2_streaming branch from 5c289ed to dc74af7 Compare November 5, 2025 11:35
@luis5tb luis5tb changed the title [WIP] Base implementation of streaming Responses API Base implementation of streaming Responses API Nov 5, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)

617-668: Consider adding error handling for vector store retrieval.

The function is well-structured and follows coding guidelines. However, the call to client.vector_stores.list() at line 647 could potentially fail. Consider wrapping it in a try-except block to handle transient errors gracefully, falling back to no RAG tools if the vector store service is unavailable.

Example:

     toolgroups = []
     # Get vector stores for RAG tools
-    vector_store_ids = [
-        vector_store.id for vector_store in (await client.vector_stores.list()).data
-    ]
+    try:
+        vector_store_ids = [
+            vector_store.id for vector_store in (await client.vector_stores.list()).data
+        ]
+    except Exception as e:
+        logger.warning("Failed to retrieve vector stores: %s", e)
+        vector_store_ids = []
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c289ed and dc74af7.

📒 Files selected for processing (8)
  • src/app/endpoints/query_v2.py (3 hunks)
  • src/app/endpoints/streaming_query.py (6 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (2 hunks)
  • src/models/context.py (1 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/app/routers.py
🧰 Additional context used
📓 Path-based instructions (9)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/models/context.py
  • src/app/endpoints/streaming_query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/models/context.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/app/test_routers.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/test_routers.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/test_routers.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/context.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/context.py
🧬 Code graph analysis (6)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
  • get_session (34-40)
src/app/endpoints/query.py (3)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • validate_attachments_metadata (801-830)
src/app/endpoints/query_v2.py (4)
  • extract_token_usage_from_responses_api (451-541)
  • get_topic_summary (235-274)
  • prepare_tools_for_responses_api (617-668)
  • retrieve_response (304-427)
src/app/endpoints/streaming_query.py (6)
  • format_stream_data (126-137)
  • stream_end_event (164-220)
  • stream_start_event (140-161)
  • streaming_query_endpoint_handler_base (851-983)
  • response_generator (716-846)
  • retrieve_response (1018-1139)
src/models/cache_entry.py (1)
  • CacheEntry (7-24)
src/models/context.py (1)
  • ResponseGeneratorContext (12-48)
src/models/database/conversations.py (1)
  • UserConversation (11-38)
src/utils/endpoints.py (2)
  • create_referenced_documents_with_metadata (563-577)
  • get_system_prompt (126-190)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
tests/unit/app/endpoints/test_query_v2.py (2)
src/models/config.py (1)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (1)
  • get_mcp_tools (575-614)
tests/unit/app/endpoints/test_streaming_query_v2.py (3)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/config.py (3)
  • config (140-146)
  • Action (329-375)
  • ModelContextProtocolServer (169-174)
src/configuration.py (1)
  • mcp_servers (101-105)
src/app/endpoints/query_v2.py (2)
src/configuration.py (3)
  • configuration (73-77)
  • AppConfig (39-181)
  • mcp_servers (101-105)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/context.py (1)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/app/endpoints/streaming_query.py (2)
src/models/context.py (1)
  • ResponseGeneratorContext (12-48)
src/app/endpoints/streaming_query_v2.py (2)
  • response_generator (120-348)
  • retrieve_response (385-457)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (18)
tests/unit/app/test_routers.py (1)

23-23: LGTM! Router registration updates are correct.

The test updates correctly account for the new v2 streaming router: import added, router count incremented to 17, existence assertion added, and prefix verification included.

Also applies to: 69-69, 79-79, 95-95, 105-105

src/app/endpoints/query_v2.py (3)

21-21: LGTM! Import required for new function.

The AppConfig import is needed for the prepare_tools_for_responses_api function signature.


353-355: LGTM! Good refactoring for code reuse.

Extracting tool preparation into prepare_tools_for_responses_api centralizes the logic for both query and streaming v2 endpoints, and correctly passes mcp_headers to enable per-server authentication.


601-614: LGTM! Header merging logic is correct and flexible.

The header construction properly merges token-based authentication with per-server custom headers, allowing server-specific authorization overrides when needed. This addresses the previous review concern about dropped MCP headers.

tests/unit/app/endpoints/test_query_v2.py (1)

60-96: LGTM! Comprehensive test coverage for header merging.

The test thoroughly validates the three key scenarios: per-server headers without token, merged token + per-server headers, and Authorization header override. This ensures the header merging logic in get_mcp_tools works correctly.

src/models/context.py (1)

1-48: LGTM! Well-designed context object.

The ResponseGeneratorContext dataclass is well-structured with complete docstrings and type hints following coding guidelines. It effectively addresses the previous concern about too many positional arguments by bundling related parameters logically. The grouping of fields (conversation/user context, model/provider info, request/timing, dependencies/state) makes the code more maintainable.

tests/unit/app/endpoints/test_streaming_query_v2.py (4)

20-26: LGTM! Well-structured test fixture.

The dummy_request fixture properly sets up a mock FastAPI Request with authorized actions, which is necessary for RBAC checks in the endpoint handler.


29-56: LGTM! Tests validate core tool preparation logic.

Both tests properly verify that retrieve_response builds the correct tools for the Responses API: test_retrieve_response_builds_rag_and_mcp_tools confirms RAG and MCP tools are assembled, and test_retrieve_response_no_tools_passes_none ensures the no_tools flag is respected.

Also applies to: 59-80


83-223: LGTM! Comprehensive integration test for streaming flow.

This test validates the complete streaming lifecycle including SSE event generation, conversation persistence, and metric updates. The fake stream covers all major event types (response.created, output_text.delta, function_call, etc.), and assertions verify both event content and ordering.


226-250: LGTM! Error handling test is appropriate.

The test correctly verifies that API connection errors are caught, logged, return HTTP 500, and increment the failure metric.

src/app/endpoints/streaming_query.py (4)

8-8: LGTM! Necessary imports for refactoring.

The Callable import supports the factory function parameters in the base handler, and ResponseGeneratorContext enables the parameter reduction pattern.

Also applies to: 50-50


700-848: LGTM! Factory function properly encapsulates Agent API streaming logic.

The create_agent_response_generator function effectively bundles the streaming response generation logic, using the ResponseGeneratorContext to access all necessary parameters. The inner generator handles SSE formatting, transcript storage, topic summarization, caching, and persistence correctly.


851-983: LGTM! Base handler provides good separation of concerns.

The streaming_query_endpoint_handler_base extracts all common streaming logic (RBAC, conversation ownership validation, model selection, error handling) while accepting injectable functions for API-specific behavior. This enables code reuse between v1 (Agent API) and v2 (Responses API) endpoints.


986-1015: LGTM! Wrapper delegates to base with Agent API specifics.

The streaming_query_endpoint_handler function is a thin wrapper that supplies Agent API-specific implementations (retrieve_response and create_agent_response_generator) to the base handler, maintaining backward compatibility for the v1 endpoint.

src/app/endpoints/streaming_query_v2.py (4)

1-56: LGTM! Module setup follows coding guidelines.

The module has a descriptive docstring, proper imports, and appropriate configuration. The router is correctly tagged with "streaming_query_v2" and response schemas are well-defined.


104-350: LGTM! Response generator correctly handles Responses API events.

The create_responses_response_generator factory properly processes all Responses API event types:

  • response.created → start event with conversation ID
  • response.output_text.delta → token events
  • response.function_call_arguments.delta → tool_call events
  • response.completed → turn_complete event
  • Post-streaming: token usage extraction, transcript storage, topic summary, caching, and persistence

The event handling logic is comprehensive and correctly accumulates state across chunks.


353-382: LGTM! Endpoint wrapper follows established pattern.

The streaming_query_endpoint_handler_v2 function correctly delegates to the base handler with v2-specific implementations (retrieve_response and create_responses_response_generator), maintaining consistency with the v1 endpoint structure.


385-457: LGTM! Retrieve function properly prepares Responses API call.

The retrieve_response function correctly:

  • Validates attachments
  • Prepares tools via prepare_tools_for_responses_api (with mcp_headers)
  • Concatenates attachment content into input text
  • Conditionally builds create parameters to avoid type errors
  • Initiates streaming response with stream=True

The conversation ID is returned as empty string since it arrives in the first chunk, which is handled correctly by the generator.

@luis5tb luis5tb force-pushed the responses_v2_streaming branch from dc74af7 to a1b6f9c Compare November 5, 2025 11:42
Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems legit, but would be nice to have @eranco74 ack too. TYVM

@luis5tb luis5tb mentioned this pull request Nov 5, 2025
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants