-
Notifications
You must be signed in to change notification settings - Fork 49
Base implementation of streaming Responses API #754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughRefactors streaming endpoints into a factory-based, dependency-injected flow, adds a Responses API v2 streaming endpoint, centralizes tool preparation into Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant QueryV2 as query_v2.retrieve_response
participant ToolPrep as prepare_tools_for_responses_api
participant RAG as VectorStore
participant MCP as MCP Servers
participant LlamaAPI as Llama Stack API
Client->>QueryV2: (query_request, token, mcp_headers)
QueryV2->>ToolPrep: prepare_tools_for_responses_api(...)
ToolPrep->>RAG: lookup vector_store IDs (RAG tools)
ToolPrep->>MCP: get_mcp_tools(...) (merged headers)
MCP-->>ToolPrep: MCP tool definitions
ToolPrep-->>QueryV2: toolgroups or None
QueryV2->>LlamaAPI: stream(tools=[...], stream=True)
LlamaAPI-->>QueryV2: streaming chunks (text, tool_calls, completion)
QueryV2-->>Client: SSE events (start, token, tool_call, turn_complete, end)
sequenceDiagram
participant FastAPI as FastAPI Request
participant Handler as streaming_query_endpoint_handler_v2
participant Base as streaming_query_endpoint_handler_base
participant Generator as create_responses_response_generator
participant LlamaAPI as Llama Stack API
participant Storage as Persistence / Cache
FastAPI->>Handler: (request, query_request, auth, mcp_headers)
Handler->>Base: delegate with v2 retrieve_response + generator factory
Base->>Handler (retrieve_response): build tools, call LlamaAPI (stream=True)
Handler (retrieve_response)->>LlamaAPI: returns AsyncIterator(stream)
Base->>Generator: instantiate with ResponseGeneratorContext
loop Streaming processing
Generator->>Generator: parse chunks -> yield SSE events
Generator->>Storage: store_transcript, update_cache, topic_summary
end
Base-->>FastAPI: StreamingResponse (SSE)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🧰 Additional context used📓 Path-based instructions (5)src/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/app/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/{app/**/*.py,client.py}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/app/endpoints/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧬 Code graph analysis (2)src/app/endpoints/streaming_query.py (6)
src/app/endpoints/streaming_query_v2.py (11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)src/app/endpoints/streaming_query.py(5 hunks)src/app/endpoints/streaming_query_v2.py(1 hunks)src/app/routers.py(3 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)tests/unit/app/endpoints/test_streaming_query_v2.py(1 hunks)tests/unit/app/test_routers.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/app/test_routers.pysrc/app/routers.pytests/unit/app/endpoints/test_streaming_query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pytests/unit/app/endpoints/test_query_v2.pysrc/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/test_routers.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/test_routers.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/endpoints/test_query_v2.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/routers.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/routers.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/routers.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (7)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(37-52)
tests/unit/app/endpoints/test_streaming_query_v2.py (5)
src/models/requests.py (1)
QueryRequest(73-225)src/models/config.py (3)
config(140-146)Action(329-375)ModelContextProtocolServer(169-174)src/app/endpoints/streaming_query.py (1)
retrieve_response(1024-1145)src/app/endpoints/streaming_query_v2.py (2)
retrieve_response(409-491)streaming_query_endpoint_handler_v2(379-406)src/configuration.py (1)
mcp_servers(101-105)
src/app/endpoints/streaming_query.py (7)
src/models/requests.py (1)
QueryRequest(73-225)src/app/endpoints/streaming_query_v2.py (2)
response_generator(138-372)retrieve_response(409-491)src/utils/types.py (2)
TurnSummary(89-163)append_tool_calls_from_llama(96-117)src/utils/endpoints.py (4)
get_system_prompt(126-190)create_rag_chunks_dict(383-396)create_referenced_documents_with_metadata(563-577)store_conversation_into_cache(231-251)src/app/endpoints/query.py (4)
is_transcripts_enabled(98-104)get_topic_summary(184-214)persist_user_conversation_details(107-139)retrieve_response(639-798)src/utils/transcripts.py (1)
store_transcript(40-99)src/app/database.py (1)
get_session(34-40)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (3)
extract_token_usage_from_responses_api(339-430)get_mcp_tools(456-472)get_rag_tools(433-453)src/app/endpoints/streaming_query.py (6)
format_stream_data(125-136)stream_end_event(163-219)stream_start_event(139-160)streaming_query_endpoint_handler_base(860-989)response_generator(733-855)retrieve_response(1024-1145)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/database/conversations.py (1)
UserConversation(11-38)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)
src/app/endpoints/query_v2.py (10)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/authentication/__init__.py (1)
get_auth_dependency(14-52)src/authorization/middleware.py (1)
authorize(111-122)src/configuration.py (2)
configuration(73-77)mcp_servers(101-105)src/models/requests.py (1)
QueryRequest(73-225)src/models/responses.py (4)
ForbiddenResponse(1120-1142)QueryResponse(194-305)ReferencedDocument(179-191)UnauthorizedResponse(1094-1117)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(433-453)get_mcp_tools(456-472)retrieve_response(144-315)query_endpoint_handler_v2(119-141)src/configuration.py (2)
mcp_servers(101-105)llama_stack_configuration(87-91)
src/app/endpoints/query.py (2)
src/app/endpoints/streaming_query.py (1)
retrieve_response(1024-1145)src/app/endpoints/query_v2.py (2)
retrieve_response(144-315)get_topic_summary(70-114)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (1)
src/app/endpoints/streaming_query.py (1)
699-857: Specify the concrete return type instead ofAny.Per our Python guidelines we need full type annotations, but
create_agent_response_generator(...) -> Anyhides the callable shape and weakens static checks. Please annotate it precisely (e.g.Callable[[AsyncIterator[AgentTurnResponseStreamChunk]], AsyncIterator[str]]) so downstream call sites stay type-safe.-def create_agent_response_generator( # pylint: disable=too-many-arguments,too-many-locals +def create_agent_response_generator( # pylint: disable=too-many-arguments,too-many-locals conversation_id: str, user_id: str, model_id: str, provider_id: str, query_request: QueryRequest, metadata_map: dict[str, dict[str, Any]], client: AsyncLlamaStackClient, llama_stack_model_id: str, started_at: str, _skip_userid_check: bool, -) -> Any: +) -> Callable[ + [AsyncIterator[AgentTurnResponseStreamChunk]], + AsyncIterator[str], +]:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/app/endpoints/streaming_query_v2.py (2)
175-189: Stop persisting the conversation inside the "response.created" handler.Persisting here means the later call to
persist_user_conversation_details()at lines 365-371 incrementsmessage_counta second time, and it also makesexisting_conversationnon-null in the topic-summary block (lines 329-336), soget_topic_summary()never runs for brand-new conversations.Apply this diff to drop the premature persist:
if event_type == "response.created": try: conv_id = getattr(chunk, "response").id except Exception: # pylint: disable=broad-except conv_id = "" yield stream_start_event(conv_id) - if conv_id: - persist_user_conversation_details( - user_id=user_id, - conversation_id=conv_id, - model=model_id, - provider_id=provider_id, - topic_summary=None, - ) continue
454-462: Streamed requests must include attachment content.Unlike the non-streaming path in query_v2.py (lines 197-206), this streaming retrieve skips the attachment text entirely, so any uploaded logs/configs never reach the LLM. That's a functional regression for users relying on attachments.
Apply this diff to mirror the non-streaming input assembly:
+ # Prepare input for Responses API + # Convert attachments to text and concatenate with query + input_text = query_request.query + if query_request.attachments: + for attachment in query_request.attachments: + input_text += ( + f"\n\n[Attachment: {attachment.attachment_type}]\n" + f"{attachment.content}" + ) + response = await client.responses.create( - input=query_request.query, + input=input_text, model=model_id, instructions=system_prompt, previous_response_id=query_request.conversation_id, tools=(cast(Any, toolgroups)), stream=True, store=True, )
🧹 Nitpick comments (4)
src/app/endpoints/streaming_query_v2.py (2)
33-33: Use absolute import path for internal authentication module.The coding guidelines specify that internal modules in
src/**/*.pyshould use absolute imports (e.g.,from auth import get_auth_dependency), but this uses a relative-style import.As per coding guidelines
Apply this diff to align with the import convention:
-from authentication import get_auth_dependency +from auth import get_auth_dependency
53-53: Use__name__for the logger instead of a hardcoded string.Coding guidelines require
logger = logging.getLogger(__name__)for module logging, which provides better traceability.As per coding guidelines
Apply this diff:
-logger = logging.getLogger("app.endpoints.handlers") +logger = logging.getLogger(__name__)src/app/endpoints/query_v2.py (2)
17-17: Use absolute import path for internal authentication module.The coding guidelines specify that internal modules in
src/**/*.pyshould use absolute imports (e.g.,from auth import get_auth_dependency).As per coding guidelines
Apply this diff:
-from authentication import get_auth_dependency +from auth import get_auth_dependency
38-38: Use__name__for the logger instead of a hardcoded string.Coding guidelines require
logger = logging.getLogger(__name__)for better traceability.As per coding guidelines
Apply this diff:
-logger = logging.getLogger("app.endpoints.handlers") +logger = logging.getLogger(__name__)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/app/endpoints/query_v2.py(1 hunks)src/app/endpoints/streaming_query_v2.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
🧬 Code graph analysis (2)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(317-408)get_topic_summary(70-114)prepare_tools_for_responses_api(453-502)retrieve_response(144-293)src/app/endpoints/streaming_query.py (6)
format_stream_data(125-136)stream_end_event(163-219)stream_start_event(139-160)streaming_query_endpoint_handler_base(860-989)response_generator(733-855)retrieve_response(1024-1145)src/models/requests.py (1)
QueryRequest(73-225)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)
src/app/endpoints/query_v2.py (7)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/models/responses.py (2)
QueryResponse(194-305)ReferencedDocument(179-191)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)src/app/endpoints/streaming_query_v2.py (1)
retrieve_response(408-468)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (6)
src/app/endpoints/streaming_query_v2.py (1)
376-405: LGTM!The endpoint handler correctly delegates to the base handler with the v2-specific retrieve and generator functions. The structure and type annotations are appropriate.
src/app/endpoints/query_v2.py (5)
70-114: LGTM!The topic summary generation correctly uses the Responses API with proper error handling and text extraction logic.
117-141: LGTM!The endpoint handler correctly delegates to the base handler with appropriate v2-specific functions.
144-293: LGTM!The retrieve_response function correctly assembles input with attachments, processes the response output, and extracts tool calls. The implementation is solid.
296-314: Placeholder is intentional per PR objectives.This function currently returns an empty list with a TODO, which aligns with the PR description stating that "support for parsing and returning referenced documents in responses" will be addressed in follow-up PRs.
317-408: LGTM!The token usage extraction is comprehensive with proper error handling for various response formats and graceful degradation when usage data is unavailable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/unit/app/endpoints/test_streaming_query_v2.py (1)
189-189: Use the shared MOCK_AUTH constant for test authentication.The coding guidelines specify using
MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")for auth tuples in tests rather than inline values.As per coding guidelines.
Apply this pattern:
+# At module level +MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") # In test_streaming_query_endpoint_handler_v2_success_yields_events: - auth=("user123", "", True, "token-abc"), # skip_userid_check=True + auth=("user123", "", True, "token-abc"), # Can keep custom values if testing specific scenarios # In test_streaming_query_endpoint_handler_v2_api_connection_error: - auth=("user123", "", False, "tok"), + auth=MOCK_AUTH,Also applies to: 244-244
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/app/endpoints/query_v2.py(1 hunks)src/app/endpoints/streaming_query_v2.py(1 hunks)tests/unit/app/endpoints/test_streaming_query_v2.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_streaming_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_streaming_query_v2.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (10)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(317-408)get_topic_summary(70-114)prepare_tools_for_responses_api(453-511)retrieve_response(144-293)src/app/endpoints/streaming_query.py (6)
format_stream_data(125-136)stream_end_event(163-219)stream_start_event(139-160)streaming_query_endpoint_handler_base(860-989)response_generator(733-855)retrieve_response(1024-1145)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/database/conversations.py (1)
UserConversation(11-38)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/models/requests.py (1)
QueryRequest(73-225)src/models/config.py (3)
config(140-146)Action(329-375)ModelContextProtocolServer(169-174)src/app/endpoints/streaming_query_v2.py (2)
retrieve_response(400-470)streaming_query_endpoint_handler_v2(370-397)src/configuration.py (1)
mcp_servers(101-105)
src/app/endpoints/query_v2.py (6)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)src/app/endpoints/streaming_query_v2.py (1)
retrieve_response(400-470)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
src/app/endpoints/streaming_query_v2.py (3)
103-365: Well-structured streaming response generator implementation.The
create_responses_response_generatorfactory pattern cleanly separates concerns and correctly handles:
- In-band conversation ID extraction from
response.createdevents- Token and tool call streaming with proper SSE formatting
- Single persistence call after topic summary logic (avoiding double-increment)
- Attachment content inclusion in input assembly
- Token usage extraction and metrics updates
The previous review concerns about premature persistence and missing attachment content have been properly addressed.
368-397: Clean delegation to base handler.The wrapper pattern provides good separation between v2-specific Responses API logic and shared streaming infrastructure.
400-470: Proper tool preparation and attachment handling for streaming.The
retrieve_responsefunction correctly:
- Validates attachments before processing
- Prepares RAG and MCP tools with header merging via
prepare_tools_for_responses_api- Concatenates attachment content into the input text
- Returns an empty conversation ID for in-band extraction
src/app/endpoints/query_v2.py (7)
70-114: Topic summary generation adapted correctly for Responses API.The function properly uses
client.responses.create()withstore=Falseto avoid polluting conversation history, and robustly extracts text from various content structures.
117-141: Clean delegation to shared base handler.The v2 query endpoint correctly injects Responses API-specific functions into the base handler pattern.
144-293: Solid non-streaming retrieve implementation.The function correctly:
- Validates attachments upfront
- Prepares tools with MCP header merging
- Assembles input text with attachment content
- Parses response output and tool calls from OpenAI format
- Extracts token usage and referenced documents
296-314: Placeholder appropriately documents missing functionality.The TODO clearly indicates that referenced document parsing from Responses API response structures (file_search results, citations) is deferred to follow-up work, as mentioned in the PR objectives.
317-408: Robust token usage extraction with defensive handling.The implementation correctly:
- Handles both dict and object response.usage formats
- Gracefully degrades when usage data is absent (expected when llama stack uses chat_completions internally)
- Updates Prometheus metrics with proper error handling
- Increments call counter even when token counts are unavailable
411-450: Clean tool definition helpers.The
get_rag_toolsandget_mcp_toolsfunctions provide clear, focused transformations from configuration to Responses API tool format.
453-511: Tool preparation correctly merges per-request MCP headers.The previous review concern about dropping
mcp_headershas been properly addressed. Lines 494-500 merge the caller's per-request headers into each MCP tool definition, ensuring short-lived tokens and dynamic auth reach the MCP servers.
dd1cee0 to
68dc335
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)
860-990: Fix docstring mood and argument count to unblock lint.The new base handler’s docstring starts with “Base handler…”, violating pydocstyle D401, and the signature has six positional parameters, exceeding pylint’s five-argument limit. Please rephrase the docstring to an imperative sentence (e.g., “Handle streaming query requests…”) and collapse the parameters (e.g., wrap injected callbacks/mcp headers into a struct) so pylint R0917 passes.
🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_query_v2.py (1)
385-434: Use the sharedMOCK_AUTHconstant.Per the repo’s test guidelines, test auth tuples should reuse
MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")instead of crafting custom values. Please switch theauth=argument to that constant (and update related expectations) so the suite follows the standard.src/app/endpoints/streaming_query_v2.py (2)
53-53: Logger name should use__name__for consistency.The logger is initialized with a hardcoded string
"app.endpoints.handlers"instead of__name__. Using__name__ensures the logger reflects the actual module path and improves maintainability.Apply this diff:
-logger = logging.getLogger("app.endpoints.handlers") +logger = logging.getLogger(__name__)
103-366: Consider refactoring to reduce complexity.The function has 10 positional parameters (limit: 5), and the inner
response_generatorhas 17 branches (limit: 12) and 75 statements (limit: 50). While the logic is correct, this complexity makes the code harder to maintain and test.Consider:
- Grouping related parameters into a configuration object or dataclass
- Extracting event-handling logic into separate helper functions (e.g.,
_handle_text_delta,_handle_tool_call,_handle_completion)- Moving post-processing logic (transcript, cache, persistence) into a separate function
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)src/app/endpoints/streaming_query.py(5 hunks)src/app/endpoints/streaming_query_v2.py(1 hunks)src/app/routers.py(3 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)tests/unit/app/endpoints/test_streaming_query_v2.py(1 hunks)tests/unit/app/test_routers.py(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unit/app/endpoints/test_streaming_query_v2.py
- tests/unit/app/test_routers.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/routers.pysrc/app/endpoints/query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/routers.pysrc/app/endpoints/query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pytests/unit/app/endpoints/test_query_v2.pysrc/app/routers.pysrc/app/endpoints/query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/routers.pysrc/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (6)
src/app/endpoints/query_v2.py (6)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)src/app/endpoints/streaming_query_v2.py (1)
retrieve_response(400-470)
src/app/endpoints/streaming_query.py (8)
src/models/requests.py (1)
QueryRequest(73-225)src/app/endpoints/streaming_query_v2.py (2)
response_generator(137-363)retrieve_response(400-470)src/utils/types.py (2)
TurnSummary(89-163)append_tool_calls_from_llama(96-117)src/utils/endpoints.py (4)
get_system_prompt(126-190)create_rag_chunks_dict(383-396)create_referenced_documents_with_metadata(563-577)store_conversation_into_cache(231-251)src/utils/token_counter.py (2)
extract_token_usage_from_turn(44-94)TokenCounter(18-41)src/app/endpoints/query.py (4)
is_transcripts_enabled(98-104)get_topic_summary(184-214)persist_user_conversation_details(107-139)retrieve_response(639-798)src/utils/transcripts.py (1)
store_transcript(40-99)src/app/endpoints/query_v2.py (2)
get_topic_summary(235-274)retrieve_response(304-427)
src/app/endpoints/streaming_query_v2.py (16)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(451-541)get_topic_summary(235-274)prepare_tools_for_responses_api(617-668)retrieve_response(304-427)src/app/endpoints/streaming_query.py (6)
format_stream_data(125-136)stream_end_event(163-219)stream_start_event(139-160)streaming_query_endpoint_handler_base(860-989)response_generator(733-855)retrieve_response(1024-1145)src/authentication/__init__.py (1)
get_auth_dependency(14-52)src/authorization/middleware.py (1)
authorize(111-122)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/config.py (2)
config(140-146)Action(329-375)src/models/database/conversations.py (1)
UserConversation(11-38)src/models/requests.py (1)
QueryRequest(73-225)src/models/responses.py (2)
ForbiddenResponse(1120-1142)UnauthorizedResponse(1094-1117)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(552-572)get_mcp_tools(575-614)retrieve_response(304-427)query_endpoint_handler_v2(279-301)src/configuration.py (2)
mcp_servers(101-105)llama_stack_configuration(87-91)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(37-52)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/app/endpoints/query_v2.py (2)
retrieve_response(304-427)get_topic_summary(235-274)
🪛 GitHub Actions: Pydocstyle
src/app/endpoints/streaming_query.py
[error] 868-868: pydocstyle D401: First line should be in imperative mood; try rephrasing (found 'Base') in public function 'streaming_query_endpoint_handler_base'.
🪛 GitHub Actions: Pyright
src/app/endpoints/streaming_query_v2.py
[error] 456-456: pyright: No overloads for 'create' match the provided arguments (reportCallIssue) in streaming_query_v2.py:456:22. Step: 'uv run pyright src'.
[error] 460-460: pyright: Argument of type 'str | None' cannot be assigned to parameter 'previous_response_id' of type 'str | NotGiven' in function 'create' in streaming_query_v2.py:460:30. Step: 'uv run pyright src'.
🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py
[error] 699-699: pylint: R0917 Too many positional arguments (9/5) (too-many-positional-arguments)
[error] 860-860: pylint: R0917 Too many positional arguments (6/5) (too-many-positional-arguments)
src/app/endpoints/streaming_query_v2.py
[error] 103-103: pylint: R0917 Too many positional arguments (9/5) (too-many-positional-arguments)
[error] 137-137: pylint: R0912 Too many branches (17/12) (too-many-branches)
[error] 137-137: pylint: R0915 Too many statements (75/50) (too-many-statements)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (1)
src/app/endpoints/streaming_query_v2.py (1)
368-397: LGTM!The endpoint handler correctly follows the FastAPI pattern with proper authorization, dependency injection, and clean delegation to the base handler. The wrapper pattern enables code reuse while keeping v2-specific concerns isolated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/app/endpoints/streaming_query.py (1)
700-702: Specify the return type more precisely.The return type
Anyis overly broad. Consider using a more specific type annotation for better type safety:from collections.abc import Callable from typing import AsyncIterator def create_agent_response_generator( context: ResponseGeneratorContext, ) -> Callable[[AsyncIterator[AgentTurnResponseStreamChunk]], AsyncIterator[str]]:This clearly documents that the factory returns a callable taking an async iterator of chunks and yielding SSE strings.
src/app/endpoints/streaming_query_v2.py (1)
104-118: Specify the return type more precisely.Like the v1 factory, the return type
Anyshould be more specific for better type safety:def create_responses_response_generator( context: ResponseGeneratorContext, ) -> Callable[[AsyncIterator[OpenAIResponseObjectStream]], AsyncIterator[str]]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/app/endpoints/streaming_query.py(6 hunks)src/app/endpoints/streaming_query_v2.py(1 hunks)src/models/context.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.pysrc/models/context.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.pysrc/models/context.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/streaming_query.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
src/models/context.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
src/models/context.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (12)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(451-541)get_topic_summary(235-274)prepare_tools_for_responses_api(617-668)retrieve_response(304-427)src/app/endpoints/streaming_query.py (6)
format_stream_data(126-137)stream_end_event(164-220)stream_start_event(140-161)streaming_query_endpoint_handler_base(851-983)response_generator(716-846)retrieve_response(1018-1139)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/database/conversations.py (1)
UserConversation(11-38)src/models/requests.py (1)
QueryRequest(73-225)src/utils/endpoints.py (3)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)store_conversation_into_cache(231-251)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/transcripts.py (1)
store_transcript(40-99)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
src/app/endpoints/streaming_query.py (6)
src/models/context.py (1)
ResponseGeneratorContext(12-48)src/app/endpoints/streaming_query_v2.py (1)
response_generator(120-348)src/utils/types.py (2)
TurnSummary(89-163)append_tool_calls_from_llama(96-117)src/utils/endpoints.py (4)
get_system_prompt(126-190)create_rag_chunks_dict(383-396)create_referenced_documents_with_metadata(563-577)store_conversation_into_cache(231-251)src/utils/token_counter.py (2)
extract_token_usage_from_turn(44-94)TokenCounter(18-41)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)get_topic_summary(184-214)persist_user_conversation_details(107-139)
src/models/context.py (1)
src/models/requests.py (1)
QueryRequest(73-225)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: shellcheck
- GitHub Check: build-pr
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (7)
src/models/context.py (1)
1-48: LGTM! Well-structured context dataclass.The
ResponseGeneratorContextdataclass effectively reduces function parameter counts from 9+ to 1, improving maintainability. The implementation follows all coding guidelines with proper module/class docstrings, complete type annotations, and logical field grouping.src/app/endpoints/streaming_query.py (2)
851-983: LGTM! Clean dependency injection pattern.The base handler effectively extracts common streaming logic while accepting API-specific functions via
retrieve_response_funcandcreate_response_generator_func. TheResponseGeneratorContextis properly constructed and passed to the generator factory, enabling v1/v2 code reuse.
986-1015: LGTM! Proper v1 endpoint wrapper.The wrapper cleanly delegates to the base handler while providing Agent API-specific implementations (
retrieve_responseandcreate_agent_response_generator). This maintains backward compatibility for the v1/streaming_queryendpoint.src/app/endpoints/streaming_query_v2.py (4)
1-56: LGTM! Proper module setup.Module follows coding guidelines with descriptive docstring, proper logger initialization, and absolute imports.
273-349: LGTM! Post-streaming logic correctly implemented.The finalization sequence properly:
- Extracts token usage from the Responses API response object
- Yields end event with metadata
- Stores transcripts conditionally
- Generates topic summaries only for new conversations
- Persists cache and conversation details
The TODO comments at lines 296-297 about RAG chunks align with the PR description noting that referenced documents support will come in a follow-up.
455-457: LGTM! Proper handling of delayed conversation ID.Returning an empty string for
conversation_idis correct for the Responses API streaming flow, where the ID arrives in the firstresponse.createdchunk (lines 158-163). The comment clearly documents this behavior.
441-452: ****The concern about passing
Nonefortoolsis unfounded. Test assertions in the codebase (explicitly verify thattools=Noneis passed to the API and is accepted). The current code—unconditionally including"tools": toolgroups—is correct and tested. No conditional logic is needed.Likely an incorrect or invalid review comment.
5c289ed to
dc74af7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)
617-668: Consider adding error handling for vector store retrieval.The function is well-structured and follows coding guidelines. However, the call to
client.vector_stores.list()at line 647 could potentially fail. Consider wrapping it in a try-except block to handle transient errors gracefully, falling back to no RAG tools if the vector store service is unavailable.Example:
toolgroups = [] # Get vector stores for RAG tools - vector_store_ids = [ - vector_store.id for vector_store in (await client.vector_stores.list()).data - ] + try: + vector_store_ids = [ + vector_store.id for vector_store in (await client.vector_stores.list()).data + ] + except Exception as e: + logger.warning("Failed to retrieve vector stores: %s", e) + vector_store_ids = []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/app/endpoints/query_v2.py(3 hunks)src/app/endpoints/streaming_query.py(6 hunks)src/app/endpoints/streaming_query_v2.py(1 hunks)src/app/routers.py(2 hunks)src/models/context.py(1 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)tests/unit/app/endpoints/test_streaming_query_v2.py(1 hunks)tests/unit/app/test_routers.py(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/app/routers.py
🧰 Additional context used
📓 Path-based instructions (9)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/models/context.pysrc/app/endpoints/streaming_query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query_v2.pytests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/models/context.pysrc/app/endpoints/streaming_query.pytests/unit/app/test_routers.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/test_routers.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/test_routers.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
src/models/context.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
src/models/context.py
🧬 Code graph analysis (6)
src/app/endpoints/streaming_query_v2.py (9)
src/app/database.py (1)
get_session(34-40)src/app/endpoints/query.py (3)
is_transcripts_enabled(98-104)persist_user_conversation_details(107-139)validate_attachments_metadata(801-830)src/app/endpoints/query_v2.py (4)
extract_token_usage_from_responses_api(451-541)get_topic_summary(235-274)prepare_tools_for_responses_api(617-668)retrieve_response(304-427)src/app/endpoints/streaming_query.py (6)
format_stream_data(126-137)stream_end_event(164-220)stream_start_event(140-161)streaming_query_endpoint_handler_base(851-983)response_generator(716-846)retrieve_response(1018-1139)src/models/cache_entry.py (1)
CacheEntry(7-24)src/models/context.py (1)
ResponseGeneratorContext(12-48)src/models/database/conversations.py (1)
UserConversation(11-38)src/utils/endpoints.py (2)
create_referenced_documents_with_metadata(563-577)get_system_prompt(126-190)src/utils/transcripts.py (1)
store_transcript(40-99)
tests/unit/app/endpoints/test_query_v2.py (2)
src/models/config.py (1)
ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (1)
get_mcp_tools(575-614)
tests/unit/app/endpoints/test_streaming_query_v2.py (3)
src/models/requests.py (1)
QueryRequest(73-225)src/models/config.py (3)
config(140-146)Action(329-375)ModelContextProtocolServer(169-174)src/configuration.py (1)
mcp_servers(101-105)
src/app/endpoints/query_v2.py (2)
src/configuration.py (3)
configuration(73-77)AppConfig(39-181)mcp_servers(101-105)src/models/requests.py (1)
QueryRequest(73-225)
src/models/context.py (1)
src/models/requests.py (1)
QueryRequest(73-225)
src/app/endpoints/streaming_query.py (2)
src/models/context.py (1)
ResponseGeneratorContext(12-48)src/app/endpoints/streaming_query_v2.py (2)
response_generator(120-348)retrieve_response(385-457)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (18)
tests/unit/app/test_routers.py (1)
23-23: LGTM! Router registration updates are correct.The test updates correctly account for the new v2 streaming router: import added, router count incremented to 17, existence assertion added, and prefix verification included.
Also applies to: 69-69, 79-79, 95-95, 105-105
src/app/endpoints/query_v2.py (3)
21-21: LGTM! Import required for new function.The
AppConfigimport is needed for theprepare_tools_for_responses_apifunction signature.
353-355: LGTM! Good refactoring for code reuse.Extracting tool preparation into
prepare_tools_for_responses_apicentralizes the logic for both query and streaming v2 endpoints, and correctly passesmcp_headersto enable per-server authentication.
601-614: LGTM! Header merging logic is correct and flexible.The header construction properly merges token-based authentication with per-server custom headers, allowing server-specific authorization overrides when needed. This addresses the previous review concern about dropped MCP headers.
tests/unit/app/endpoints/test_query_v2.py (1)
60-96: LGTM! Comprehensive test coverage for header merging.The test thoroughly validates the three key scenarios: per-server headers without token, merged token + per-server headers, and Authorization header override. This ensures the header merging logic in
get_mcp_toolsworks correctly.src/models/context.py (1)
1-48: LGTM! Well-designed context object.The
ResponseGeneratorContextdataclass is well-structured with complete docstrings and type hints following coding guidelines. It effectively addresses the previous concern about too many positional arguments by bundling related parameters logically. The grouping of fields (conversation/user context, model/provider info, request/timing, dependencies/state) makes the code more maintainable.tests/unit/app/endpoints/test_streaming_query_v2.py (4)
20-26: LGTM! Well-structured test fixture.The
dummy_requestfixture properly sets up a mock FastAPI Request with authorized actions, which is necessary for RBAC checks in the endpoint handler.
29-56: LGTM! Tests validate core tool preparation logic.Both tests properly verify that
retrieve_responsebuilds the correct tools for the Responses API:test_retrieve_response_builds_rag_and_mcp_toolsconfirms RAG and MCP tools are assembled, andtest_retrieve_response_no_tools_passes_noneensures the no_tools flag is respected.Also applies to: 59-80
83-223: LGTM! Comprehensive integration test for streaming flow.This test validates the complete streaming lifecycle including SSE event generation, conversation persistence, and metric updates. The fake stream covers all major event types (response.created, output_text.delta, function_call, etc.), and assertions verify both event content and ordering.
226-250: LGTM! Error handling test is appropriate.The test correctly verifies that API connection errors are caught, logged, return HTTP 500, and increment the failure metric.
src/app/endpoints/streaming_query.py (4)
8-8: LGTM! Necessary imports for refactoring.The
Callableimport supports the factory function parameters in the base handler, andResponseGeneratorContextenables the parameter reduction pattern.Also applies to: 50-50
700-848: LGTM! Factory function properly encapsulates Agent API streaming logic.The
create_agent_response_generatorfunction effectively bundles the streaming response generation logic, using theResponseGeneratorContextto access all necessary parameters. The inner generator handles SSE formatting, transcript storage, topic summarization, caching, and persistence correctly.
851-983: LGTM! Base handler provides good separation of concerns.The
streaming_query_endpoint_handler_baseextracts all common streaming logic (RBAC, conversation ownership validation, model selection, error handling) while accepting injectable functions for API-specific behavior. This enables code reuse between v1 (Agent API) and v2 (Responses API) endpoints.
986-1015: LGTM! Wrapper delegates to base with Agent API specifics.The
streaming_query_endpoint_handlerfunction is a thin wrapper that supplies Agent API-specific implementations (retrieve_responseandcreate_agent_response_generator) to the base handler, maintaining backward compatibility for the v1 endpoint.src/app/endpoints/streaming_query_v2.py (4)
1-56: LGTM! Module setup follows coding guidelines.The module has a descriptive docstring, proper imports, and appropriate configuration. The router is correctly tagged with "streaming_query_v2" and response schemas are well-defined.
104-350: LGTM! Response generator correctly handles Responses API events.The
create_responses_response_generatorfactory properly processes all Responses API event types:
response.created→ start event with conversation IDresponse.output_text.delta→ token eventsresponse.function_call_arguments.delta→ tool_call eventsresponse.completed→ turn_complete event- Post-streaming: token usage extraction, transcript storage, topic summary, caching, and persistence
The event handling logic is comprehensive and correctly accumulates state across chunks.
353-382: LGTM! Endpoint wrapper follows established pattern.The
streaming_query_endpoint_handler_v2function correctly delegates to the base handler with v2-specific implementations (retrieve_responseandcreate_responses_response_generator), maintaining consistency with the v1 endpoint structure.
385-457: LGTM! Retrieve function properly prepares Responses API call.The
retrieve_responsefunction correctly:
- Validates attachments
- Prepares tools via
prepare_tools_for_responses_api(with mcp_headers)- Concatenates attachment content into input text
- Conditionally builds create parameters to avoid type errors
- Initiates streaming response with
stream=TrueThe conversation ID is returned as empty string since it arrives in the first chunk, which is handled correctly by the generator.
dc74af7 to
a1b6f9c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems legit, but would be nice to have @eranco74 ack too. TYVM
Description
Add v2 endpoint to interact with LlamaStack using Responses API (streaming) instead of Agent API
It is adding the support in a v2 endpoint to ensure backward compatibility
This is a follow up of the work done in #753
What is missing and to be done in follow up PRs:
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Tests