-
Notifications
You must be signed in to change notification settings - Fork 49
Base implementation of non-streaming Responses API #753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Base implementation of non-streaming Responses API #753
Conversation
WalkthroughSplits the original /query handler into an injectable base ( Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Router
participant V2 as query_endpoint_handler_v2
participant Base as query_endpoint_handler_base
participant Responses as Responses API
participant Metrics as Prometheus
Client->>Router: POST /v2/query
Router->>V2: request + query_request + auth + mcp_headers
V2->>Base: call base with retrieve_response_func & get_topic_summary_func
Base->>V2: invoke retrieve_response_func(...)
V2->>Responses: client.responses.create(input, model, tools, ...)
Responses-->>V2: OpenAIResponseObject
V2->>Base: TurnSummary, conversation_id, referenced_docs, token_usage
Base->>Metrics: update token and call metrics
Base-->>Router: QueryResponse
Router-->>Client: 200 OK + QueryResponse
sequenceDiagram
participant V1_old as v1 (pre-refactor)
participant BaseLogic as query_endpoint_handler_base
participant V1_new as v1 (post-refactor)
participant V2_new as v2
Note over V1_old: v1 directly called concrete retrieve/get_topic
V1_new->>BaseLogic: inject Agent API callbacks (retrieve_response, get_topic_summary)
V2_new->>BaseLogic: inject Responses API callbacks (v2.retrieve_response, v2.get_topic_summary)
BaseLogic->>BaseLogic: shared validation, orchestration, persistence, caching, metrics
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🧰 Additional context used📓 Path-based instructions (7)src/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/app/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/{app/**/*.py,client.py}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/app/endpoints/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/{unit,integration}/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (2)📚 Learning: 2025-09-18T16:46:33.353ZApplied to files:
📚 Learning: 2025-09-18T16:46:33.353ZApplied to files:
🧬 Code graph analysis (3)src/app/endpoints/query_v2.py (7)
src/app/endpoints/query.py (2)
tests/unit/app/endpoints/test_query_v2.py (4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (18)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)
38-38: Use__name__when creating the module logger.Coding guidelines ask us to call
logging.getLogger(__name__). Switching to__name__keeps logger naming consistent with the rest of the stack.-logger = logging.getLogger("app.endpoints.handlers") +logger = logging.getLogger(__name__)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)src/app/routers.py(3 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)tests/unit/app/test_routers.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query.pysrc/app/routers.pysrc/app/endpoints/query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query.pysrc/app/routers.pysrc/app/endpoints/query_v2.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/query.pysrc/app/routers.pysrc/app/endpoints/query_v2.pytests/unit/app/test_routers.pytests/unit/app/endpoints/test_query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query.pysrc/app/routers.pysrc/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/test_routers.pytests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/test_routers.pytests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (4)
src/app/endpoints/query.py (4)
src/models/requests.py (1)
QueryRequest(73-225)src/models/responses.py (1)
QueryResponse(194-305)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/app/endpoints/query_v2.py (2)
retrieve_response(144-320)get_topic_summary(70-114)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(36-51)
src/app/endpoints/query_v2.py (9)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/authentication/__init__.py (1)
get_auth_dependency(14-52)src/authorization/middleware.py (1)
authorize(111-122)src/models/requests.py (1)
QueryRequest(73-225)src/models/responses.py (2)
QueryResponse(194-305)ReferencedDocument(179-191)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(438-458)get_mcp_tools(461-477)retrieve_response(144-320)query_endpoint_handler_v2(119-141)src/configuration.py (1)
mcp_servers(101-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (2)
tests/unit/app/test_routers.py (1)
68-103: Router assertions look solid.Nice job updating the expected router count and asserting the new
/v2prefix—this keeps the routing contract well covered.tests/unit/app/endpoints/test_query_v2.py (1)
57-405: Excellent coverage of the Responses flow.These tests exercise the new Responses API path thoroughly, including tool wiring, attachments, and quota handling. Great to see this level of detail.
395fbaa to
899c00c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/app/endpoints/query.py (1)
217-243: Excellent refactoring for v2 API support!The dependency injection pattern cleanly separates the base query logic from API-specific implementations (Agent API vs. Responses API), enabling code reuse while maintaining backward compatibility.
The type annotations for
retrieve_response_funcandget_topic_summary_funcuseAny. While this works, consider usingCallabletypes for better type safety in a future refactor:from collections.abc import Callable retrieve_response_func: Callable[ [AsyncLlamaStackClient, str, QueryRequest, str, dict[str, dict[str, str]] | None, str], tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter] ] get_topic_summary_func: Callable[[str, AsyncLlamaStackClient, str], str]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)src/app/routers.py(3 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)tests/unit/app/test_routers.py(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unit/app/test_routers.py
- src/app/endpoints/query_v2.py
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/app/endpoints/test_query_v2.pysrc/app/endpoints/query.pysrc/app/routers.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query.pysrc/app/routers.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query.pysrc/app/routers.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query.pysrc/app/routers.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query.py
🧠 Learnings (2)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (3)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(433-453)get_mcp_tools(456-472)retrieve_response(144-315)query_endpoint_handler_v2(119-141)src/configuration.py (2)
mcp_servers(101-105)llama_stack_configuration(87-91)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/app/endpoints/query_v2.py (2)
retrieve_response(144-315)get_topic_summary(70-114)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(36-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (3)
src/app/endpoints/query.py (1)
426-450: LGTM!The wrapper correctly maintains the v1
/queryendpoint's external interface while delegating to the refactored base handler. The routing, authorization, and dependency injection are properly configured, ensuring backward compatibility while enabling the v2 variant.src/app/routers.py (1)
21-22: LGTM!The v2 router registration is straightforward and follows the existing routing pattern. The comments clearly document the purpose of the new endpoint for Responses API support.
Also applies to: 46-47
tests/unit/app/endpoints/test_query_v2.py (1)
1-405: Excellent test coverage for the v2 endpoint!The test suite is comprehensive and well-structured, covering:
- Tool configuration (RAG and MCP) with and without authentication
- Response retrieval under various conditions (no tools, with tools, different output formats)
- Token usage extraction from multiple data formats (object, dict, empty)
- Attachment validation and inclusion
- Success and error paths (including API connection failures)
The tests follow pytest best practices with proper fixtures, async support, and clear assertions. This provides strong confidence in the v2 implementation.
899c00c to
b92e2d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/app/endpoints/query.py (1)
222-223: Consider using Callable type hints for injected functions.The injected function parameters are typed as
Any, which bypasses type checking. Consider usingCallablewith appropriate signatures for better type safety:from typing import Callable, Awaitable retrieve_response_func: Callable[ [AsyncLlamaStackClient, str, QueryRequest, str, dict[str, dict[str, str]] | None, str], Awaitable[tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]] ], get_topic_summary_func: Callable[ [str, AsyncLlamaStackClient, str], Awaitable[str] ],src/app/endpoints/query_v2.py (3)
230-238: Remove unnecessary cast and parentheses.The
cast(Any, toolgroups)on line 235 is unnecessary sincetoolgroupsis already typed aslist[dict[str, Any]] | None, which should be compatible with the expected parameter type. The extra parentheses around the argument are also redundant.Apply this diff:
response = await client.responses.create( input=input_text, model=model_id, instructions=system_prompt, previous_response_id=query_request.conversation_id, - tools=(cast(Any, toolgroups)), + tools=toolgroups, stream=False, store=True, )
339-430: Refactor to reduce duplication and clarify token counting logic.This function has several improvement opportunities:
Code duplication: The try-except block for
metrics.llm_calls_total.labels(provider, model).inc()appears three times (lines 401-405, 412-416, 424-428). Consider extracting this into a helper function or consolidating the logic.Inconsistent llm_calls tracking:
token_counter.llm_callsis set to 1 only when tokens are present (line 379), but the Prometheusllm_calls_totalmetric is incremented in all cases. This inconsistency could confuse users ofTokenCounter. Consider either always settingllm_calls = 1or documenting why they differ.Example refactor to address duplication:
def _increment_llm_call_metric(provider: str, model: str) -> None: """Helper to safely increment LLM call metric.""" try: metrics.llm_calls_total.labels(provider, model).inc() except (AttributeError, TypeError, ValueError) as e: logger.warning("Failed to update LLM call metric: %s", e) # Then replace all three blocks with: _increment_llm_call_metric(provider, model)
456-472: Improve type hint and complete docstring.The function has two minor issues:
The
mcp_serversparameter is typed as genericlistbut should belist[ModelContextProtocolServer]for better type safety (theModelContextProtocolServertype is already imported at line 10).The docstring lacks Args and Returns sections as required by the coding guidelines for Python functions.
As per coding guidelines
Apply this diff:
-def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict[str, Any]]: - """Convert MCP servers to tools format for responses API.""" +def get_mcp_tools( + mcp_servers: list[ModelContextProtocolServer], token: str | None = None +) -> list[dict[str, Any]]: + """ + Convert MCP servers to tools format for Responses API. + + Args: + mcp_servers: List of MCP server configurations + token: Optional authentication token for MCP server authorization + + Returns: + list[dict[str, Any]]: List of MCP tool definitions with server details and optional auth headers + """ tools = []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)src/app/routers.py(3 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)tests/unit/app/test_routers.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query.pysrc/app/routers.pysrc/app/endpoints/query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query.pysrc/app/routers.pysrc/app/endpoints/query_v2.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/query.pysrc/app/routers.pytests/unit/app/test_routers.pytests/unit/app/endpoints/test_query_v2.pysrc/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query.pysrc/app/routers.pysrc/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/test_routers.pytests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/test_routers.pytests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (4)
src/app/endpoints/query.py (2)
src/app/endpoints/query_v2.py (2)
retrieve_response(144-315)get_topic_summary(70-114)src/app/endpoints/streaming_query.py (1)
retrieve_response(942-1063)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(36-51)
tests/unit/app/endpoints/test_query_v2.py (3)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(433-453)get_mcp_tools(456-472)retrieve_response(144-315)query_endpoint_handler_v2(119-141)
src/app/endpoints/query_v2.py (5)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good work!
src/app/endpoints/query_v2.py
Outdated
| # Extract usage from the response if available | ||
| if response.usage: | ||
| try: | ||
| # Handle both dict and object cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just being defensive. I think llamastack should return openai response API object but maybe due to the internal convertion to chat_completions it is returning a dict object. So, assuming that will eventually change I prefer to have it handle and not break then. Let me add a comment about it
b92e2d7 to
36c5be8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_query_v2.py (2)
338-387: Use the shared MOCK_AUTH constant for authentication tuples.The coding guidelines specify using
MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")for test authentication. Line 381 uses a hardcoded tuple instead.As per coding guidelines
Apply this diff:
+# At the top of the file, after imports +MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") + @pytest.mark.asyncio async def test_query_endpoint_handler_v2_success(mocker, dummy_request): ... res = await query_endpoint_handler_v2( request=dummy_request, query_request=QueryRequest(query="hi"), - auth=("user123", "", False, "token-abc"), + auth=MOCK_AUTH, mcp_headers={}, )
389-414: Use the shared MOCK_AUTH constant for authentication tuples.Line 408 also uses a hardcoded authentication tuple. For consistency, use the
MOCK_AUTHconstant as specified in the coding guidelines.As per coding guidelines
Apply this diff:
with pytest.raises(HTTPException) as exc: await query_endpoint_handler_v2( request=dummy_request, query_request=QueryRequest(query="hi"), - auth=("user123", "", False, "token-abc"), + auth=MOCK_AUTH, mcp_headers={}, )src/app/endpoints/query.py (1)
217-224: Consider using Callable types for injected functions.Lines 222-223 use
Anyfor the injected function parameters. While this works, using properCallabletypes would improve type safety and IDE support.For example:
from typing import Callable, Awaitable retrieve_response_func: Callable[ [AsyncLlamaStackClient, str, QueryRequest, str, dict[str, dict[str, str]] | None, str], Awaitable[tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]] ], get_topic_summary_func: Callable[ [str, AsyncLlamaStackClient, str], Awaitable[str] ],
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)src/app/routers.py(3 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)tests/unit/app/test_routers.py(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/app/routers.py
- tests/unit/app/test_routers.py
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/app/endpoints/test_query_v2.pysrc/app/endpoints/query.pysrc/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (3)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(434-454)get_mcp_tools(457-482)retrieve_response(144-314)query_endpoint_handler_v2(119-141)src/configuration.py (2)
mcp_servers(101-105)llama_stack_configuration(87-91)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/app/endpoints/query_v2.py (2)
retrieve_response(144-314)get_topic_summary(70-114)
src/app/endpoints/query_v2.py (5)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (24)
tests/unit/app/endpoints/test_query_v2.py (10)
1-2: LGTM!Module docstring follows coding guidelines for test files.
27-36: LGTM!Test correctly validates both empty and non-empty input cases with appropriate assertions.
38-58: LGTM!Test thoroughly validates MCP tool construction with and without authentication tokens.
60-94: LGTM!Test correctly validates that
no_tools=Truebypasses tool configuration and passesNoneto the Responses API.
96-137: LGTM!Test comprehensively validates RAG and MCP tool construction, including proper header inclusion for authentication.
139-187: LGTM!Test thoroughly validates output parsing for various content formats and tool call extraction.
189-226: LGTM!Test correctly validates token usage extraction when provided as an object.
228-261: LGTM!Test validates token usage extraction when provided as a dict, covering the alternate format scenario.
263-296: LGTM!Test validates graceful handling of empty usage dict, ensuring counters default to zero and LLM calls are still tracked.
298-336: LGTM!Test correctly validates attachment metadata validation and inclusion in the input string.
src/app/endpoints/query.py (2)
225-246: LGTM!Documentation clearly explains the base handler's role and the dependency injection pattern.
426-450: LGTM!The wrapper correctly preserves the v1 endpoint behavior while delegating to the new base handler with Agent API-specific functions.
src/app/endpoints/query_v2.py (12)
1-39: LGTM!Module docstring and imports follow coding guidelines. Logger name matches the v1 handler for unified logging.
41-67: LGTM!Response schema definition follows FastAPI conventions and matches v1 for consistency.
70-115: LGTM!Function correctly implements topic summary generation using the Responses API with proper error handling and content format flexibility.
117-141: LGTM!Wrapper handler correctly delegates to the base handler with Responses API-specific functions.
144-217: LGTM!Function correctly initializes toolgroups as
Noneand only builds the list whenno_toolsisFalse, properly handling the empty list case. Shield limitation is appropriately documented.
218-248: LGTM!Input preparation and API call are correctly implemented, with proper conversation chaining and storage configuration.
249-314: LGTM!Output parsing is defensive and handles multiple content formats gracefully, with appropriate fallbacks for missing attributes.
317-335: LGTM!Stub implementation is appropriate given the PR objectives indicate referenced document parsing is follow-up work. TODO comment provides clear implementation guidance.
338-423: LGTM!Token usage extraction comprehensively handles multiple response formats (dict/object) with robust error handling and proper metrics updates.
426-431: LGTM!Helper function correctly encapsulates metric increment with proper error handling.
434-454: LGTM!Function correctly converts vector store IDs to Responses API tool format, with appropriate None return for empty input.
457-482: LGTM!Function correctly builds MCP tool definitions with conditional authorization headers based on token availability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.pytests/unit/app/endpoints/test_query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (3)
src/app/endpoints/query_v2.py (5)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
src/app/endpoints/query.py (4)
src/models/requests.py (1)
QueryRequest(73-225)src/authorization/middleware.py (1)
authorize(111-122)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/app/endpoints/query_v2.py (2)
retrieve_response(304-445)get_topic_summary(235-274)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(568-588)get_mcp_tools(591-617)retrieve_response(304-445)query_endpoint_handler_v2(279-301)src/configuration.py (2)
mcp_servers(101-105)llama_stack_configuration(87-91)
🪛 GitHub Actions: Pyright
src/app/endpoints/query_v2.py
[error] 389-389: No overloads for 'create' match the provided arguments (reportCallIssue).
[error] 393-393: Argument of type 'str | None' cannot be assigned to parameter 'previous_response_id' of type 'str | NotGiven' in function 'create'. Type 'str | None' is not assignable to type 'str | NotGiven' (reportArgumentType).
[error] 498-498: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).
[error] 504-504: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).
[error] 505-505: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).
[error] 506-506: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).
[error] 509-509: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).
[error] 510-510: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).
[error] 543-543: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
- GitHub Check: build-pr
79fbdb5 to
ce0a379
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)
46-52: Clarify the multi-line string formatting.The URL string spanning lines 48-49 relies on Python's implicit string concatenation, which works correctly here. However, for clarity, consider using an explicit backslash continuation or parentheses to make the intent obvious to readers.
Apply this diff to make the formatting more explicit:
"referenced_documents": [ { - "doc_url": "https://docs.openshift.com/" - "container-platform/4.15/operators/olm/index.html", + "doc_url": ( + "https://docs.openshift.com/" + "container-platform/4.15/operators/olm/index.html" + ), "doc_title": "Operator Lifecycle Manager (OLM)", } ],
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/app/endpoints/test_query_v2.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query_v2.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query_v2.pysrc/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (3)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
QueryRequest(73-225)Attachment(16-70)src/models/config.py (2)
config(140-146)ModelContextProtocolServer(169-174)src/app/endpoints/query_v2.py (4)
get_rag_tools(573-593)get_mcp_tools(596-628)retrieve_response(304-448)query_endpoint_handler_v2(279-301)src/configuration.py (2)
mcp_servers(101-105)llama_stack_configuration(87-91)
src/app/endpoints/query_v2.py (4)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
src/app/endpoints/query.py (3)
src/authorization/middleware.py (1)
authorize(111-122)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/app/endpoints/query_v2.py (2)
retrieve_response(304-448)get_topic_summary(235-274)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
- GitHub Check: build-pr
🔇 Additional comments (10)
src/app/endpoints/query_v2.py (5)
235-274: LGTM: Appropriate error handling for topic summary generation.The broad exception catch with graceful degradation (returning an empty string) is appropriate here since topic summary generation is a non-critical feature that shouldn't block the main query flow. The pylint disable shows this was an intentional design choice.
351-376: LGTM: Correct no_tools handling and tool preparation.The toolgroups initialization properly handles the
no_toolsflag by keeping it asNonewhen tools are disabled, and building the tool list only when tools are enabled. The fallback conversion of an empty list toNoneensures consistency with the Responses API expectations.
389-400: LGTM: Correct handling of optional previous_response_id.The implementation correctly builds a kwargs dict and only includes
previous_response_idwhenconversation_idis truthy, avoiding the type error from passingNoneto the client API. This properly addresses the past Pyright errors.
500-551: LGTM: Safe token usage extraction.The use of
getattr(response, "usage", None)followed by guarding all access behindif usage:properly handles cases where the usage attribute is absent, avoiding AttributeErrors. The dual handling of dict vs object is well-documented as a temporary workaround for Llama Stack inconsistencies.
621-626: LGTM: Proper MCP header prioritization.The logic correctly prioritizes per-request
mcp_headersover the fallback bearer token, ensuring that caller-supplied MCP credentials are honored. This properly addresses the past review concern about dropping custom MCP headers in the v2 handler.tests/unit/app/endpoints/test_query_v2.py (2)
60-344: LGTM: Comprehensive test coverage for retrieve_response.The test suite thoroughly covers the key scenarios for
retrieve_response: no_tools bypass, RAG/MCP tool building, output parsing, token usage extraction (object, dict, empty), and attachment handling. This provides strong confidence in the implementation's correctness.
347-422: LGTM: Good endpoint handler test coverage.The tests cover both the success path (with proper response structure verification) and the error path (APIConnectionError resulting in HTTP 500 and metrics increment). The extensive mocking ensures isolated unit testing of the endpoint logic.
src/app/endpoints/query.py (3)
217-223: LGTM: Clean dependency injection refactoring.The refactoring to
query_endpoint_handler_basewith injectedretrieve_response_funcandget_topic_summary_funcparameters enables code reuse between the Agent API (v1) and Responses API (v2) implementations. The change ofmcp_headersfrom aDependsto a plain parameter is correct since the base handler is no longer a direct endpoint.
426-450: LGTM: Proper Agent API wrapper.The
query_endpoint_handlerwrapper correctly exposes the/queryendpoint with FastAPI decorators and dependencies, then delegates to the base handler with the Agent API-specificretrieve_responseandget_topic_summaryfunctions. This maintains backward compatibility while enabling the v2 implementation.
297-306: LGTM: Correct usage of injected functions.The base handler properly invokes the injected
retrieve_response_funcandget_topic_summary_funcat the appropriate points in the flow, passing all necessary parameters. The dependency injection pattern is correctly implemented.Also applies to: 315-317
ce0a379 to
4bff2fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)
393-393: Consider removing unnecessarycast(Any, toolgroups).The
toolgroupsvariable is already properly typed aslist[dict[str, Any]] | None, so thecast(Any, toolgroups)on line 393 may no longer be necessary. If theclient.responses.createmethod acceptslist[dict[str, Any]] | Nonefor thetoolsparameter, you can passtoolgroupsdirectly.If the cast is needed due to client library type definitions, consider adding a comment explaining why:
# Cast needed due to client library type signature tools=cast(Any, toolgroups),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/app/endpoints/query.py(4 hunks)src/app/endpoints/query_v2.py(1 hunks)tests/unit/app/endpoints/test_query_v2.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/app/endpoints/test_query_v2.py
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.pysrc/app/endpoints/query_v2.py
🧬 Code graph analysis (2)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/app/endpoints/query_v2.py (2)
retrieve_response(303-448)get_topic_summary(235-274)
src/app/endpoints/query_v2.py (5)
src/app/endpoints/query.py (4)
query_endpoint_handler_base(217-423)validate_attachments_metadata(801-830)get_topic_summary(184-214)retrieve_response(639-798)src/utils/endpoints.py (2)
get_system_prompt(126-190)get_topic_summary_system_prompt(193-204)src/utils/mcp_headers.py (1)
mcp_headers_dependency(15-26)src/utils/token_counter.py (1)
TokenCounter(18-41)src/utils/types.py (2)
TurnSummary(89-163)ToolCallSummary(73-86)
🪛 GitHub Actions: Black
src/app/endpoints/query_v2.py
[error] 1-1: Black formatting check failed. 1 file would be reformatted by Black (src/app/endpoints/query_v2.py). Run 'uv tool run black --write .' to fix.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
- GitHub Check: e2e_tests (azure)
🔇 Additional comments (2)
src/app/endpoints/query.py (1)
426-450: LGTM!The refactoring successfully separates the shared query handling logic into a base handler while maintaining the public API. The wrapper correctly provides Agent API-specific implementations and properly handles dependency injection for MCP headers.
src/app/endpoints/query_v2.py (1)
39-39: Use__name__for logger initialization.The logger is initialized with a hardcoded string
"app.endpoints.handlers"instead of__name__, which violates the coding guideline and creates confusion since this module isquery_v2.py, not the basequery.pymodule.As per coding guidelines
Apply this diff:
-logger = logging.getLogger("app.endpoints.handlers") +logger = logging.getLogger(__name__)⛔ Skipped due to learnings
Learnt from: luis5tb Repo: lightspeed-core/lightspeed-stack PR: 727 File: src/app/endpoints/a2a.py:43-43 Timestamp: 2025-10-29T13:05:22.438Z Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).Learnt from: CR Repo: lightspeed-core/lightspeed-stack PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-09-18T16:46:33.353Z Learning: Applies to **/*.py : Use logger = logging.getLogger(__name__) for module logging after import logging
| retrieve_response_func: Any, | ||
| get_topic_summary_func: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Use explicit Callable types instead of Any for function parameters.
The retrieve_response_func and get_topic_summary_func parameters are typed as Any, which violates the coding guideline requiring complete type annotations for all function parameters. This reduces type safety and makes the dependency injection pattern less clear.
As per coding guidelines
Apply this diff to add proper type hints:
+from typing import Annotated, Any, Callable, Optional, cast
+
async def query_endpoint_handler_base( # pylint: disable=R0914
request: Request,
query_request: QueryRequest,
auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
mcp_headers: dict[str, dict[str, str]],
- retrieve_response_func: Any,
- get_topic_summary_func: Any,
+ retrieve_response_func: Callable[
+ [AsyncLlamaStackClient, str, QueryRequest, str, dict[str, dict[str, str]] | None, str],
+ tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]
+ ],
+ get_topic_summary_func: Callable[[str, AsyncLlamaStackClient, str], str],
) -> QueryResponse:Note: The retrieve_response_func signature matches the actual function signature with positional provider_id parameter.
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 222-223, replace the Any types for
retrieve_response_func and get_topic_summary_func with explicit typing: import
Callable and Awaitable (and any specific model/response types used) and annotate
retrieve_response_func as a Callable whose first positional parameter is
provider_id (match the actual function parameter order and types) and whose
return is the appropriate Awaitable/return type; likewise annotate
get_topic_summary_func with its exact parameter types and return type. Ensure
the signatures exactly mirror the real functions (positional provider_id first
for retrieve_response_func), update imports accordingly, and run type checks.
Also include other formatting/checking fixes
4bff2fa to
9f750cf
Compare
|
/lgtm |
Description
Add v2 endpoint to interact with LlamaStack using Responses API (non-streaming) instead of Agent API
It is adding the support in a v2 endpoint to ensure backward compatibility
What is missing and to be done in follow up PRs:
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Refactor
Routing
Observability
Tests