Skip to content

Conversation

@luis5tb
Copy link
Contributor

@luis5tb luis5tb commented Nov 3, 2025

Description

Add v2 endpoint to interact with LlamaStack using Responses API (non-streaming) instead of Agent API

It is adding the support in a v2 endpoint to ensure backward compatibility

What is missing and to be done in follow up PRs:

  • Add support for streaming
  • Add support for shields once llamastack supports it for responses API
  • Add support for parsing and returning referenced document on the response

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added a v2 /v2/query endpoint using the Responses API with topic summaries, attachments, RAG (vector search) and external tool (MCP) integrations; v1 behavior preserved via a wrapper.
  • Refactor

    • Consolidated shared query handling into a common base enabling v1/v2 variants via injected callbacks.
  • Routing

    • Registered v2 routes under /v2 alongside existing routes.
  • Observability

    • Captures token-usage metrics, records API error metrics, and improves logging.
  • Tests

    • Added comprehensive unit tests for v2 flows, tools, attachments, usage extraction, and error paths.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 3, 2025

Walkthrough

Splits the original /query handler into an injectable base (query_endpoint_handler_base) and a public v1 wrapper, adds a new v2 /query implementation using the Responses API (query_v2.py), registers the v2 router under /v2, and adds unit tests for the v2 endpoint and helpers.

Changes

Cohort / File(s) Summary
Query endpoint refactor
src/app/endpoints/query.py
Extracts core logic into query_endpoint_handler_base with injectable retrieve_response_func and get_topic_summary_func; adds public wrapper query_endpoint_handler that wires Agent API implementations and preserves the v1 route, auth, RBAC, caching, persistence, and error paths.
New v2 query endpoint & helpers
src/app/endpoints/query_v2.py
New v2 FastAPI router exposing /query under /v2; implements Responses-API-based retrieve_response, get_topic_summary, token-usage extraction, referenced-document parsing (stubbed), and tool builders get_rag_tools / get_mcp_tools; handles attachments, tools construction (RAG/MCP), metrics, response parsing, and error mapping.
Router integration
src/app/routers.py
Registers query_v2.router under the /v2 prefix alongside existing routers.
Unit tests for v2
tests/unit/app/endpoints/test_query_v2.py
Adds tests for get_rag_tools, get_mcp_tools, retrieve_response scenarios (no-tools, RAG/MCP), attachments handling, token usage parsing, tool-call extraction, success and APIConnectionError flows, and metrics updates.
Router tests update
tests/unit/app/test_routers.py
Updates router tests to include query_v2.router, asserts router count increased and validates /v2 prefix presence.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Router
    participant V2 as query_endpoint_handler_v2
    participant Base as query_endpoint_handler_base
    participant Responses as Responses API
    participant Metrics as Prometheus

    Client->>Router: POST /v2/query
    Router->>V2: request + query_request + auth + mcp_headers
    V2->>Base: call base with retrieve_response_func & get_topic_summary_func
    Base->>V2: invoke retrieve_response_func(...)
    V2->>Responses: client.responses.create(input, model, tools, ...)
    Responses-->>V2: OpenAIResponseObject
    V2->>Base: TurnSummary, conversation_id, referenced_docs, token_usage
    Base->>Metrics: update token and call metrics
    Base-->>Router: QueryResponse
    Router-->>Client: 200 OK + QueryResponse
Loading
sequenceDiagram
    participant V1_old as v1 (pre-refactor)
    participant BaseLogic as query_endpoint_handler_base
    participant V1_new as v1 (post-refactor)
    participant V2_new as v2

    Note over V1_old: v1 directly called concrete retrieve/get_topic
    V1_new->>BaseLogic: inject Agent API callbacks (retrieve_response, get_topic_summary)
    V2_new->>BaseLogic: inject Responses API callbacks (v2.retrieve_response, v2.get_topic_summary)
    BaseLogic->>BaseLogic: shared validation, orchestration, persistence, caching, metrics
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Extra attention:
    • Dependency-injection signatures and wrapper wiring in src/app/endpoints/query.py.
    • Tool construction and per-server header handling in src/app/endpoints/query_v2.py::get_mcp_tools and get_rag_tools.
    • Token usage extraction and Prometheus metric updates in src/app/endpoints/query_v2.py::extract_token_usage_from_responses_api.
    • Response parsing and mapping of tool calls to ToolCallSummary.
    • Unit tests simulating APIConnectionError and Responses API outputs in tests/unit/app/endpoints/test_query_v2.py.

Possibly related PRs

Suggested reviewers

  • umago
  • tisnik

Poem

🐇 A hop, a route, a tiny spree,
V1 split neat, V2 speaks with glee.
Tools fetch facts, tokens counted bright,
Attachments tucked, conversations right.
The rabbit cheers — endpoints take flight!

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Base implementation of non-streaming Responses API' accurately describes the main changeset, which introduces a v2 endpoint for the non-streaming Responses API while maintaining backward compatibility with the existing Agent API through dependency injection.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4bff2fa and 9f750cf.

📒 Files selected for processing (3)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • tests/unit/app/endpoints/test_query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (2)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to **/*.py : Provide complete type annotations for all function parameters and return types

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (3)
src/app/endpoints/query_v2.py (7)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/responses.py (2)
  • QueryResponse (194-305)
  • ReferencedDocument (179-191)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (304-449)
  • get_topic_summary (235-274)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (574-594)
  • get_mcp_tools (597-630)
  • retrieve_response (304-449)
  • query_endpoint_handler_v2 (279-301)
src/configuration.py (2)
  • mcp_servers (101-105)
  • llama_stack_configuration (87-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: build-pr
🔇 Additional comments (18)
src/app/endpoints/query.py (1)

426-450: LGTM!

The wrapper function properly delegates to the base handler, injecting the Agent API-specific implementations for retrieve_response and get_topic_summary. The docstring clearly explains the purpose and the FastAPI dependency injection is correctly configured.

tests/unit/app/endpoints/test_query_v2.py (6)

1-24: LGTM!

The module setup follows pytest and coding guidelines correctly - includes a descriptive docstring, appropriate imports, and a minimal fixture for FastAPI Request objects.


27-35: LGTM!

The test properly validates both edge cases: empty input returning None and non-empty input producing the correct file_search tool configuration.


38-57: LGTM!

The test thoroughly validates MCP tool generation with and without authentication tokens, ensuring headers are correctly omitted or included based on token availability.


60-344: LGTM!

Comprehensive test suite covering critical retrieve_response scenarios: tool bypassing, RAG/MCP tool construction, output parsing, token usage extraction (object/dict/empty formats), and attachment validation. The tests use appropriate mocking and verify correct behavior across all cases.


347-395: LGTM!

The success path test properly mocks all dependencies and verifies the v2 handler returns the expected QueryResponse structure with correct conversation ID and response text.


398-423: LGTM!

The error path test correctly verifies that APIConnectionError is caught, wrapped in an HTTP 500 response with appropriate detail, and triggers the failure metric increment.

src/app/endpoints/query_v2.py (11)

1-68: LGTM!

The module setup follows coding guidelines correctly: descriptive docstring, appropriate imports using recommended patterns (AsyncLlamaStackClient, FastAPI components), logger initialized with __name__, and clear response schema definition.


71-102: LGTM!

The helper function defensively handles various content formats from the Responses API output, using getattr appropriately to avoid attribute errors and covering multiple text extraction scenarios (direct strings, object attributes, refusals, and dict formats).


104-232: LGTM!

The tool call translation function appropriately handles the complexity of multiple OpenAI Response API output types. The defensive getattr usage prevents attribute errors, and the detailed docstring clearly documents supported tool types. The pylint disables are justified by the inherent complexity of the translation logic.


235-274: LGTM!

The topic summary function correctly uses the Responses API with appropriate parameters (store=False for ephemeral requests). Exception handling gracefully returns an empty string on failure, ensuring robustness. The docstring clearly distinguishes this from the Agent API version.


277-301: LGTM!

The v2 endpoint handler properly wraps the base handler, injecting Responses API-specific implementations. The FastAPI decorators, dependencies, and delegation pattern are correctly implemented, mirroring the v1 structure while using v2 functions.


304-449: LGTM!

The core retrieve_response function correctly implements the Responses API flow. Past review issues have been addressed: toolgroups properly initialized as None, MCP headers threaded through tool construction, previous_response_id conditionally included, and usage accessed defensively with getattr. The complexity justifies the pylint disables, and the comprehensive docstring documents all parameters.


452-470: Acknowledged stub for future implementation.

The function is appropriately marked as a TODO with clear guidance on implementation needs. Since the PR objectives explicitly state that "parsing and returning referenced documents" is planned for subsequent PRs, this stub with detailed TODO comments is acceptable.


473-563: LGTM!

Token usage extraction is implemented defensively with proper error handling. Past review issues have been addressed: usage is accessed via getattr to handle potential absence, and both dict and object formats are supported with clear TODO comments explaining the temporary dual handling. Metrics updates are properly guarded, and the function gracefully handles missing or malformed usage data.


566-571: LGTM!

The helper function safely increments the LLM call metric with appropriate exception handling to prevent metric failures from disrupting the main flow.


574-594: LGTM!

The helper function correctly converts vector store IDs to Responses API file_search tool format, returning None for empty inputs as expected by the caller. The docstring and type annotations are complete.


597-630: LGTM!

The MCP tools helper correctly converts server configurations to Responses API format. Past review issues have been addressed: mcp_headers is now documented in the docstring, and the function properly prioritizes per-request headers over the fallback token for authentication. The logic correctly handles both authentication methods.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)

38-38: Use __name__ when creating the module logger.

Coding guidelines ask us to call logging.getLogger(__name__). Switching to __name__ keeps logger naming consistent with the rest of the stack.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b52ea28 and 395fbaa.

📒 Files selected for processing (5)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/query_v2.py
  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (4)
src/app/endpoints/query.py (4)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/responses.py (1)
  • QueryResponse (194-305)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (144-320)
  • get_topic_summary (70-114)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (36-51)
src/app/endpoints/query_v2.py (9)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-52)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/models/responses.py (2)
  • QueryResponse (194-305)
  • ReferencedDocument (179-191)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (438-458)
  • get_mcp_tools (461-477)
  • retrieve_response (144-320)
  • query_endpoint_handler_v2 (119-141)
src/configuration.py (1)
  • mcp_servers (101-105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (2)
tests/unit/app/test_routers.py (1)

68-103: Router assertions look solid.

Nice job updating the expected router count and asserting the new /v2 prefix—this keeps the routing contract well covered.

tests/unit/app/endpoints/test_query_v2.py (1)

57-405: Excellent coverage of the Responses flow.

These tests exercise the new Responses API path thoroughly, including tool wiring, attachments, and quota handling. Great to see this level of detail.

@luis5tb luis5tb force-pushed the responses_v2_non_streaming branch from 395fbaa to 899c00c Compare November 3, 2025 15:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/app/endpoints/query.py (1)

217-243: Excellent refactoring for v2 API support!

The dependency injection pattern cleanly separates the base query logic from API-specific implementations (Agent API vs. Responses API), enabling code reuse while maintaining backward compatibility.

The type annotations for retrieve_response_func and get_topic_summary_func use Any. While this works, consider using Callable types for better type safety in a future refactor:

from collections.abc import Callable

retrieve_response_func: Callable[
    [AsyncLlamaStackClient, str, QueryRequest, str, dict[str, dict[str, str]] | None, str],
    tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]
]
get_topic_summary_func: Callable[[str, AsyncLlamaStackClient, str], str]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 395fbaa and 899c00c.

📒 Files selected for processing (5)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/app/test_routers.py
  • src/app/endpoints/query_v2.py
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/unit/app/endpoints/test_query_v2.py
  • src/app/endpoints/query.py
  • src/app/routers.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query_v2.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query.py
🧠 Learnings (2)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (3)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (433-453)
  • get_mcp_tools (456-472)
  • retrieve_response (144-315)
  • query_endpoint_handler_v2 (119-141)
src/configuration.py (2)
  • mcp_servers (101-105)
  • llama_stack_configuration (87-91)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (144-315)
  • get_topic_summary (70-114)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (36-51)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (3)
src/app/endpoints/query.py (1)

426-450: LGTM!

The wrapper correctly maintains the v1 /query endpoint's external interface while delegating to the refactored base handler. The routing, authorization, and dependency injection are properly configured, ensuring backward compatibility while enabling the v2 variant.

src/app/routers.py (1)

21-22: LGTM!

The v2 router registration is straightforward and follows the existing routing pattern. The comments clearly document the purpose of the new endpoint for Responses API support.

Also applies to: 46-47

tests/unit/app/endpoints/test_query_v2.py (1)

1-405: Excellent test coverage for the v2 endpoint!

The test suite is comprehensive and well-structured, covering:

  • Tool configuration (RAG and MCP) with and without authentication
  • Response retrieval under various conditions (no tools, with tools, different output formats)
  • Token usage extraction from multiple data formats (object, dict, empty)
  • Attachment validation and inclusion
  • Success and error paths (including API connection failures)

The tests follow pytest best practices with proper fixtures, async support, and clear assertions. This provides strong confidence in the v2 implementation.

@luis5tb luis5tb force-pushed the responses_v2_non_streaming branch from 899c00c to b92e2d7 Compare November 3, 2025 16:00
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/app/endpoints/query.py (1)

222-223: Consider using Callable type hints for injected functions.

The injected function parameters are typed as Any, which bypasses type checking. Consider using Callable with appropriate signatures for better type safety:

from typing import Callable, Awaitable

retrieve_response_func: Callable[
    [AsyncLlamaStackClient, str, QueryRequest, str, dict[str, dict[str, str]] | None, str],
    Awaitable[tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]]
],
get_topic_summary_func: Callable[
    [str, AsyncLlamaStackClient, str],
    Awaitable[str]
],
src/app/endpoints/query_v2.py (3)

230-238: Remove unnecessary cast and parentheses.

The cast(Any, toolgroups) on line 235 is unnecessary since toolgroups is already typed as list[dict[str, Any]] | None, which should be compatible with the expected parameter type. The extra parentheses around the argument are also redundant.

Apply this diff:

     response = await client.responses.create(
         input=input_text,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
-        tools=(cast(Any, toolgroups)),
+        tools=toolgroups,
         stream=False,
         store=True,
     )

339-430: Refactor to reduce duplication and clarify token counting logic.

This function has several improvement opportunities:

  1. Code duplication: The try-except block for metrics.llm_calls_total.labels(provider, model).inc() appears three times (lines 401-405, 412-416, 424-428). Consider extracting this into a helper function or consolidating the logic.

  2. Inconsistent llm_calls tracking: token_counter.llm_calls is set to 1 only when tokens are present (line 379), but the Prometheus llm_calls_total metric is incremented in all cases. This inconsistency could confuse users of TokenCounter. Consider either always setting llm_calls = 1 or documenting why they differ.

Example refactor to address duplication:

def _increment_llm_call_metric(provider: str, model: str) -> None:
    """Helper to safely increment LLM call metric."""
    try:
        metrics.llm_calls_total.labels(provider, model).inc()
    except (AttributeError, TypeError, ValueError) as e:
        logger.warning("Failed to update LLM call metric: %s", e)

# Then replace all three blocks with:
_increment_llm_call_metric(provider, model)

456-472: Improve type hint and complete docstring.

The function has two minor issues:

  1. The mcp_servers parameter is typed as generic list but should be list[ModelContextProtocolServer] for better type safety (the ModelContextProtocolServer type is already imported at line 10).

  2. The docstring lacks Args and Returns sections as required by the coding guidelines for Python functions.

As per coding guidelines

Apply this diff:

-def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict[str, Any]]:
-    """Convert MCP servers to tools format for responses API."""
+def get_mcp_tools(
+    mcp_servers: list[ModelContextProtocolServer], token: str | None = None
+) -> list[dict[str, Any]]:
+    """
+    Convert MCP servers to tools format for Responses API.
+
+    Args:
+        mcp_servers: List of MCP server configurations
+        token: Optional authentication token for MCP server authorization
+
+    Returns:
+        list[dict[str, Any]]: List of MCP tool definitions with server details and optional auth headers
+    """
     tools = []
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 899c00c and b92e2d7.

📒 Files selected for processing (5)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_query_v2.py
  • src/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (4)
src/app/endpoints/query.py (2)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (144-315)
  • get_topic_summary (70-114)
src/app/endpoints/streaming_query.py (1)
  • retrieve_response (942-1063)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (36-51)
tests/unit/app/endpoints/test_query_v2.py (3)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (433-453)
  • get_mcp_tools (456-472)
  • retrieve_response (144-315)
  • query_endpoint_handler_v2 (119-141)
src/app/endpoints/query_v2.py (5)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)

Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good work!

# Extract usage from the response if available
if response.usage:
try:
# Handle both dict and object cases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just being defensive. I think llamastack should return openai response API object but maybe due to the internal convertion to chat_completions it is returning a dict object. So, assuming that will eventually change I prefer to have it handle and not break then. Let me add a comment about it

@luis5tb luis5tb force-pushed the responses_v2_non_streaming branch from b92e2d7 to 36c5be8 Compare November 4, 2025 10:03
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/unit/app/endpoints/test_query_v2.py (2)

338-387: Use the shared MOCK_AUTH constant for authentication tuples.

The coding guidelines specify using MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") for test authentication. Line 381 uses a hardcoded tuple instead.

As per coding guidelines

Apply this diff:

+# At the top of the file, after imports
+MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")
+
 @pytest.mark.asyncio
 async def test_query_endpoint_handler_v2_success(mocker, dummy_request):
     ...
     res = await query_endpoint_handler_v2(
         request=dummy_request,
         query_request=QueryRequest(query="hi"),
-        auth=("user123", "", False, "token-abc"),
+        auth=MOCK_AUTH,
         mcp_headers={},
     )

389-414: Use the shared MOCK_AUTH constant for authentication tuples.

Line 408 also uses a hardcoded authentication tuple. For consistency, use the MOCK_AUTH constant as specified in the coding guidelines.

As per coding guidelines

Apply this diff:

     with pytest.raises(HTTPException) as exc:
         await query_endpoint_handler_v2(
             request=dummy_request,
             query_request=QueryRequest(query="hi"),
-            auth=("user123", "", False, "token-abc"),
+            auth=MOCK_AUTH,
             mcp_headers={},
         )
src/app/endpoints/query.py (1)

217-224: Consider using Callable types for injected functions.

Lines 222-223 use Any for the injected function parameters. While this works, using proper Callable types would improve type safety and IDE support.

For example:

from typing import Callable, Awaitable

retrieve_response_func: Callable[
    [AsyncLlamaStackClient, str, QueryRequest, str, dict[str, dict[str, str]] | None, str],
    Awaitable[tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]]
],
get_topic_summary_func: Callable[
    [str, AsyncLlamaStackClient, str],
    Awaitable[str]
],
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b92e2d7 and 36c5be8.

📒 Files selected for processing (5)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/app/routers.py
  • tests/unit/app/test_routers.py
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/unit/app/endpoints/test_query_v2.py
  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query_v2.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (3)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (434-454)
  • get_mcp_tools (457-482)
  • retrieve_response (144-314)
  • query_endpoint_handler_v2 (119-141)
src/configuration.py (2)
  • mcp_servers (101-105)
  • llama_stack_configuration (87-91)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (144-314)
  • get_topic_summary (70-114)
src/app/endpoints/query_v2.py (5)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (24)
tests/unit/app/endpoints/test_query_v2.py (10)

1-2: LGTM!

Module docstring follows coding guidelines for test files.


27-36: LGTM!

Test correctly validates both empty and non-empty input cases with appropriate assertions.


38-58: LGTM!

Test thoroughly validates MCP tool construction with and without authentication tokens.


60-94: LGTM!

Test correctly validates that no_tools=True bypasses tool configuration and passes None to the Responses API.


96-137: LGTM!

Test comprehensively validates RAG and MCP tool construction, including proper header inclusion for authentication.


139-187: LGTM!

Test thoroughly validates output parsing for various content formats and tool call extraction.


189-226: LGTM!

Test correctly validates token usage extraction when provided as an object.


228-261: LGTM!

Test validates token usage extraction when provided as a dict, covering the alternate format scenario.


263-296: LGTM!

Test validates graceful handling of empty usage dict, ensuring counters default to zero and LLM calls are still tracked.


298-336: LGTM!

Test correctly validates attachment metadata validation and inclusion in the input string.

src/app/endpoints/query.py (2)

225-246: LGTM!

Documentation clearly explains the base handler's role and the dependency injection pattern.


426-450: LGTM!

The wrapper correctly preserves the v1 endpoint behavior while delegating to the new base handler with Agent API-specific functions.

src/app/endpoints/query_v2.py (12)

1-39: LGTM!

Module docstring and imports follow coding guidelines. Logger name matches the v1 handler for unified logging.


41-67: LGTM!

Response schema definition follows FastAPI conventions and matches v1 for consistency.


70-115: LGTM!

Function correctly implements topic summary generation using the Responses API with proper error handling and content format flexibility.


117-141: LGTM!

Wrapper handler correctly delegates to the base handler with Responses API-specific functions.


144-217: LGTM!

Function correctly initializes toolgroups as None and only builds the list when no_tools is False, properly handling the empty list case. Shield limitation is appropriately documented.


218-248: LGTM!

Input preparation and API call are correctly implemented, with proper conversation chaining and storage configuration.


249-314: LGTM!

Output parsing is defensive and handles multiple content formats gracefully, with appropriate fallbacks for missing attributes.


317-335: LGTM!

Stub implementation is appropriate given the PR objectives indicate referenced document parsing is follow-up work. TODO comment provides clear implementation guidance.


338-423: LGTM!

Token usage extraction comprehensively handles multiple response formats (dict/object) with robust error handling and proper metrics updates.


426-431: LGTM!

Helper function correctly encapsulates metric increment with proper error handling.


434-454: LGTM!

Function correctly converts vector store IDs to Responses API tool format, with appropriate None return for empty input.


457-482: LGTM!

Function correctly builds MCP tool definitions with conditional authorization headers based on token availability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36c5be8 and 79fbdb5.

📒 Files selected for processing (3)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • tests/unit/app/endpoints/test_query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (3)
src/app/endpoints/query_v2.py (5)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
src/app/endpoints/query.py (4)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (304-445)
  • get_topic_summary (235-274)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (568-588)
  • get_mcp_tools (591-617)
  • retrieve_response (304-445)
  • query_endpoint_handler_v2 (279-301)
src/configuration.py (2)
  • mcp_servers (101-105)
  • llama_stack_configuration (87-91)
🪛 GitHub Actions: Pyright
src/app/endpoints/query_v2.py

[error] 389-389: No overloads for 'create' match the provided arguments (reportCallIssue).


[error] 393-393: Argument of type 'str | None' cannot be assigned to parameter 'previous_response_id' of type 'str | NotGiven' in function 'create'. Type 'str | None' is not assignable to type 'str | NotGiven' (reportArgumentType).


[error] 498-498: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).


[error] 504-504: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).


[error] 505-505: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).


[error] 506-506: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).


[error] 509-509: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).


[error] 510-510: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).


[error] 543-543: Cannot access attribute 'usage' for class 'OpenAIResponseObject'. Attribute 'usage' is unknown (reportAttributeAccessIssue).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: build-pr

@luis5tb luis5tb force-pushed the responses_v2_non_streaming branch from 79fbdb5 to ce0a379 Compare November 4, 2025 14:02
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)

46-52: Clarify the multi-line string formatting.

The URL string spanning lines 48-49 relies on Python's implicit string concatenation, which works correctly here. However, for clarity, consider using an explicit backslash continuation or parentheses to make the intent obvious to readers.

Apply this diff to make the formatting more explicit:

         "referenced_documents": [
             {
-                "doc_url": "https://docs.openshift.com/"
-                "container-platform/4.15/operators/olm/index.html",
+                "doc_url": (
+                    "https://docs.openshift.com/"
+                    "container-platform/4.15/operators/olm/index.html"
+                ),
                 "doc_title": "Operator Lifecycle Manager (OLM)",
             }
         ],
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79fbdb5 and ce0a379.

📒 Files selected for processing (3)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/unit/app/endpoints/test_query_v2.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query_v2.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
🧬 Code graph analysis (3)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/models/config.py (2)
  • config (140-146)
  • ModelContextProtocolServer (169-174)
src/app/endpoints/query_v2.py (4)
  • get_rag_tools (573-593)
  • get_mcp_tools (596-628)
  • retrieve_response (304-448)
  • query_endpoint_handler_v2 (279-301)
src/configuration.py (2)
  • mcp_servers (101-105)
  • llama_stack_configuration (87-91)
src/app/endpoints/query_v2.py (4)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
src/app/endpoints/query.py (3)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (304-448)
  • get_topic_summary (235-274)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: build-pr
🔇 Additional comments (10)
src/app/endpoints/query_v2.py (5)

235-274: LGTM: Appropriate error handling for topic summary generation.

The broad exception catch with graceful degradation (returning an empty string) is appropriate here since topic summary generation is a non-critical feature that shouldn't block the main query flow. The pylint disable shows this was an intentional design choice.


351-376: LGTM: Correct no_tools handling and tool preparation.

The toolgroups initialization properly handles the no_tools flag by keeping it as None when tools are disabled, and building the tool list only when tools are enabled. The fallback conversion of an empty list to None ensures consistency with the Responses API expectations.


389-400: LGTM: Correct handling of optional previous_response_id.

The implementation correctly builds a kwargs dict and only includes previous_response_id when conversation_id is truthy, avoiding the type error from passing None to the client API. This properly addresses the past Pyright errors.


500-551: LGTM: Safe token usage extraction.

The use of getattr(response, "usage", None) followed by guarding all access behind if usage: properly handles cases where the usage attribute is absent, avoiding AttributeErrors. The dual handling of dict vs object is well-documented as a temporary workaround for Llama Stack inconsistencies.


621-626: LGTM: Proper MCP header prioritization.

The logic correctly prioritizes per-request mcp_headers over the fallback bearer token, ensuring that caller-supplied MCP credentials are honored. This properly addresses the past review concern about dropping custom MCP headers in the v2 handler.

tests/unit/app/endpoints/test_query_v2.py (2)

60-344: LGTM: Comprehensive test coverage for retrieve_response.

The test suite thoroughly covers the key scenarios for retrieve_response: no_tools bypass, RAG/MCP tool building, output parsing, token usage extraction (object, dict, empty), and attachment handling. This provides strong confidence in the implementation's correctness.


347-422: LGTM: Good endpoint handler test coverage.

The tests cover both the success path (with proper response structure verification) and the error path (APIConnectionError resulting in HTTP 500 and metrics increment). The extensive mocking ensures isolated unit testing of the endpoint logic.

src/app/endpoints/query.py (3)

217-223: LGTM: Clean dependency injection refactoring.

The refactoring to query_endpoint_handler_base with injected retrieve_response_func and get_topic_summary_func parameters enables code reuse between the Agent API (v1) and Responses API (v2) implementations. The change of mcp_headers from a Depends to a plain parameter is correct since the base handler is no longer a direct endpoint.


426-450: LGTM: Proper Agent API wrapper.

The query_endpoint_handler wrapper correctly exposes the /query endpoint with FastAPI decorators and dependencies, then delegates to the base handler with the Agent API-specific retrieve_response and get_topic_summary functions. This maintains backward compatibility while enabling the v2 implementation.


297-306: LGTM: Correct usage of injected functions.

The base handler properly invokes the injected retrieve_response_func and get_topic_summary_func at the appropriate points in the flow, passing all necessary parameters. The dependency injection pattern is correctly implemented.

Also applies to: 315-317

@luis5tb luis5tb force-pushed the responses_v2_non_streaming branch from ce0a379 to 4bff2fa Compare November 4, 2025 14:15
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/app/endpoints/query_v2.py (1)

393-393: Consider removing unnecessary cast(Any, toolgroups).

The toolgroups variable is already properly typed as list[dict[str, Any]] | None, so the cast(Any, toolgroups) on line 393 may no longer be necessary. If the client.responses.create method accepts list[dict[str, Any]] | None for the tools parameter, you can pass toolgroups directly.

If the cast is needed due to client library type definitions, consider adding a comment explaining why:

# Cast needed due to client library type signature
tools=cast(Any, toolgroups),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce0a379 and 4bff2fa.

📒 Files selected for processing (3)
  • src/app/endpoints/query.py (4 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/app/endpoints/test_query_v2.py
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(name) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient

Applied to files:

  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
🧬 Code graph analysis (2)
src/app/endpoints/query.py (2)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/app/endpoints/query_v2.py (2)
  • retrieve_response (303-448)
  • get_topic_summary (235-274)
src/app/endpoints/query_v2.py (5)
src/app/endpoints/query.py (4)
  • query_endpoint_handler_base (217-423)
  • validate_attachments_metadata (801-830)
  • get_topic_summary (184-214)
  • retrieve_response (639-798)
src/utils/endpoints.py (2)
  • get_system_prompt (126-190)
  • get_topic_summary_system_prompt (193-204)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
🪛 GitHub Actions: Black
src/app/endpoints/query_v2.py

[error] 1-1: Black formatting check failed. 1 file would be reformatted by Black (src/app/endpoints/query_v2.py). Run 'uv tool run black --write .' to fix.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (2)
src/app/endpoints/query.py (1)

426-450: LGTM!

The refactoring successfully separates the shared query handling logic into a base handler while maintaining the public API. The wrapper correctly provides Agent API-specific implementations and properly handles dependency injection for MCP headers.

src/app/endpoints/query_v2.py (1)

39-39: Use __name__ for logger initialization.

The logger is initialized with a hardcoded string "app.endpoints.handlers" instead of __name__, which violates the coding guideline and creates confusion since this module is query_v2.py, not the base query.py module.

As per coding guidelines

Apply this diff:

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
⛔ Skipped due to learnings
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to **/*.py : Use logger = logging.getLogger(__name__) for module logging after import logging

Comment on lines +222 to +223
retrieve_response_func: Any,
get_topic_summary_func: Any,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use explicit Callable types instead of Any for function parameters.

The retrieve_response_func and get_topic_summary_func parameters are typed as Any, which violates the coding guideline requiring complete type annotations for all function parameters. This reduces type safety and makes the dependency injection pattern less clear.

As per coding guidelines

Apply this diff to add proper type hints:

+from typing import Annotated, Any, Callable, Optional, cast
+
 async def query_endpoint_handler_base(  # pylint: disable=R0914
     request: Request,
     query_request: QueryRequest,
     auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
     mcp_headers: dict[str, dict[str, str]],
-    retrieve_response_func: Any,
-    get_topic_summary_func: Any,
+    retrieve_response_func: Callable[
+        [AsyncLlamaStackClient, str, QueryRequest, str, dict[str, dict[str, str]] | None, str],
+        tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]
+    ],
+    get_topic_summary_func: Callable[[str, AsyncLlamaStackClient, str], str],
 ) -> QueryResponse:

Note: The retrieve_response_func signature matches the actual function signature with positional provider_id parameter.

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 222-223, replace the Any types for
retrieve_response_func and get_topic_summary_func with explicit typing: import
Callable and Awaitable (and any specific model/response types used) and annotate
retrieve_response_func as a Callable whose first positional parameter is
provider_id (match the actual function parameter order and types) and whose
return is the appropriate Awaitable/return type; likewise annotate
get_topic_summary_func with its exact parameter types and return type. Ensure
the signatures exactly mirror the real functions (positional provider_id first
for retrieve_response_func), update imports accordingly, and run type checks.

Also include other formatting/checking fixes
@luis5tb luis5tb force-pushed the responses_v2_non_streaming branch from 4bff2fa to 9f750cf Compare November 4, 2025 21:41
@eranco74
Copy link
Contributor

eranco74 commented Nov 5, 2025

/lgtm

@tisnik tisnik merged commit d2555b9 into lightspeed-core:main Nov 5, 2025
20 of 22 checks passed
@luis5tb luis5tb mentioned this pull request Nov 5, 2025
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants