
Conversation

@luis5tb
Contributor

@luis5tb luis5tb commented Sep 15, 2025

Description

Add v2 endpoints to interact with LlamaStack using the Responses API instead of the Agent API

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

You need the LlamaStack changes in llamastack/llama-stack#3385 to be able to test this. I used assisted-chat to test the end-to-end solution, and I had to create new scripts to exercise the new v2 endpoints (make query_v2 or make streaming_query_v2).
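
For a quick manual check once everything is wired up, something like this works (a minimal sketch, assuming a local deployment at http://localhost:8080, which is a hypothetical host/port, and the conversation_id/response fields of QueryResponse):

import requests

# Exercise the new v2 endpoint; the payload mirrors QueryRequest's "query" field
resp = requests.post(
    "http://localhost:8080/v2/query",
    json={"query": "What is Llama Stack?"},
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
# conversation_id can be sent back on the next request to chain turns
print(body["conversation_id"])
print(body["response"])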

Summary by CodeRabbit

  • New Features

    • Added v2 query endpoints (standard and streaming) with RBAC-protected access, streaming SSE support, RAG and MCP tool integration, attachments validation, and conversation persistence.
    • Routing updated to expose v2 endpoints alongside v1/v3.
  • Bug Fixes

    • Improved error handling with clearer 500 responses when the Responses service is unreachable; failure metrics incremented.
  • Tests

    • Added comprehensive unit tests for v2 query and streaming flows.
  • Refactor

    • ID validation broadened to accept response-style IDs as well as UUIDs; vector store naming unified across retrieval paths.

@coderabbitai
Contributor

coderabbitai bot commented Sep 15, 2025

Walkthrough

Adds new v2 query endpoints: synchronous POST /v2/query and streaming POST /v2/streaming_query. Both authenticate with RBAC, resolve model/provider, assemble RAG and MCP tools, call the Responses API (sync or streaming), parse tool calls, persist transcripts/conversations, update metrics, and handle APIConnectionError.
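
Condensed, the sync path amounts to something like this (a sketch; the client.responses.create call shape, with input, model, instructions, previous_response_id, tools, stream, and store, is taken from this PR's diffs, while the surrounding scaffolding is illustrative):

async def query_v2_sketch(client, query_request, model_id, system_prompt, tools):
    # Responses API call replacing the v1 Agent API turn
    response = await client.responses.create(
        input=query_request.query,
        model=model_id,
        instructions=system_prompt,
        previous_response_id=query_request.conversation_id,  # chains follow-up turns
        tools=tools or None,
        stream=False,
        store=True,
    )
    # The response id (resp-*/resp_*) doubles as the conversation id
    return response.id, response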

Changes

  • V2 endpoints: sync and streaming (src/app/endpoints/query_v2.py, src/app/endpoints/streaming_query_v2.py): Add new v2 handlers and routers implementing POST /v2/query and /v2/streaming_query. They implement auth/RBAC, model/provider resolution, system prompt and attachment validation, RAG and MCP tool construction (including an Authorization header when a token is present), sync and streaming Responses API invocation, parsing of LLM output and tool calls into TurnSummary/ToolCallSummary, conversation/transcript persistence, metrics updates, and APIConnectionError handling.
  • Router wiring (src/app/routers.py): Register and include the new v2 routers (query_v2, streaming_query_v2) under /v2; add the v3 conversations router and adjust router registrations and tests accordingly.
  • RAG / MCP tool helpers (src/app/endpoints/query_v2.py; exports get_rag_tools, get_mcp_tools): Implement helpers converting vector store IDs into a file_search tool and MCP server entries into mcp tools, optionally attaching an Authorization header when a token is provided.
  • SUID utility (src/utils/suid.py): Extend check_suid to accept Response API IDs prefixed with resp- or resp_ by stripping/padding to UUID length and validating; add input validation for non-string/empty inputs.
  • Refactor: existing query/streaming endpoints (src/app/endpoints/query.py, src/app/endpoints/streaming_query.py): Replace vector_dbs usage with vector_stores and rename vector ID parameters to vector_store_ids; update RAG toolgroup calls and related argument names.
  • Tests: query v2 (tests/unit/app/endpoints/test_query_v2.py): Add unit tests for get_rag_tools, get_mcp_tools, retrieve_response parsing/behavior, attachment validation, success and APIConnectionError paths, conversation propagation, and metrics increments.
  • Tests: streaming query v2 (tests/unit/app/endpoints/test_streaming_query_v2.py): Add tests covering streaming retrieve_response tool assembly, stream=True behavior, SSE event sequencing, persistence, metrics, and APIConnectionError handling.
  • Tests: routers (tests/unit/app/test_routers.py): Update tests to expect inclusion of the v2 routers (query_v2, streaming_query_v2), the increased router count, and /v2 prefixes for the new endpoints.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant R as /v2/query Router
  participant H as query_endpoint_handler_v2
  participant A as Auth/RBAC
  participant M as Model Resolver
  participant L as AsyncLlamaStackClient
  participant S as Responses API
  participant P as Persistence
  participant X as Metrics

  C->>R: POST /v2/query (QueryRequest)
  R->>H: Dispatch
  H->>A: Authenticate & authorize
  H->>M: Resolve provider/model
  H->>H: Build system prompt + validate attachments
  H->>H: Build tools (RAG + MCP)
  H->>L: client.responses.create(input, model, tools, ...)
  L->>S: Create response
  S-->>L: Response (text + tool_calls, conv_id)
  L-->>H: Parsed result (TurnSummary)
  H->>X: Increment llm_calls_total (or failures)
  H->>P: Persist transcript/conversation
  H-->>C: 200 QueryResponse (conversation_id, text)
  Note over H,S: On APIConnectionError -> 500 with error payload
sequenceDiagram
  autonumber
  participant C as Client
  participant R as /v2/streaming_query Router
  participant H as streaming_query_endpoint_handler_v2
  participant A as Auth/RBAC
  participant M as Model Resolver
  participant L as AsyncLlamaStackClient
  participant S as Responses API (stream)
  participant E as SSE Stream
  participant P as Persistence
  participant X as Metrics

  C->>R: POST /v2/streaming_query
  R->>H: Dispatch
  H->>A: Authenticate & authorize
  H->>M: Resolve provider/model
  H->>H: Build system prompt + tools
  H->>L: client.responses.stream.create(...)
  L-->>H: Async iterator of events
  H->>E: Emit start
  loop Stream events
    H->>E: token/text deltas
    H->>E: tool_call events (if any)
    opt response.created / complete
      H->>P: Persist conversation/transcript
    end
  end
  H->>X: Increment llm_calls_total (or failures)
  H->>E: Emit end-of-stream
  E-->>C: StreamingResponse
  Note over H,L: On APIConnectionError -> 500 with error payload
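
For reference, the stream can be consumed with a hand-rolled SSE reader along these lines (a sketch; the localhost URL is hypothetical, and the framing of data: lines with "event"/"data" keys is inferred from the format_stream_data usage shown later in this review):

import json
import requests

with requests.post(
    "http://localhost:8080/v2/streaming_query",
    json={"query": "Summarize the release notes"},
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # SSE frames arrive as b'data: {...}' lines separated by blank lines
        if line.startswith(b"data: "):
            event = json.loads(line[len(b"data: "):])
            print(event.get("event"), event.get("data"))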

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Potential review focus areas:

  • Correct construction and security of MCP tool headers (Authorization propagation).
  • Accurate parsing and normalization of Response API ids in suid.check_suid.
  • Consistency between new v2 retrieve_response implementations and existing v1 behavior (tool formats, TurnSummary fields).
  • Streaming iterator/error handling and SSE event sequencing.

Possibly related PRs

Suggested reviewers

  • tisnik
  • manstis

Poem

I thump my paw—new tunnels hum,
V2 queries sprint and streaming drums.
Tools unlocked, tokens in tow,
Conversations saved as we go.
A little hop for every call—hip-hop hooray! 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 46.88%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the title "Responses v2 support" accurately reflects the main objective of this pull request. The PR introduces v2 versions of the query and streaming_query endpoints that leverage LlamaStack's Responses API instead of the Agent API, along with supporting infrastructure changes (routing, utilities, tests). The title is concise, clear, and avoids vague language or noise; a developer reviewing the commit history would quickly understand that this PR adds new v2 endpoint support without needing to examine the details.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/endpoints/streaming_query.py (1)

212-212: SSE response is missing correct content type and streaming headers

Same issue as v2; events may be buffered or ignored.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
src/app/endpoints/conversations.py (1)

88-99: Access session_id robustly; returned items may be models, not dicts

Using .get() on a Pydantic/typed object will fail.

-        session_id = str(agent_sessions[0].get("session_id"))
+        first = agent_sessions[0]
+        session_id = (
+            first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+        )
+        if not session_id:
+            raise HTTPException(
+                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                detail={"response": "Invalid session object", "cause": "session_id missing"},
+            )
+        session_id = str(session_id)
🧹 Nitpick comments (17)
src/utils/query.py (4)

309-310: Remove unused variable ‘token’.

Minor cleanup.

-    user_id, _, _, token = auth
+    user_id, _, _, _ = auth

338-350: Return 503 to match the documented OpenAPI response for connection errors.

Doc block maps connection failures to 503; handler currently uses 500.

-    raise HTTPException(
-        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+    raise HTTPException(
+        status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
         detail={
             "response": "Unable to connect to Llama Stack",
             "cause": str(e),
         },
     ) from e

33-52: Use 401 for Unauthorized in OpenAPI mapping.

400 is misleading for auth failures; map UnauthorizedResponse under 401.

-    400: {
-        "description": "Missing or invalid credentials provided by client",
-        "model": UnauthorizedResponse,
-    },
+    401: {
+        "description": "Missing or invalid credentials provided by client",
+        "model": UnauthorizedResponse,
+    },

258-288: Consider normalizing attachment metadata (lowercasing) before validation.

Prevents case-related rejects from clients.

-    for attachment in attachments:
+    for attachment in attachments:
+        # normalize without mutating original pydantic model
+        atype = attachment.attachment_type.lower()
+        ctype = attachment.content_type.lower()
-        if attachment.attachment_type not in constants.ATTACHMENT_TYPES:
+        if atype not in constants.ATTACHMENT_TYPES:
@@
-        if attachment.content_type not in constants.ATTACHMENT_CONTENT_TYPES:
+        if ctype not in constants.ATTACHMENT_CONTENT_TYPES:
src/app/endpoints/streaming_query_v2.py (2)

96-99: Pass the validated conversation context to model selection hints

Currently always passing None; prefer the actual user_conversation.

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),

108-109: Avoid metadata_map shadowing

Outer variable is unused and shadowed by inner definition.

-        metadata_map: dict[str, dict[str, Any]] = {}
+        # metadata captured within response_generator

Also applies to: 128-129

src/app/endpoints/streaming_query.py (1)

5-6: Remove unused regex now that metadata parsing lives in utils.streaming_query

Dead code after centralization.

-import re
-...
-METADATA_PATTERN = re.compile(r"\nMetadata: (\{.+})\n")
+# metadata extraction handled by utils.streaming_query

Also applies to: 59-59

src/utils/streaming_query.py (2)

164-172: Guard error payload shape

chunk.error may not always be a dict with "message".

-            "token": chunk.error["message"],
+            "token": getattr(chunk.error, "message", None)
+            or (chunk.error.get("message") if isinstance(chunk.error, dict) else str(chunk.error)),

80-100: End-event placeholders: consider carrying token counts when available

If tokens are tracked upstream, thread them through metadata_map or function args.

Happy to wire this if you expose counts from metrics/turn payloads.

src/app/endpoints/query_v2.py (2)

100-102: Prune unused params/imports and dead code

provider_id and mcp_headers aren’t used in retrieve_response; imported token-count util also unused; stray blank lines.

-        summary, conversation_id = await retrieve_response(
+        summary, conversation_id = await retrieve_response(
             client,
             llama_stack_model_id,
             query_request,
             token,
-            mcp_headers=mcp_headers,
-            provider_id=provider_id,
+            mcp_headers=mcp_headers,
         )
-    mcp_headers: dict[str, dict[str, str]] | None = None,
-    provider_id: str = "",
+    mcp_headers: dict[str, dict[str, str]] | None = None,
-    model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
-    #update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
+    # Token accounting TBD for Responses API
-from metrics.utils import update_llm_token_count_from_turn

Also collapse extra blank import lines around utils.query.

Also applies to: 140-141, 264-266, 20-21, 31-39


220-233: Guard when response.output is None

Avoid iterating over None.

-    for idx, output_item in enumerate(response.output):
+    for idx, output_item in enumerate(response.output or []):
src/app/endpoints/conversations_v2.py (1)

58-79: Docs/status coverage: add 400/403 error responses.

This handler can raise 400 (invalid ID) and 403 (access denied) via validate_conversation_id/access, but conversation_responses doesn’t document them.

I recommend extending conversation_responses in utils.conversations to include 400 and 403 entries so OpenAPI reflects actual behavior.

src/utils/conversations.py (4)

59-66: OpenAPI example mismatch with model (message vs response).

ConversationDeleteResponse uses the field response, but the 200 example here uses message. This misleads API consumers.

 conversation_delete_responses: dict[int | str, dict[str, Any]] = {
         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
         "success": True,
-        "message": "Conversation deleted successfully",
+        "response": "Conversation deleted successfully",
     },

31-57: Enhance responses spec: document 400/403 for conversation retrieval.

The retrieval flow can emit:

  • 400 for invalid ID (validate_conversation_id)
  • 403 for unauthorized access (validate_conversation_access)
 conversation_responses: dict[int | str, dict[str, Any]] = {
         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
         "chat_history": [
             {
                 "messages": [
                     {"content": "Hi", "type": "user"},
                     {"content": "Hello!", "type": "assistant"},
                 ],
                 "started_at": "2024-01-01T00:00:00Z",
                 "completed_at": "2024-01-01T00:00:05Z",
             }
         ],
     },
+    400: {
+        "detail": {
+            "response": "Invalid conversation ID format",
+            "cause": "Conversation ID is not a valid SUID/Response ID",
+        }
+    },
+    403: {
+        "detail": {
+            "response": "Access denied",
+            "cause": "You do not have permission to read this conversation",
+        }
+    },
         "detail": {
             "response": "Conversation not found",
             "cause": "The specified conversation ID does not exist.",
         }
     },

109-149: Guard against None output messages in simplified history.

Currently appends an output message even when output_message is missing, producing {"content": None, "type": None}.

-    # Extract only essential data from each turn
+    # Extract only essential data from each turn
     for turn in session_data.get("turns", []):
         # Clean up input messages
         cleaned_messages = []
         for msg in turn.get("input_messages", []):
             cleaned_msg = {
                 "content": msg.get("content"),
                 "type": msg.get("role"),  # Rename role to type
             }
             cleaned_messages.append(cleaned_msg)

         # Clean up output message
         output_msg = turn.get("output_message", {})
-        cleaned_messages.append(
-            {
-                "content": output_msg.get("content"),
-                "type": output_msg.get("role"),  # Rename role to type
-            }
-        )
+        if output_msg and (output_msg.get("content") is not None):
+            cleaned_messages.append(
+                {
+                    "content": output_msg.get("content"),
+                    "type": output_msg.get("role"),  # Rename role to type
+                }
+            )

210-221: Clarify 400 error message for non-UUID IDs.

Since check_suid accepts opaque resp-/resp_ IDs, the current cause text “not a valid UUID” can be misleading.

-                "cause": f"Conversation ID {conversation_id} is not a valid UUID",
+                "cause": f"Conversation ID {conversation_id} is not a valid SUID/Response ID",
src/app/endpoints/query.py (1)

116-120: Align upstream-connection status codes across endpoints.

handle_api_connection_error() raises 500, while conversation deletion maps APIConnectionError to 503. Pick one (prefer 503 Service Unavailable for upstream outages) and standardize.

Would you like me to open a follow-up to change handle_api_connection_error() to raise 503 and update OpenAPI docs accordingly?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b14d91c and c3d8748.

📒 Files selected for processing (13)
  • pyproject.toml (1 hunks)
  • src/app/endpoints/conversations.py (7 hunks)
  • src/app/endpoints/conversations_v2.py (1 hunks)
  • src/app/endpoints/query.py (6 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (2 hunks)
  • src/constants.py (1 hunks)
  • src/utils/conversations.py (1 hunks)
  • src/utils/query.py (1 hunks)
  • src/utils/streaming_query.py (1 hunks)
  • src/utils/suid.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
src/app/endpoints/streaming_query_v2.py (8)
src/auth/__init__.py (1)
  • get_auth_dependency (14-43)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • get_system_prompt (71-102)
  • validate_model_provider_override (105-125)
src/utils/query.py (5)
  • evaluate_model_hints (94-133)
  • is_transcripts_enabled (55-61)
  • persist_user_conversation_details (64-91)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/streaming_query.py (3)
  • format_stream_data (26-37)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (274-283)
  • get_mcp_tools (286-304)
src/utils/streaming_query.py (4)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/utils/endpoints.py (1)
  • get_system_prompt (71-102)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/streaming_query.py (1)
  • response_generator (139-200)
src/utils/conversations.py (8)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (1)
  • configuration (60-64)
src/app/database.py (1)
  • get_session (34-40)
src/models/config.py (2)
  • config (130-136)
  • Action (303-342)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/responses.py (4)
  • ConversationDeleteResponse (425-458)
  • ConversationsListResponse (493-550)
  • ConversationDetails (461-490)
  • UnauthorizedResponse (343-357)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • delete_conversation (22-36)
  • validate_conversation_ownership (39-59)
src/utils/suid.py (1)
  • check_suid (15-44)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (31-46)
src/app/endpoints/streaming_query.py (3)
src/utils/endpoints.py (2)
  • validate_model_provider_override (105-125)
  • validate_conversation_ownership (39-59)
src/utils/query.py (1)
  • select_model_and_provider_id (136-213)
src/utils/streaming_query.py (4)
  • format_stream_data (26-37)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
  • stream_build_event (103-146)
src/app/endpoints/conversations_v2.py (5)
src/auth/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/config.py (2)
  • config (130-136)
  • Action (303-342)
src/models/responses.py (3)
  • ConversationDeleteResponse (425-458)
  • ConversationResponse (375-422)
  • ConversationsListResponse (493-550)
src/utils/conversations.py (4)
  • get_conversations_list_base (151-207)
  • delete_conversation_base (262-360)
  • validate_conversation_id (210-220)
  • validate_conversation_access (223-259)
src/utils/query.py (9)
src/configuration.py (4)
  • configuration (60-64)
  • user_data_collection_configuration (81-85)
  • inference (121-125)
  • llama_stack_configuration (74-78)
src/app/database.py (1)
  • get_session (34-40)
src/models/config.py (2)
  • config (130-136)
  • Action (303-342)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • Attachment (15-69)
src/models/responses.py (2)
  • UnauthorizedResponse (343-357)
  • ForbiddenResponse (360-372)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • validate_conversation_ownership (39-59)
  • validate_model_provider_override (105-125)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (1)
  • TurnSummary (59-78)
src/app/endpoints/query_v2.py (8)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • get_system_prompt (71-102)
  • validate_model_provider_override (105-125)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/query.py (4)
  • evaluate_model_hints (94-133)
  • process_transcript_and_persist_conversation (352-383)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query.py (1)
src/utils/query.py (8)
  • evaluate_model_hints (94-133)
  • select_model_and_provider_id (136-213)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-335)
  • handle_api_connection_error (338-349)
  • process_transcript_and_persist_conversation (352-383)
src/app/endpoints/conversations.py (3)
src/utils/conversations.py (5)
  • simplify_session_data (109-148)
  • get_conversations_list_base (151-207)
  • validate_conversation_id (210-220)
  • validate_conversation_access (223-259)
  • delete_conversation_base (262-360)
src/models/config.py (1)
  • Action (303-342)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
🔇 Additional comments (7)
src/constants.py (1)

5-5: Version cap bump looks good. Ensure CI exercises 0.2.21 explicitly.

src/app/routers.py (1)

17-20: Mounting V2 under /v2 while preserving V1 looks good.

Route composition is clear; ordering avoids conflicts. Please ensure unit tests assert presence of the new /v2 routes.

Also applies to: 41-44

src/app/endpoints/conversations.py (1)

42-43: LGTM: list endpoint delegates cleanly to shared base

Good reuse and RBAC is handled upstream.

src/app/endpoints/query_v2.py (1)

274-304: LGTM: tool adapters for Responses API

Straightforward mapping for RAG and MCP tools; debug logging is helpful.
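
For readers without the diff open, the adapters reduce to roughly the following (a reconstruction from the review context; the file_search/vector_store_ids fields, the mcp type, and the Bearer Authorization header appear in this PR, while server_label and server_url are assumed names for the MCP entry fields):

def get_rag_tools(vector_store_ids: list[str]) -> list[dict]:
    # One file_search tool spanning all configured vector stores
    return [{"type": "file_search", "vector_store_ids": vector_store_ids}]


def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict]:
    tools = []
    for mcp_server in mcp_servers:
        tool_def: dict = {
            "type": "mcp",
            # server_label/server_url are assumed names for the MCP entry fields
            "server_label": mcp_server.name,
            "server_url": mcp_server.url,
        }
        if token:
            # Response API format: per-tool headers carry the bearer token
            tool_def["headers"] = {"Authorization": f"Bearer {token}"}
        tools.append(tool_def)
    return tools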

src/app/endpoints/conversations_v2.py (1)

48-56: LGTM: list conversations v2 delegates to shared base cleanly.

src/app/endpoints/query.py (2)

63-68: LGTM: clear docstring about Agent API usage.


102-109: Nice: centralizing transcript persistence.

Good separation via process_transcript_and_persist_conversation() and consistent metrics updates.

Comment on lines 32 to 43
async def get_response_sessions(_client, _conversation_id: str):
    """Placeholder for Response API session listing.

    For now, we rely on the shared base which works without direct
    backend calls until a dedicated Responses API endpoint is wired.
    """
    return [
        {"session_id": _conversation_id},
    ]


⚠️ Potential issue

Fix: session helper functions must be synchronous (they’re called without await).

delete_conversation_base() invokes get_session_func synchronously. Defining these as async returns coroutines and breaks truthiness checks and deletion flow.

Apply this diff:

-async def get_response_sessions(_client, _conversation_id: str):
+def get_response_sessions(_client, _conversation_id: str):
     """Placeholder for Response API session listing.

     For now, we rely on the shared base which works without direct
     backend calls until a dedicated Responses API endpoint is wired.
     """
     return [
         {"session_id": _conversation_id},
     ]
🤖 Prompt for AI Agents
In src/app/endpoints/conversations_v2.py around lines 32 to 41, the session
helper is defined as async but is called synchronously by
delete_conversation_base; change the function to a regular synchronous function
(replace "async def" with "def"), keep the same return value (a list of dicts),
update the docstring to remove async wording, and ensure no callers await this
function so it returns a concrete list instead of a coroutine.

Comment on lines 43 to 50
async def delete_response_sessions(_client, _conversation_id: str, _sessions):
    """Placeholder for Response API session deletion."""
    return None


⚠️ Potential issue

Same issue: make deletion helper synchronous.

Matches the calling convention in delete_conversation_base().

-async def delete_response_sessions(_client, _conversation_id: str, _sessions):
+def delete_response_sessions(_client, _conversation_id: str, _sessions):
     """Placeholder for Response API session deletion."""
     return None
🤖 Prompt for AI Agents
In src/app/endpoints/conversations_v2.py around lines 43 to 46, the helper
delete_response_sessions is currently async but should be synchronous to match
the calling convention in delete_conversation_base(); change the function to a
regular def, remove async/await usage, and return None (or perform synchronous
deletion logic) so callers expecting a synchronous function get a normal return
value instead of a coroutine.

Comment on lines 154 to 164
"""Delete agent sessions for a conversation."""
agent_id = conversation_id
session_id = str(sessions[0].get("session_id"))
await client.agents.session.delete(agent_id=agent_id, session_id=session_id)

⚠️ Potential issue

Same issue when deleting sessions

Avoid .get on possibly typed objects.

-    session_id = str(sessions[0].get("session_id"))
+    first = sessions[0]
+    session_id = first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+    if not session_id:
+        raise RuntimeError("session_id missing on session")
+    session_id = str(session_id)
🤖 Prompt for AI Agents
In src/app/endpoints/conversations.py around lines 154 to 157, the code uses
sessions[0].get("session_id") which relies on .get on a potentially typed
object; replace this with explicit, type-safe access: first validate that
sessions is a non-empty list and that sessions[0] is the expected type, then
extract the session_id with the appropriate accessor (e.g.,
sessions[0]["session_id"] if it's a dict or sessions[0].session_id if it's an
object), handle missing/invalid cases by raising a clear error or returning
early, and pass the resulting stringified session_id to
client.agents.session.delete.

Comment on lines 55 to 62
def is_transcripts_enabled() -> bool:
    """Check if transcripts is enabled.

    Returns:
        bool: True if transcripts is enabled, False otherwise.
    """
    return configuration.user_data_collection_configuration.transcripts_enabled


⚠️ Potential issue

Call configuration accessors; current code treats methods as attributes and will fail at runtime.

These are methods (see configuration.py). This will raise AttributeError.

-    return configuration.user_data_collection_configuration.transcripts_enabled
+    return configuration.user_data_collection_configuration().transcripts_enabled
🤖 Prompt for AI Agents
In src/utils/query.py around lines 55 to 62, the code treats
user_data_collection_configuration as an attribute but it's a method; change
the return to call the accessor (return
configuration.user_data_collection_configuration().transcripts_enabled) so it
invokes the method and returns the boolean; ensure the function signature/type
hints remain bool and consider adding a defensive try/except or logging if
configuration could be None or the call may raise.

Comment on lines 136 to 214
def select_model_and_provider_id(
    models: ModelListResponse, model_id: str | None, provider_id: str | None
) -> tuple[str, str, str]:
    """
    Select the model ID and provider ID based on the request or available models.

    Determine and return the appropriate model and provider IDs for
    a query request.

    If the request specifies both model and provider IDs, those are used.
    Otherwise, defaults from configuration are applied. If neither is
    available, selects the first available LLM model from the provided model
    list. Validates that the selected model exists among the available models.

    Returns:
        A tuple containing the combined model ID (in the format
        "provider/model") and the provider ID.

    Raises:
        HTTPException: If no suitable LLM model is found or the selected model is not available.
    """
    # If model_id and provider_id are provided in the request, use them

    # If model_id is not provided in the request, check the configuration
    if not model_id or not provider_id:
        logger.debug(
            "No model ID or provider ID specified in request, checking configuration"
        )
        model_id = configuration.inference.default_model  # type: ignore[reportAttributeAccessIssue]
        provider_id = (
            configuration.inference.default_provider  # type: ignore[reportAttributeAccessIssue]
        )

    # If no model is specified in the request or configuration, use the first available LLM
    if not model_id or not provider_id:
        logger.debug(
            "No model ID or provider ID specified in request or configuration, "
            "using the first available LLM"
        )
        try:
            model = next(
                m
                for m in models
                if m.model_type == "llm"  # pyright: ignore[reportAttributeAccessIssue]
            )
            model_id = model.identifier
            provider_id = model.provider_id
            logger.info("Selected model: %s", model)
            return model_id, model_id, provider_id
        except (StopIteration, AttributeError) as e:
            message = "No LLM model found in available models"
            logger.error(message)
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail={
                    "response": constants.UNABLE_TO_PROCESS_RESPONSE,
                    "cause": message,
                },
            ) from e

    llama_stack_model_id = f"{provider_id}/{model_id}"
    # Validate that the model_id and provider_id are in the available models
    logger.debug("Searching for model: %s, provider: %s", model_id, provider_id)
    if not any(
        m.identifier == llama_stack_model_id and m.provider_id == provider_id
        for m in models
    ):
        message = f"Model {model_id} from provider {provider_id} not found in available models"
        logger.error(message)
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail={
                "response": constants.UNABLE_TO_PROCESS_RESPONSE,
                "cause": message,
            },
        )

    return llama_stack_model_id, model_id, provider_id


⚠️ Potential issue

Fix model/provider resolution: wrong early return tuple and improper config accessor usage.

  • Early return sets both first and second elements to the combined identifier.
  • Config accessors should be called.
-    if not model_id or not provider_id:
+    if not model_id or not provider_id:
         logger.debug(
             "No model ID or provider ID specified in request, checking configuration"
         )
-        model_id = configuration.inference.default_model  # type: ignore[reportAttributeAccessIssue]
-        provider_id = (
-            configuration.inference.default_provider  # type: ignore[reportAttributeAccessIssue]
-        )
+        model_id = configuration.inference().default_model
+        provider_id = configuration.inference().default_provider
@@
-        try:
+        try:
             model = next(
                 m
                 for m in models
                 if m.model_type == "llm"  # pyright: ignore[reportAttributeAccessIssue]
             )
-            model_id = model.identifier
-            provider_id = model.provider_id
-            logger.info("Selected model: %s", model)
-            return model_id, model_id, provider_id
+            llama_stack_model_id = model.identifier
+            provider_id = model.provider_id
+            # Derive plain model id when identifier is "provider/model"
+            model_id = llama_stack_model_id.split("/", 1)[1] if "/" in llama_stack_model_id else llama_stack_model_id
+            logger.info("Selected model: %s", llama_stack_model_id)
+            return llama_stack_model_id, model_id, provider_id

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/utils/query.py around lines 136–214, the function improperly overwrites
missing model/provider by treating config accessors as plain attributes instead
of calling them, and returns the wrong tuple when selecting the first LLM (it
returns model_id twice). Fix by calling the config accessors
(configuration.inference().default_model and
configuration.inference().default_provider) and only assigning each of model_id
or provider_id when that specific value is missing; then, when selecting the
first available LLM, construct the combined id (provider/model) and return
(combined_id, model_id, provider_id) so the returned tuple is
(llama_stack_model_id, model_id, provider_id).

Comment on lines 307 to 308
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)


⚠️ Potential issue

Avoid logging the full Llama Stack config; may include sensitive data. Also call the accessor.

Log only safe fields (e.g., version/provider list), not the entire object.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info("Llama Stack version: %s; providers: %s", getattr(cfg, "version", "n/a"), getattr(cfg, "providers", "n/a"))
🤖 Prompt for AI Agents
In src/utils/query.py around lines 307-308, the code currently logs the entire
Llama stack configuration object, which may contain sensitive data; instead
call the configuration accessor (configuration.llama_stack_configuration()) and
log only safe fields from the returned object (for example its version and
providers attributes, or equivalent safe accessors), formatting a concise
message like "Llama stack config: version=%s providers=%s" from those values;
remove or replace the existing logger.info invocation so full object contents
are not written to logs.

Comment on lines 16 to 37
"""Check if given string is a proper session ID or response ID.
Args:
suid: The string to check.
Returns True if the string is a valid UUID, False otherwise.
Rules:
- If it starts with "resp-" or "resp_", accept as a valid Responses API ID (opaque).
- Otherwise, require a valid UUID string.
"""
if not isinstance(suid, str) or not suid:
return False

# Handle Responses API IDs
if suid.startswith("resp-") or suid.startswith("resp_"):
token = suid[5:]
if not token:
return False
# If truncated (e.g., shell cut reduced length), pad to canonical UUID length
if len(token) < 36:
token = token + ("0" * (36 - len(token)))
try:
uuid.UUID(token)
return True
except (ValueError, TypeError):
return False

⚠️ Potential issue

Don't coerce opaque resp-* IDs into UUIDs; the zero-padding path is incorrect and unsafe.

Response API IDs are opaque; padding/truncation can accept malformed IDs and reject valid ones.

 def check_suid(suid: str) -> bool:
-    """Check if given string is a proper session ID or response ID.
+    """Check if given string is a proper session ID or response ID.
@@
-    if not isinstance(suid, str) or not suid:
+    if not isinstance(suid, str) or not suid:
         return False
-
-    # Handle Responses API IDs
-    if suid.startswith("resp-") or suid.startswith("resp_"):
-        token = suid[5:]
-        if not token:
-            return False
-        # If truncated (e.g., shell cut reduced length), pad to canonical UUID length
-        if len(token) < 36:
-            token = token + ("0" * (36 - len(token)))
-        try:
-            uuid.UUID(token)
-            return True
-        except (ValueError, TypeError):
-            return False
+    # Handle Responses API IDs – treat as opaque, minimally validated
+    if suid.startswith(("resp-", "resp_")):
+        token = suid[5:]
+        # Require a non-empty token with sane charset
+        return bool(token) and all(ch.isalnum() or ch in "-_." for ch in token)
@@
     try:
         uuid.UUID(suid)
         return True
     except (ValueError, TypeError):
         return False
🤖 Prompt for AI Agents
In src/utils/suid.py around lines 16 to 37, the code currently treats resp- or
resp_ IDs as if they were UUIDs by zero-padding truncated tokens and attempting
uuid.UUID(token); remove that behavior and treat Responses API IDs as opaque: if
suid is a non-empty string and starts with "resp-" or "resp_", simply return
True when the token after the prefix is non-empty (do not pad or call
uuid.UUID); for all other values keep the existing UUID validation path (i.e.,
require a valid UUID string). Ensure type/empty checks remain and do not
introduce any coercion or padding logic.
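
Under the opaque-ID rule proposed above, the expected behavior would be (illustrative assertions against the patched function, not the shipped one):

from utils.suid import check_suid

assert check_suid("resp_abc123") is True      # opaque resp_ token, sane charset
assert check_suid("resp-") is False           # empty token after the prefix
assert check_suid("123e4567-e89b-12d3-a456-426614174000") is True  # plain UUID
assert check_suid("not-a-uuid") is False      # neither prefix nor valid UUID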


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/app/endpoints/query.py (3)

200-217: Security: avoid sending Bearer tokens to non‑HTTPS MCP servers by default.

The code forwards Authorization: Bearer <token> to every configured MCP server when headers are empty. This risks credential exposure if any server uses http:// or is misconfigured.

Apply:

+from urllib.parse import urlparse
@@
-        if not mcp_headers and token:
-            for mcp_server in configuration.mcp_servers:
-                mcp_headers[mcp_server.url] = {
-                    "Authorization": f"Bearer {token}",
-                }
+        if not mcp_headers and token:
+            for mcp_server in configuration.mcp_servers:
+                url = mcp_server.url
+                scheme = urlparse(url).scheme.lower()
+                if scheme != "https":
+                    logger.warning("Skipping non-HTTPS MCP server for auth header: %s", url)
+                    continue
+                mcp_headers[url] = {"Authorization": f"Bearer {token}"}

Optionally gate this behind a config flag (default off) and/or send tokens only to whitelisted servers that explicitly require auth.


178-182: Attachments privacy TODO is a compliance risk; prioritize redaction.

There’s a TODO to redact attachments before sending to the LLM. If attachments can include PII or secrets, add at least basic redaction or a feature flag to block unredacted uploads.

I can draft a pluggable redactor interface and a default regex‑based redactor for common PII (emails, phones, SSNs), gated by config.
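
As a starting point, that regex-based redactor could look like this (a sketch under stated assumptions; the pattern set and interface are illustrative, not an agreed design):

import re

# Conservative patterns for common PII; real deployments would extend these
_PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace recognized PII spans with labeled placeholders."""
    for label, pattern in _PII_PATTERNS.items():
        text = pattern.sub(f"<{label}-redacted>", text)
    return text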


54-56: Fix query_response to a valid FastAPI/OpenAPI responses mapping

src/utils/query.py defines query_response but several entries aren’t in FastAPI’s expected shape:

  • 200 is a raw example body (conversation_id/response) instead of a response mapping — use "model": QueryResponse or "content": {"application/json": {"example": {...}}}.
  • 503 uses a "detail" key with nested payload; replace with "description" and/or "content" (or a "model") to match FastAPI.
  • 400/403 already include "model" (OK).

Location: src/utils/query.py — query_response definition.
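
A FastAPI-conformant shape would look roughly like this (a sketch; the example payloads are illustrative, and QueryResponse stands in for the project's existing model):

from models.responses import QueryResponse

query_response: dict[int | str, dict] = {
    200: {"model": QueryResponse, "description": "Successful query"},
    503: {
        "description": "Llama Stack is unreachable",
        "content": {
            "application/json": {
                "example": {
                    "detail": {
                        "response": "Unable to connect to Llama Stack",
                        "cause": "Connection refused",
                    }
                }
            }
        },
    },
}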

♻️ Duplicate comments (6)
src/app/endpoints/conversations.py (3)

100-100: Typed vs dict session object: .get will fail at runtime.
client.agents.session.list(...) likely returns typed models, not dicts. Accessing via .get risks AttributeError. Use type-safe access with a fallback and validate presence.

Apply this diff:

-        session_id = str(agent_sessions[0].get("session_id"))
+        first = agent_sessions[0]
+        session_id = (
+            first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+        )
+        if not session_id:
+            raise RuntimeError("session_id missing on session")
+        session_id = str(session_id)

155-159: Same typed vs dict issue on delete; also consider multiple sessions.
Using .get may break; fix similarly. Optionally, delete all sessions for the conversation instead of just the first.

Apply this minimal fix:

-    session_id = str(sessions[0].get("session_id"))
-    await client.agents.session.delete(agent_id=agent_id, session_id=session_id)
+    first = sessions[0]
+    session_id = (
+        first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+    )
+    if not session_id:
+        raise RuntimeError("session_id missing on session")
+    await client.agents.session.delete(agent_id=agent_id, session_id=str(session_id))

Optional improvement (delete all sessions):

-    first = sessions[0]
-    session_id = (
-        first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
-    )
-    if not session_id:
-        raise RuntimeError("session_id missing on session")
-    await client.agents.session.delete(agent_id=agent_id, session_id=str(session_id))
+    for s in sessions:
+        sid = s.get("session_id") if isinstance(s, dict) else getattr(s, "session_id", None)
+        if not sid:
+            continue
+        await client.agents.session.delete(agent_id=agent_id, session_id=str(sid))

182-188: Awaiting a sync base + passing async helpers will crash.
delete_conversation_base is synchronous in utils.conversations, yet you await it and pass async helpers; this will raise at runtime and the async helpers won’t be awaited inside the base.

Preferred fix: make the base async and await the helpers; keep this call site as await.

Patch (in src/utils/conversations.py):

-def delete_conversation_base(
+async def delete_conversation_base(
     request: Request,
     conversation_id: str,
     auth: Any,
-    get_session_func: callable,
-    delete_session_func: callable,
+    get_session_func: callable,
+    delete_session_func: callable,
 ) -> ConversationDeleteResponse:
@@
-        sessions = get_session_func(client, conversation_id)
+        sessions = await get_session_func(client, conversation_id)
@@
-        delete_session_func(client, conversation_id, sessions)
+        await delete_session_func(client, conversation_id, sessions)

If you must keep the base sync, then do not await it here and instead provide sync wrappers that offload the async client calls, or run the entire base in a threadpool. Async base is cleaner.

Run this to confirm current signature and callers:

#!/bin/bash
rg -nC2 -P '\bdef\s+delete_conversation_base\s*\(' src | sed -n '1,120p'
rg -nP 'await\s+delete_conversation_base\(' src | sed -n '1,200p'
pyproject.toml (1)

31-32: Add upper bounds for llama-stack(+client) to match MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION (0.2.21).

Prevents unsupported upgrades that will be rejected at runtime by the version gate.

-    "llama-stack>=0.2.19",
-    "llama-stack-client>=0.2.19",
+    "llama-stack>=0.2.19,<=0.2.21",
+    "llama-stack-client>=0.2.19,<=0.2.21",
src/utils/suid.py (1)

16-37: Treat resp-* IDs as opaque; remove UUID padding/coercion.

Zero‑padding and UUID parsing can accept malformed IDs and reject valid ones.

 def check_suid(suid: str) -> bool:
@@
-    # Handle Responses API IDs
-    if suid.startswith("resp-") or suid.startswith("resp_"):
-        token = suid[5:]
-        if not token:
-            return False
-        # If truncated (e.g., shell cut reduced length), pad to canonical UUID length
-        if len(token) < 36:
-            token = token + ("0" * (36 - len(token)))
-        try:
-            uuid.UUID(token)
-            return True
-        except (ValueError, TypeError):
-            return False
+    # Handle Responses API IDs – treat as opaque
+    if suid.startswith(("resp-", "resp_")):
+        token = suid[5:]
+        return bool(token) and all(ch.isalnum() or ch in "-_." for ch in token)
src/app/endpoints/query_v2.py (1)

25-29: Validate conversation ownership before chaining via previous_response_id.

Prevents cross‑user context leakage; mirrors v1 parity.

 from utils.endpoints import (
     check_configuration_loaded,
     get_system_prompt,
     validate_model_provider_override,
+    validate_conversation_ownership,
 )
@@
-    user_conversation: UserConversation | None = None
-    if query_request.conversation_id:
-        # TODO: Implement conversation once Llama Stack supports its API
-        pass
+    user_conversation: UserConversation | None = None
+    if query_request.conversation_id:
+        user_conversation = validate_conversation_ownership(
+            user_id=user_id, conversation_id=query_request.conversation_id
+        )
+        if user_conversation is None:
+            raise HTTPException(
+                status_code=status.HTTP_403_FORBIDDEN,
+                detail={
+                    "response": "Access denied",
+                    "cause": "You do not have permission to access this conversation",
+                },
+            )

Also applies to: 79-83

🧹 Nitpick comments (11)
src/utils/streaming_query.py (2)

80-100: Use resilient keys for doc URL in end event.

Metadata may use "doc_url" or "docs_url". Accept both to avoid silent drops.

-                "referenced_documents": [
-                    {
-                        "doc_url": v["docs_url"],
-                        "doc_title": v["title"],
-                    }
-                    for v in filter(
-                        lambda v: ("docs_url" in v) and ("title" in v),
-                        metadata_map.values(),
-                    )
-                ],
+                "referenced_documents": [
+                    {
+                        "doc_url": v.get("docs_url") or v.get("doc_url"),
+                        "doc_title": v["title"],
+                    }
+                    for v in filter(
+                        lambda v: (("docs_url" in v) or ("doc_url" in v)) and ("title" in v),
+                        metadata_map.values(),
+                    )
+                ],

304-329: Handle dict-shaped tool_call deltas.

Some providers emit tool_call deltas as dicts; add a safe branch.

-        if chunk.event.payload.delta.type == "tool_call":
-            if isinstance(chunk.event.payload.delta.tool_call, str):
+        if chunk.event.payload.delta.type == "tool_call":
+            if isinstance(chunk.event.payload.delta.tool_call, str):
                 ...
-            elif isinstance(chunk.event.payload.delta.tool_call, ToolCall):
+            elif isinstance(chunk.event.payload.delta.tool_call, ToolCall):
                 ...
+            elif isinstance(chunk.event.payload.delta.tool_call, dict):
+                name = (
+                    chunk.event.payload.delta.tool_call.get("tool_name")
+                    or chunk.event.payload.delta.tool_call.get("function", {}).get("name")
+                    or "tool_call"
+                )
+                yield format_stream_data(
+                    {"event": "tool_call",
+                     "data": {"id": chunk_id, "role": chunk.event.payload.step_type, "token": name}}
+                )
src/app/endpoints/query_v2.py (4)

197-206: Guard previous_response_id to only pass resp-* IDs (avoid mixing v1 UUIDs).

Prevents API errors when a v1 conversation UUID is supplied.

-    response = await client.responses.create(
+    prev_resp_id = (
+        query_request.conversation_id
+        if query_request.conversation_id and query_request.conversation_id.startswith(("resp-", "resp_"))
+        else None
+    )
+    response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
-        previous_response_id=query_request.conversation_id,
+        previous_response_id=prev_resp_id,
         tools=tools if tools else None,
         stream=False,
         store=True,
     )

233-235: Lower response-content logging to DEBUG to reduce sensitive data in logs.

-                logger.info("Model response content: '%s'",
+                logger.debug("Model response content: '%s'",
                            llm_response[:200] + "..." if len(llm_response) > 200 else llm_response)

190-195: Propagate per-server MCP headers into tool definitions.

The dependency already provides headers; merge them with Authorization.

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers, token, mcp_headers)
-def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict]:
+def get_mcp_tools(
+    mcp_servers: list,
+    token: str | None = None,
+    mcp_headers: dict[str, dict[str, str]] | None = None,
+) -> list[dict]:
@@
-        # Add authentication if token provided (Response API format)
-        if token:
-            tool_def["headers"] = {
-                "Authorization": f"Bearer {token}"
-            }
+        # Compose headers: Authorization + per-server forwarded headers
+        headers: dict[str, str] = {}
+        if token:
+            headers["Authorization"] = f"Bearer {token}"
+        if mcp_headers and mcp_server.name in mcp_headers:
+            headers.update(mcp_headers[mcp_server.name])
+        if headers:
+            tool_def["headers"] = headers

Also applies to: 284-303


6-11: Avoid runtime dependency on server-side types for casting.

Use TYPE_CHECKING to keep imports type-only and prevent version coupling with llama_stack server package at runtime.

-from llama_stack.apis.agents.openai_responses import (
-    OpenAIResponseObject,
-)
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:  # type-only to avoid runtime coupling
+    from llama_stack_client.types.responses import OpenAIResponseObject  # pragma: no cover
@@
-    response = cast(OpenAIResponseObject, response)
+    response = cast("OpenAIResponseObject", response)  # type: ignore[name-defined]

Also applies to: 207-208

src/app/endpoints/query.py (5)

85-90: Cache model listing to cut per‑request latency.

await client.models.list() on every call can be expensive. Add a short TTL cache (e.g., 30–60s) in the client holder or utils to avoid repeated network lists in hot paths.
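A minimal sketch of such a cache (module-level state, names, and the TTL value are illustrative, not existing repo code):

import time
from typing import Any

_MODELS_CACHE: tuple[float, Any] | None = None
_MODELS_TTL_SECONDS = 60.0

async def list_models_cached(client: Any) -> Any:
    """Return client.models.list(), re-fetching at most once per TTL window."""
    global _MODELS_CACHE  # simple module-level cache, fine for a sketch
    now = time.monotonic()
    if _MODELS_CACHE is not None and now - _MODELS_CACHE[0] < _MODELS_TTL_SECONDS:
        return _MODELS_CACHE[1]
    models = await client.models.list()
    _MODELS_CACHE = (now, models)
    return models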


91-99: Name clarity: pass stack model id as such.

You pass llama_stack_model_id into retrieve_response(model_id=...). Consider renaming the parameter in retrieve_response to stack_model_id (and the callsite) to avoid confusion with model_id used for metrics/transcripts.


158-165: Avoid double network call to list shields.

await client.shields.list() is invoked twice. Fetch once and derive both input/output lists.

Apply:

-    available_input_shields = [
-        shield.identifier
-        for shield in filter(is_input_shield, await client.shields.list())
-    ]
-    available_output_shields = [
-        shield.identifier
-        for shield in filter(is_output_shield, await client.shields.list())
-    ]
+    shields = await client.shields.list()
+    available_input_shields = [s.identifier for s in filter(is_input_shield, shields)]
+    available_output_shields = [s.identifier for s in filter(is_output_shield, shields)]

175-176: Don’t log full system prompts.

System prompts may include sensitive policy text or secrets. Log a hash or length, or truncate to a small prefix.

Apply:

-    logger.debug("Using system prompt: %s", system_prompt)
+    logger.debug("Using system prompt (len=%d, sha256=%s)", len(system_prompt), __import__("hashlib").sha256(system_prompt.encode()).hexdigest()[:12])

218-224: Confirm typing; avoid per-request vector DB listing.

  • Typing: Toolgroup = Union[str, ToolgroupAgentToolGroupWithArgs]; agent.create_turn accepts List[Toolgroup], so mixing plain toolgroup id strings and structured dicts is valid — no change required.
  • Performance: calling await client.vector_dbs.list() on every request (src/app/endpoints/query.py: lines 218–224) can add latency. Cache the vector DB IDs with a short TTL or fetch them only when RAG is enabled/needed.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3d8748 and 692ede7.

📒 Files selected for processing (13)
  • pyproject.toml (1 hunks)
  • src/app/endpoints/conversations.py (7 hunks)
  • src/app/endpoints/conversations_v2.py (1 hunks)
  • src/app/endpoints/query.py (6 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (2 hunks)
  • src/constants.py (1 hunks)
  • src/utils/conversations.py (1 hunks)
  • src/utils/query.py (1 hunks)
  • src/utils/streaming_query.py (1 hunks)
  • src/utils/suid.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/app/routers.py
  • src/utils/conversations.py
  • src/utils/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations_v2.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/utils/streaming_query.py (4)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/utils/endpoints.py (1)
  • get_system_prompt (71-111)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/streaming_query.py (1)
  • response_generator (140-201)
src/app/endpoints/query_v2.py (6)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (3)
  • configuration (60-64)
  • llama_stack_configuration (74-78)
  • mcp_servers (88-92)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • get_system_prompt (71-111)
  • validate_model_provider_override (114-134)
src/utils/query.py (4)
  • evaluate_model_hints (94-133)
  • process_transcript_and_persist_conversation (352-383)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query.py (6)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (2)
  • get_agent (138-187)
  • get_system_prompt (71-111)
src/utils/mcp_headers.py (2)
  • mcp_headers_dependency (15-26)
  • handle_mcp_headers_with_toolgroups (56-90)
src/utils/types.py (1)
  • TurnSummary (59-78)
src/utils/query.py (8)
  • evaluate_model_hints (94-133)
  • select_model_and_provider_id (136-213)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-335)
  • handle_api_connection_error (338-349)
  • process_transcript_and_persist_conversation (352-383)
src/app/endpoints/conversations.py (3)
src/utils/conversations.py (5)
  • simplify_session_data (109-148)
  • get_conversations_list_base (151-207)
  • validate_conversation_id (210-220)
  • validate_conversation_access (223-259)
  • delete_conversation_base (262-360)
src/models/config.py (1)
  • Action (305-344)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
🔇 Additional comments (12)
src/app/endpoints/conversations.py (9)

1-1: Docstring update aligns with Agent API usage.
Clear and accurate.


21-30: Good centralization on utils.conversations.
Imports look correct and promote reuse.


55-55: Docstring clarifications look good.
Accurately describes Agent API flow and error semantics.

Also applies to: 58-58


72-72: Early ID validation is correct.
Good to fail fast on malformed IDs.


76-81: Access check is consistent with READ_OTHERS policy.
Looks correct given Action.READ_OTHERS_CONVERSATIONS.


84-84: Logging is fine.
Message is informative and scoped.


89-89: (no functional change)


149-153: Helper is correct and concise.
Returns the raw .data which aligns with usage.


172-177: Deletion docstring is accurate.
Matches the behavior and mapped error codes.

src/constants.py (1)

5-5: Version cap aligns with the proposed dependency bounds.

LGTM. Once pyproject upper bounds are added, this gate will be consistent.

src/app/endpoints/query.py (2)

100-101: No change needed — label order is correct.

llm_calls_total is declared with labels ["provider","model"] in src/metrics/init.py and call sites pass (provider_id, model_id) in the endpoints — no changes required.


77-81: Avoid positional unpacking of AuthTuple; use named access or a helper.

Replace "_, _, _, token = auth" with a named attribute (e.g., auth.token) or a small helper that safely extracts the token — positional unpacking is brittle if AuthTuple changes. Location: src/app/endpoints/query.py (lines 77–81). Verification attempt failed: running rg for "AuthTuple" returned "authentication: No such file or directory (os error 2)"; AuthTuple definition not found in repo — confirm its shape and update the unpacking accordingly.

"cause": f"Unknown error while getting conversations for user {user_id}",
},
) from e
return get_conversations_list_base(request, auth)

⚠️ Potential issue

Avoid blocking the event loop when calling a sync base.
get_conversations_list_base is synchronous and does DB I/O; calling it directly from an async endpoint will block the event loop. Run it in a threadpool.

Apply this diff:

@@
-from fastapi import APIRouter, HTTPException, Request, status, Depends
+from fastapi import APIRouter, HTTPException, Request, status, Depends
+from starlette.concurrency import run_in_threadpool
@@
-    return get_conversations_list_base(request, auth)
+    return await run_in_threadpool(get_conversations_list_base, request, auth)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from fastapi import APIRouter, HTTPException, Request, status, Depends
+from fastapi import APIRouter, HTTPException, Request, status, Depends
+from starlette.concurrency import run_in_threadpool
@@
-    return get_conversations_list_base(request, auth)
+    return await run_in_threadpool(get_conversations_list_base, request, auth)
🤖 Prompt for AI Agents
In src/app/endpoints/conversations.py around line 44, the endpoint currently
calls the synchronous get_conversations_list_base directly which blocks the
event loop; change the call to run the sync DB I/O in a threadpool (use
asyncio.to_thread or loop.run_in_executor) and await its result instead. Import
asyncio if needed, replace the direct return with an awaited
asyncio.to_thread(get_conversations_list_base, request, auth) (or equivalent
run_in_executor call) so the blocking work executes off the event loop and the
endpoint remains async.

Comment on lines +75 to +79
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)


⚠️ Potential issue

Avoid logging full Llama Stack configuration at INFO (may expose secrets/URLs).

Log only mode or move to DEBUG without printing the object.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    logger.debug(
+        "Llama Stack configured (mode=%s)",
+        "library" if configuration.llama_stack_configuration.use_as_library_client else "service",
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    logger.debug(
+        "Llama Stack configured (mode=%s)",
+        "library" if configuration.llama_stack_configuration.use_as_library_client else "service",
+    )
🤖 Prompt for AI Agents
In src/app/endpoints/query_v2.py around lines 75-76, the current INFO log prints
the entire Llama stack configuration which may expose secrets/URLs; change this
to either logger.debug(...) instead of logger.info(...) or log only
non-sensitive fields (e.g., configuration.llama_stack_configuration.mode or an
explicit safe subset), and ensure any sensitive fields are omitted or masked
before logging.

Comment on lines 123 to 129
if hasattr(chunk, "error"):
yield from _handle_error_event(chunk, chunk_id)

event_type = chunk.event.payload.event_type
step_type = getattr(chunk.event.payload, "step_type", None)

match (event_type, step_type):

⚠️ Potential issue

Error chunks should short‑circuit to avoid AttributeError.

After yielding the error event, the function continues and accesses chunk.event, which may not exist on error chunks.

-    if hasattr(chunk, "error"):
-        yield from _handle_error_event(chunk, chunk_id)
+    if hasattr(chunk, "error"):
+        yield from _handle_error_event(chunk, chunk_id)
+        return

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/utils/streaming_query.py around lines 123 to 129, after yielding the
error event the function continues and then accesses chunk.event which may not
exist on error chunks; modify the control flow so that once hasattr(chunk,
"error") yields the error (yield from _handle_error_event(...)) the function
short‑circuits (e.g., return or continue the loop) to avoid further access to
chunk.event, ensuring error chunks do not execute the subsequent
event_type/step_type logic.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/unit/app/endpoints/test_query.py (1)

113-124: Patch mock target to utils.query.configuration (not app.endpoints.query.configuration)

Replace all occurrences in tests/unit/app/endpoints/test_query.py (lines: 121, 129, 151, 446, 478, 511, 550, 600, 653, 708, 757, 807, 876, 946, 1008, 1105, 1176, 1232, 1283, 1342, 1393, 1559): change
mocker.patch("app.endpoints.query.configuration", ...) -> mocker.patch("utils.query.configuration", ...)

src/app/endpoints/query.py (1)

246-248: Token metrics label mismatch — normalize model label (no provider prefix) across metrics.

Observed mismatch: query.py derives model_label by splitting "provider/model" and passes (model_label, provider_id) to update_llm_token_count_from_turn, but src/metrics/utils.py uses model.identifier directly when registering provider/model metrics, and src/utils/query.py sometimes returns inconsistent values from select_model_and_provider_id.

  • Problem locations: src/app/endpoints/query.py (lines ~244–248), src/metrics/utils.py (update_llm_token_count_from_turn & setup_model_metrics), src/utils/query.py (select_model_and_provider_id).
  • Impact: metrics.provider_model_configuration and runtime token metrics may use different label keys (e.g., ("provider","provider/model") vs ("provider","model")), causing scattered/incorrect metric counts.
  • Fixes:
    • In src/metrics/utils.py, normalize model_name before labeling: model_name = model.identifier.split("/", 1)[1] if "/" in model.identifier else model.identifier.
    • In src/utils/query.py, ensure select_model_and_provider_id always returns a consistent tuple: (llama_stack_model_id as "provider/model", plain_model_name, provider_id) instead of returning model_id twice in the first-LLM branch.
    • Alternatively, add a single helper (e.g., utils.query.get_model_label(llama_stack_model_id)) and use it everywhere to compute the metric label.
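A sketch of that shared helper (name as proposed above; placement in utils.query assumed):

def get_model_label(llama_stack_model_id: str) -> str:
    """Return the plain model name from a "provider/model" identifier."""
    if "/" in llama_stack_model_id:
        return llama_stack_model_id.split("/", 1)[1]
    return llama_stack_model_id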
♻️ Duplicate comments (12)
src/app/endpoints/conversations.py (4)

37-45: Don’t block the event loop: run the sync base in a threadpool

get_conversations_list_base is synchronous and does DB I/O. Calling it directly from an async route will block.

Apply:

-    return get_conversations_list_base(request, auth)
+    return await run_in_threadpool(get_conversations_list_base, request, auth)

Add import (outside this hunk):

from starlette.concurrency import run_in_threadpool

92-105: Type-safe session_id extraction (avoid .get on typed SDK models)

SDK objects are typed; .get may raise at runtime. Validate and handle missing IDs explicitly.

-        session_id = str(agent_sessions[0].get("session_id"))
+        first = agent_sessions[0]
+        session_id = (
+            first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+        )
+        if not session_id:
+            raise HTTPException(
+                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                detail={"response": "Invalid session", "cause": "session_id missing"},
+            )
+        session_id = str(session_id)
#!/bin/bash
# Find any remaining fragile `.get("session_id")` usages across the repo.
rg -nP -C2 "\.get\(\s*[\"']session_id[\"']\s*\)" || true

187-193: LGTM: async base awaited correctly

Awaiting delete_conversation_base with async helpers resolves the earlier await-mismatch.


158-165: Same typed-access bug in delete helper

Avoid .get; validate session_id before delete.

-    session_id = str(sessions[0].get("session_id"))
-    await client.agents.session.delete(agent_id=agent_id, session_id=session_id)
+    first = sessions[0]
+    session_id = (
+        first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+    )
+    if not session_id:
+        raise RuntimeError("session_id missing on session")
+    await client.agents.session.delete(agent_id=agent_id, session_id=str(session_id))
src/utils/query.py (4)

55-62: BUG: configuration accessors must be called (will raise at runtime).

Use configuration.user_data_collection_configuration() accessor before reading transcripts_enabled.

Apply:

-    return configuration.user_data_collection_configuration.transcripts_enabled
+    return configuration.user_data_collection_configuration().transcripts_enabled

158-169: BUG: Don’t overwrite provided model/provider; call config accessors.

Current branch assigns both defaults when either is missing and uses attributes instead of accessors.

Apply:

-    if not model_id or not provider_id:
+    if not model_id or not provider_id:
         logger.debug(
             "No model ID or provider ID specified in request, checking configuration"
         )
-        model_id = configuration.inference.default_model  # type: ignore[reportAttributeAccessIssue]
-        provider_id = (
-            configuration.inference.default_provider  # type: ignore[reportAttributeAccessIssue]
-        )
+        if not model_id:
+            model_id = configuration.inference().default_model
+        if not provider_id:
+            provider_id = configuration.inference().default_provider

175-185: BUG: Wrong tuple and identifiers when auto‑selecting first LLM.

Return value duplicates model_id and doesn’t derive plain model id from provider/model identifier.

Apply:

-            model = next(
+            model = next(
                 m
                 for m in models
                 if m.model_type == "llm"  # pyright: ignore[reportAttributeAccessIssue]
             )
-            model_id = model.identifier
-            provider_id = model.provider_id
-            logger.info("Selected model: %s", model)
-            return model_id, model_id, provider_id
+            llama_stack_model_id = model.identifier
+            provider_id = model.provider_id
+            model_id = (
+                llama_stack_model_id.split("/", 1)[1]
+                if "/" in llama_stack_model_id
+                else llama_stack_model_id
+            )
+            logger.info("Selected model: %s", llama_stack_model_id)
+            return llama_stack_model_id, model_id, provider_id

307-308: Don’t log entire Llama Stack config; call accessor and log safe fields.

Full config may include secrets; also needs accessor call.

Apply:

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )
src/app/endpoints/streaming_query_v2.py (1)

308-308: SSE headers missing (clients may buffer or misinterpret stream).

Return StreamingResponse with proper media type and no‑buffering headers.

Apply:

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
src/utils/streaming_query.py (1)

123-129: Short-circuit on error chunks to avoid AttributeError.

After yielding the error event, the function still accesses chunk.event.* which may not exist on error chunks.

     if hasattr(chunk, "error"):
-        yield from _handle_error_event(chunk, chunk_id)
+        yield from _handle_error_event(chunk, chunk_id)
+        return
src/app/endpoints/query_v2.py (2)

66-69: Avoid logging full Llama Stack configuration at INFO in validate_query_request.

validate_query_request currently logs the entire configuration object at INFO, which risks leaking secrets/URLs. Downgrade to DEBUG and log only safe fields.

Proposed change in src/utils/query.py:

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    logger.debug(
+        "Llama Stack configured (mode=%s)",
+        "library"
+        if configuration.llama_stack_configuration.use_as_library_client
+        else "service",
+    )

226-235: Don’t log user/model content at INFO.

Even truncated, response text can contain sensitive data. Move to DEBUG.

-                logger.info(
+                logger.debug(
                     "Model response content: '%s'",
                     (
                         llm_response[:200] + "..."
                         if len(llm_response) > 200
                         else llm_response
                     ),
                 )
🧹 Nitpick comments (18)
src/app/endpoints/conversations.py (3)

92-101: 404 isn’t an error condition; lower log level

“No sessions found” maps to 404; use warning to reduce error-noise.

-            logger.error("No sessions found for conversation %s", conversation_id)
+            logger.warning("No sessions found for conversation %s", conversation_id)

152-156: Fix return type: client.agents.session.list(...).data returns typed models, not dicts

Current annotation misleads and encourages dict-style access.

-async def get_agent_sessions(client: Any, conversation_id: str) -> list[dict[str, Any]]:
+async def get_agent_sessions(client: Any, conversation_id: str) -> list[Any]:

92-105: Prefer latest session, not arbitrary first

If multiple sessions exist, select the most recent by created_at (if available).

# before picking first:
sessions_sorted = sorted(
    agent_sessions,
    key=lambda s: (getattr(s, "created_at", None) or (s.get("created_at") if isinstance(s, dict) else None)) or "",
    reverse=True,
)
first = sessions_sorted[0]
src/utils/query.py (4)

150-156: Docstring is inaccurate (function returns 3 values).

Update to document (llama_stack_model_id, model_id, provider_id).

Apply:

-    Returns:
-        A tuple containing the combined model ID (in the format
-        "provider/model") and the provider ID.
+    Returns:
+        tuple[str, str, str]: (llama_stack_model_id "provider/model",
+        model_id, provider_id).

242-256: Tighten shield classification; current logic labels unknowns as input.

Use explicit prefix checks; avoid “not output ⇒ input”.

Apply:

-def is_input_shield(shield: Shield) -> bool:
+def is_input_shield(shield: Shield) -> bool:
@@
-    return _is_inout_shield(shield) or not is_output_shield(shield)
+    return _is_inout_shield(shield) or shield.identifier.startswith("input_")

33-52: Align OpenAPI response codes with behavior.

Docs list 400/403/503, while handlers raise 401/500 in places. Consider 401 for UnauthorizedResponse, and 500 (or 503 consistently) for upstream connectivity.

Apply (example):

-    400: {
-        "description": "Missing or invalid credentials provided by client",
-        "model": UnauthorizedResponse,
-    },
+    401: {
+        "description": "Missing or invalid credentials provided by client",
+        "model": UnauthorizedResponse,
+    },
@@
-    503: {
+    500: {
         "detail": {
             "response": "Unable to connect to Llama Stack",
             "cause": "Connection error.",
         }
     },

290-299: Minor: Drop Depends annotation in utility signature.

This is not a FastAPI endpoint; the Depends marker is misleading.

Apply:

-def validate_query_request(
+def validate_query_request(
     request: Request,
     query_request: QueryRequest,
-    auth: Annotated[AuthTuple, Depends],
+    auth: AuthTuple,
 ) -> tuple[str, UserConversation | None, str]:
tests/unit/app/test_routers.py (2)

67-67: Avoid brittle magic number for router count.

Asserting exact length will break when routers are added/removed. Prefer asserting presence only.

Apply:

-    assert len(app.routers) == 14
+    # avoid brittle count; presence checks below suffice

91-91: Same here: drop hardcoded router count.

Apply:

-    assert len(app.routers) == 14
+    # avoid brittle count; verify prefixes instead
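A possible presence check (sketch; assumes the test's mock app records (router, prefix) pairs, which should be verified against the fixture):

prefixes = {prefix for _, prefix in app.routers}
assert {"/v1", "/v2"} <= prefixes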
tests/unit/app/endpoints/test_conversations.py (1)

273-274: Be consistent about patch targets.

You patch utils.conversations.check_suid here but configuration is patched under app.endpoints.conversations elsewhere. Prefer a single module boundary (utils.conversations) to reduce surprises.

tests/unit/app/endpoints/test_streaming_query_v2.py (1)

156-166: Consider asserting SSE headers.

Add checks for media_type text/event-stream and no-buffering headers to catch regressions.

Example:

assert resp.media_type == "text/event-stream"
assert resp.headers.get("X-Accel-Buffering") == "no"
src/app/endpoints/streaming_query_v2.py (2)

329-396: MCP headers param is unused; either plumb it or drop it.

You accept mcp_headers but never apply it to tool definitions.

Option A: merge headers per server URL:

-        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token, mcp_headers=mcp_headers)

…paired with updating get_mcp_tools signature to merge provided headers:

-def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict[str, Any]]:
+def get_mcp_tools(mcp_servers: list, token: str | None = None, mcp_headers: dict[str, dict[str, str]] | None = None) -> list[dict[str, Any]]:
@@
-        if token:
-            tool_def["headers"] = {"Authorization": f"Bearer {token}"}
+        headers = {}
+        if token:
+            headers["Authorization"] = f"Bearer {token}"
+        if mcp_headers:
+            headers.update(mcp_headers.get(mcp_server.url, {}))
+        if headers:
+            tool_def["headers"] = headers

Option B: remove mcp_headers from signatures to avoid dead parameters (less flexible).


12-15: Nit: unused import “status” from starlette.

Remove if not used.

-from starlette import status
tests/unit/app/endpoints/test_query.py (1)

237-243: Wrong default model/provider patch target.

select_model_and_provider_id reads defaults from utils.query.configuration, not metrics.utils.configuration. Patch the correct target so configuration-driven tests behave as intended.

-    mocker.patch(
-        "metrics.utils.configuration.inference.default_provider",
-        "default_provider",
-    )
-    mocker.patch(
-        "metrics.utils.configuration.inference.default_model",
-        "default_model",
-    )
+    mocker.patch("utils.query.configuration.inference.default_provider", "default_provider")
+    mocker.patch("utils.query.configuration.inference.default_model", "default_model")

Also applies to: 277-283

src/utils/streaming_query.py (1)

23-24: Make METADATA_PATTERN non-greedy and EOL-tolerant.

Current pattern can over-capture and requires a trailing newline. Use a non-greedy match and optional EOL.

-METADATA_PATTERN = re.compile(r"\nMetadata: (\{.+})\n")
+METADATA_PATTERN = re.compile(r"\nMetadata:\s*(\{.*?\})(?:\n|$)")
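A quick sanity check of the proposed pattern (illustrative strings):

import re

pattern = re.compile(r"\nMetadata:\s*(\{.*?\})(?:\n|$)")
text = 'chunk text\nMetadata: {"docs_url": "https://example.com", "title": "Doc"}'
match = pattern.search(text)
assert match is not None  # matches even without a trailing newline
assert match.group(1) == '{"docs_url": "https://example.com", "title": "Doc"}'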
src/app/endpoints/conversations_v2.py (1)

27-29: Nit: use module-specific logger name for filtering.

Helps log routing and ops.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger("app.endpoints.conversations_v2")
src/app/endpoints/streaming_query.py (1)

62-63: Remove unused METADATA_PATTERN.

SSE parsing moved to utils.streaming_query; this constant is now dead.

-METADATA_PATTERN = re.compile(r"\nMetadata: (\{.+})\n")
src/app/endpoints/query_v2.py (1)

124-131: Wire mcp_headers into Responses API tool headers.

mcp_headers is accepted by the endpoint but unused in retrieve_response(). Merge provided headers with token-derived headers so callers can scope per-server credentials.

@@
-async def retrieve_response(  # pylint: disable=too-many-locals,too-many-branches
+async def retrieve_response(  # pylint: disable=too-many-locals,too-many-branches
@@
-    mcp_headers: dict[str, dict[str, str]] | None = None,
+    mcp_headers: dict[str, dict[str, str]] | None = None,
@@
-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers, token, mcp_headers)
         if mcp_tools:
             tools.extend(mcp_tools)
@@
-def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict[str, Any]]:
+def get_mcp_tools(
+    mcp_servers: list, token: str | None = None, extra_headers: dict[str, dict[str, str]] | None = None
+) -> list[dict[str, Any]]:
@@
-    for mcp_server in mcp_servers:
+    for mcp_server in mcp_servers:
         tool_def = {
             "type": "mcp",
             "server_label": mcp_server.name,
             "server_url": mcp_server.url,
             "require_approval": "never",
         }
 
-        # Add authentication if token provided (Response API format)
-        if token:
-            tool_def["headers"] = {"Authorization": f"Bearer {token}"}
+        # Merge auth from explicit headers and/or token
+        headers: dict[str, str] = {}
+        if extra_headers and (h := extra_headers.get(mcp_server.url) or extra_headers.get(mcp_server.name)):
+            headers.update(h)
+        if token and "Authorization" not in headers:
+            headers["Authorization"] = f"Bearer {token}"
+        if headers:
+            tool_def["headers"] = headers

Also applies to: 176-185, 312-329

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 692ede7 and 61301f7.

📒 Files selected for processing (18)
  • src/app/endpoints/conversations.py (7 hunks)
  • src/app/endpoints/conversations_v2.py (1 hunks)
  • src/app/endpoints/query.py (5 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • src/constants.py (1 hunks)
  • src/utils/conversations.py (1 hunks)
  • src/utils/query.py (1 hunks)
  • src/utils/streaming_query.py (1 hunks)
  • src/utils/suid.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations.py (14 hunks)
  • tests/unit/app/endpoints/test_query.py (7 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (5 hunks)
  • tests/unit/conftest.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/utils/suid.py
  • src/app/routers.py
  • src/utils/conversations.py
🧰 Additional context used
🧬 Code graph analysis (11)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/config.py (3)
  • config (132-138)
  • ModelContextProtocolServer (161-166)
  • Action (305-344)
src/app/endpoints/streaming_query_v2.py (1)
  • streaming_query_endpoint_handler_v2 (59-321)
src/configuration.py (1)
  • mcp_servers (88-92)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • Attachment (15-69)
src/models/config.py (2)
  • config (132-138)
  • ModelContextProtocolServer (161-166)
src/app/endpoints/query_v2.py (3)
  • get_rag_tools (298-309)
  • get_mcp_tools (312-328)
  • query_endpoint_handler_v2 (45-116)
src/configuration.py (1)
  • mcp_servers (88-92)
src/app/endpoints/conversations_v2.py (5)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/config.py (2)
  • config (132-138)
  • Action (305-344)
src/models/responses.py (3)
  • ConversationDeleteResponse (443-476)
  • ConversationResponse (393-440)
  • ConversationsListResponse (540-597)
src/utils/conversations.py (4)
  • get_conversations_list_base (150-206)
  • delete_conversation_base (259-357)
  • validate_conversation_id (209-219)
  • validate_conversation_access (222-256)
src/app/endpoints/conversations.py (5)
src/models/config.py (2)
  • config (132-138)
  • Action (305-344)
src/models/responses.py (3)
  • ConversationResponse (393-440)
  • ConversationDeleteResponse (443-476)
  • ConversationsListResponse (540-597)
src/utils/endpoints.py (1)
  • check_configuration_loaded (62-68)
src/utils/conversations.py (5)
  • simplify_session_data (108-147)
  • get_conversations_list_base (150-206)
  • validate_conversation_id (209-219)
  • validate_conversation_access (222-256)
  • delete_conversation_base (259-357)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/app/endpoints/streaming_query_v2.py (14)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (3)
  • configuration (60-64)
  • llama_stack_configuration (74-78)
  • mcp_servers (88-92)
src/models/config.py (2)
  • config (132-138)
  • Action (305-344)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/utils/endpoints.py (4)
  • check_configuration_loaded (62-68)
  • get_system_prompt (71-111)
  • validate_model_provider_override (114-134)
  • validate_conversation_ownership (39-59)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/query.py (5)
  • evaluate_model_hints (94-133)
  • is_transcripts_enabled (55-61)
  • persist_user_conversation_details (64-91)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/streaming_query.py (3)
  • format_stream_data (26-37)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (298-309)
  • get_mcp_tools (312-328)
tests/unit/app/endpoints/test_query.py (1)
src/utils/query.py (2)
  • is_transcripts_enabled (55-61)
  • select_model_and_provider_id (136-213)
src/app/endpoints/query_v2.py (6)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (1)
  • get_system_prompt (71-111)
src/utils/query.py (4)
  • evaluate_model_hints (94-133)
  • process_transcript_and_persist_conversation (352-383)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/streaming_query.py (3)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (266-293)
src/utils/endpoints.py (3)
  • get_system_prompt (71-111)
  • validate_model_provider_override (114-134)
  • validate_conversation_ownership (39-59)
src/utils/streaming_query.py (3)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
  • stream_build_event (103-146)
src/utils/query.py (8)
src/configuration.py (4)
  • configuration (60-64)
  • user_data_collection_configuration (81-85)
  • inference (121-125)
  • llama_stack_configuration (74-78)
src/app/database.py (1)
  • get_session (34-40)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • Attachment (15-69)
src/models/responses.py (2)
  • UnauthorizedResponse (361-375)
  • ForbiddenResponse (378-390)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • validate_conversation_ownership (39-59)
  • validate_model_provider_override (114-134)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (1)
  • TurnSummary (59-78)
src/utils/streaming_query.py (4)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/utils/endpoints.py (1)
  • get_system_prompt (71-111)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/streaming_query.py (1)
  • response_generator (142-203)
src/app/endpoints/query.py (7)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/client.py (1)
  • AsyncLlamaStackClientHolder (18-55)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (2)
  • get_agent (138-187)
  • get_system_prompt (71-111)
src/utils/query.py (8)
  • evaluate_model_hints (94-133)
  • select_model_and_provider_id (136-213)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-335)
  • handle_api_connection_error (338-349)
  • process_transcript_and_persist_conversation (352-383)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🔇 Additional comments (13)
tests/unit/conftest.py (1)

26-36: LGTM: stable Turn mock prevents “did not complete” errors.

This makes downstream tests resilient to missing output_message.

tests/unit/app/test_routers.py (1)

21-24: LGTM: v2 routers included in import surface.

tests/unit/app/endpoints/test_conversations.py (1)

79-80: LGTM: context‑managed get_session mock is correct.

Matches production usage of get_session() with a context manager.

tests/unit/app/endpoints/test_query_v2.py (2)

57-82: LGTM: no-tools path verifies tools=None and prompt wiring.


118-157: LGTM: output parsing and tool call capture looks robust.

src/utils/query.py (1)

1-384: Repo-wide sweep for configuration accessor misuse — verify and fix accesses.

Found non-call attribute accesses to the configuration object; confirm each is a true attribute (leave) or should be invoked (call) and update accordingly:

  • src/utils/query.py:307 — logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
  • src/lightspeed_stack.py:62 — logger.info("Configuration: %s", configuration.configuration)
  • src/lightspeed_stack.py:64 — logger.info("Llama stack configuration: %s", configuration.llama_stack_configuration)
  • src/app/endpoints/streaming_query.py:100 — logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
  • src/app/endpoints/streaming_query_v2.py:89 — logger.info("Llama stack config: %s", configuration.llama_stack_configuration)

If any referenced symbol is a callable (e.g., an accessor method), replace the attribute access with a call and adjust call sites; otherwise keep as-is.

tests/unit/app/endpoints/test_query.py (2)

95-101: Good: configuration-not-loaded path tested via utils.query.configuration None.

This aligns with the new utils-based organization. No changes needed.


62-64: Mocks now point to centralized utils — nice.

Redirecting ownership and persistence to utils.* keeps tests aligned with the refactor.

src/app/endpoints/conversations_v2.py (2)

32-43: Async placeholders match awaited base signature.

delete_conversation_base awaits the session helpers; keeping these async is correct.


85-101: Correct: the base is async now, so awaiting it is right.

Earlier suggestions to drop the await are obsolete given utils.conversations.delete_conversation_base is async.

src/app/endpoints/streaming_query.py (1)

151-186: SSE generation correctly delegates to shared utilities.

Centralization reduces drift between v1/v2. Looks good.

src/app/endpoints/query.py (2)

100-107: Good consolidation: transcript + conversation persistence via utils.

Reduces duplication and keeps v1/v2 behavior aligned.


191-223: Toolgroups assembly is correct and preserves no_tools semantics.

MCP headers are merged; empty toolgroups coerced to None. LGTM.

Comment on lines +186 to +225
# Create OpenAI response using responses API
response = await client.responses.create(
input=query_request.query,
model=model_id,
instructions=system_prompt,
previous_response_id=query_request.conversation_id,
tools=(cast(Any, tools) if tools else cast(Any, None)),
stream=False,
store=True,
)

⚠️ Potential issue

🧩 Analysis chain

Consider gating Responses API persistence on transcripts setting.

responses.create(store=True) persists data server-side regardless of local transcripts_enabled. Gate or make configurable to avoid unintended storage.

-    response = await client.responses.create(
+    store_resp = configuration.user_data_collection_configuration.transcripts_enabled
+    response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=False,
-        store=True,
+        store=store_resp,
     )

Please confirm product intent: should server-side storage follow local transcript policy?


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
echo "---- rg version ----"
rg --version 2>/dev/null || true

echo "---- Search for responses.create / client.responses.create occurrences ----"
rg -n --hidden --no-ignore "responses\.create\(" || true
rg -n --hidden --no-ignore "client\.responses\.create\(" || true

echo "---- Search for 'store=' usages ----"
rg -n --hidden --no-ignore "store\s*=" -S || true

echo "---- Search for transcripts / user_data_collection_configuration ----"
rg -n --hidden --no-ignore "transcripts_enabled|transcripts|user_data_collection_configuration|user_data_collection" -S || true

echo "---- Search for 'configuration.' references ----"
rg -n --hidden --no-ignore "configuration\." -S || true

echo "---- Show src/app/endpoints/query_v2.py (if present) around lines 1-260 ----"
if [ -f src/app/endpoints/query_v2.py ]; then
  sed -n '1,260p' src/app/endpoints/query_v2.py || true
else
  echo "src/app/endpoints/query_v2.py not found"
fi

Length of output: 84095


Gate Responses API persistence on transcripts setting

responses.create(store=True) persists data server-side regardless of local transcripts_enabled. Calls found in src/app/endpoints/query_v2.py (≈lines 186–195) and src/app/endpoints/streaming_query_v2.py (≈lines 388–395) should pass store=configuration.user_data_collection_configuration.transcripts_enabled (or use utils.query.is_transcripts_enabled()) to avoid unintended remote storage.

-    response = await client.responses.create(
+    store_resp = configuration.user_data_collection_configuration.transcripts_enabled
+    response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=False,
-        store=True,
+        store=store_resp,
     )

Confirm product intent: should server-side storage follow local transcript policy?

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     # Create OpenAI response using responses API
+    store_resp = configuration.user_data_collection_configuration.transcripts_enabled
     response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=False,
-        store=True,
+        store=store_resp,
     )
🤖 Prompt for AI Agents
In src/app/endpoints/query_v2.py around lines 186 to 195, the call to
client.responses.create currently sets store=True which forces server-side
persistence regardless of local transcript settings; change the store argument
to respect the app configuration by using
configuration.user_data_collection_configuration.transcripts_enabled (or
utils.query.is_transcripts_enabled() if preferred) so the call becomes
store=configuration.user_data_collection_configuration.transcripts_enabled, and
ensure configuration is in scope or import the utils helper if using it.

Comment on lines +88 to +90
# log Llama Stack configuration
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)


⚠️ Potential issue

Don’t log entire config; call accessor and log safe fields.

Same concern as in utils.query.

Apply:

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     # log Llama Stack configuration
-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 88 to 90, avoid logging
the entire configuration object; instead call the accessor method that returns
only safe-to-log fields and log those specific fields (e.g., model name,
temperature, and any non-sensitive flags). Replace the direct logger.info of
configuration.llama_stack_configuration with a call like
configuration.get_safe_llama_stack_config() (or the existing accessor used in
utils.query) and log the returned safe dict/values so no secrets or sensitive
internal state are emitted.

Comment on lines +116 to +119
llama_stack_model_id, model_id, provider_id = select_model_and_provider_id(
await client.models.list(),
*evaluate_model_hints(user_conversation=None, query_request=query_request),
)

⚠️ Potential issue

BUG: evaluate_model_hints ignores validated conversation context.

You pass user_conversation=None instead of the resolved user_conversation.

Apply:

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     llama_stack_model_id, model_id, provider_id = select_model_and_provider_id(
         await client.models.list(),
-        *evaluate_model_hints(user_conversation=None, query_request=query_request),
+        *evaluate_model_hints(
+            user_conversation=user_conversation, query_request=query_request
+        ),
     )
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 116 to 119,
evaluate_model_hints is being called with user_conversation=None which ignores
the validated/resolved conversation context; replace the None with the actual
resolved user_conversation variable (the one validated/constructed earlier in
this function) so evaluate_model_hints receives the real conversation, ensuring
model selection uses the correct context.

Comment on lines +379 to +381
mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
if mcp_tools:
tools.extend(mcp_tools)

⚠️ Potential issue

BUG: call configuration.mcp_servers() accessor.

Attribute access will fail; also aligns with other config accessors.

Apply:

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
         if mcp_tools:
             tools.extend(mcp_tools)
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 379 to 381, the code
incorrectly accesses configuration.mcp_servers as an attribute; change it to
call the accessor configuration.mcp_servers() to match other config accessors
and avoid an attribute access error — call
get_mcp_tools(configuration.mcp_servers(), token) and keep the conditional and
tools.extend logic the same.

src/constants.py Outdated
 # Minimal and maximal supported Llama Stack version
 MINIMAL_SUPPORTED_LLAMA_STACK_VERSION = "0.2.17"
-MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION = "0.2.20"
+MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION = "0.2.21"

⚠️ Potential issue

🧩 Analysis chain

Confirm upstream compatibility for 0.2.21 before merge.

Bumping MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION impacts gating and client initialization. Please verify upstream supports 0.2.21 across your deployment matrix and adjust docs/tests if needed.

Run to sanity‑check version gates and any hardcoded expectations:


🏁 Script executed:

#!/bin/bash
# Find version guards and hardcoded mentions
rg -nP 'MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION|MINIMAL_SUPPORTED_LLAMA_STACK_VERSION|0\.2\.2[0-9]' -C2

Length of output: 9300


Confirm upstream compatibility for llama-stack 0.2.21 and align repository pins/docs/tests

MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION was bumped to "0.2.21" but the repo currently pins/mentions 0.2.20 — resolve before merge by either upgrading pinned deps/lock/docs/tests to 0.2.21 (and regenerating uv.lock) if upstream actually provides 0.2.21, or revert the constant to "0.2.20".

  • Affected locations (update or verify): pyproject.toml, uv.lock, examples/pyproject.llamastack.toml, README.md, docs/getting_started.md, docs/deployment_guide.md, docs/output.md, docs/openapi.md/openapi.json, src/models/responses.py, tests/e2e/features/info.feature, tests/unit/utils/test_llama_stack_version.py.
🤖 Prompt for AI Agents
In src/constants.py around line 5 the MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION was
changed to "0.2.21" but the repo still pins/mentions "0.2.20"; either update all
repository pins, locks, docs, tests and examples to 0.2.21 (pyproject.toml,
uv.lock, examples/pyproject.llamastack.toml, README.md, docs/* including
getting_started.md, deployment_guide.md, output.md, openapi.md/openapi.json,
src/models/responses.py, tests/e2e/features/info.feature,
tests/unit/utils/test_llama_stack_version.py) and regenerate uv.lock, or revert
this constant back to "0.2.20" so pins/docs/tests remain consistent before
merging.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)

350-354: Call configuration accessor for MCP servers.

Fix attribute vs method usage in both places.

-            for mcp_server in configuration.mcp_servers:
+            for mcp_server in configuration.mcp_servers():
                 mcp_headers[mcp_server.url] = {
                     "Authorization": f"Bearer {token}",
                 }
@@
-        toolgroups = (get_rag_toolgroups(vector_db_ids) or []) + [
-            mcp_server.name for mcp_server in configuration.mcp_servers
+        toolgroups = (get_rag_toolgroups(vector_db_ids) or []) + [
+            mcp_server.name for mcp_server in configuration.mcp_servers()
         ]

Also applies to: 367-368

🧹 Nitpick comments (15)
src/app/endpoints/conversations_v2.py (2)

27-27: Use module logger name.

Prefer module-scoped logger for consistency.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

55-57: Annotate auth dependency with AuthTuple.

Improves type clarity and matches project conventions.

-    auth: Any = Depends(auth_dependency),
+    auth: Annotated[AuthTuple, Depends(auth_dependency)],

Add missing import at top of file:

from authentication.interface import AuthTuple

Also applies to: 65-68, 90-93

src/app/endpoints/query.py (1)

47-47: Use module logger name.

Align with logging guideline.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/utils/query.py (1)

30-30: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("utils.query")
+logger = logging.getLogger(__name__)
src/app/endpoints/query_v2.py (1)

38-38: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
tests/unit/app/endpoints/test_conversations.py (1)

24-24: Use shared MOCK_AUTH constant.

Tests should use the shared auth mock for consistency.

-MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")
+# Prefer: from tests.unit.conftest import MOCK_AUTH
src/app/endpoints/streaming_query_v2.py (1)

52-52: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/app/endpoints/streaming_query.py (3)

141-142: Avoid logging full config; log safe fields and call accessor.

Prevents secrets leakage.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )

256-256: Set SSE media type and headers.

Improves client compatibility and reduces buffering.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
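
A self-contained sketch of why the media type and headers matter (placeholder event payloads, not the project's generator):

from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def demo_events() -> AsyncIterator[str]:
    """Yield a few SSE frames; each frame is a 'data:' line plus a blank line."""
    for i in range(3):
        yield f"data: token {i}\n\n"


@app.get("/demo/stream")
async def demo_stream() -> StreamingResponse:
    return StreamingResponse(
        demo_events(),
        media_type="text/event-stream",  # tells clients to parse SSE frames
        headers={
            "Cache-Control": "no-cache",  # avoid intermediary caching
            "Connection": "keep-alive",  # keep the stream open
            "X-Accel-Buffering": "no",  # disable nginx response buffering
        },
    )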

58-58: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/utils/streaming_query.py (1)

21-21: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("utils.streaming_query")
+logger = logging.getLogger(__name__)
src/app/endpoints/conversations.py (1)

32-32: Use module-standard logger name

Prefer __name__ per guidelines for consistent logger hierarchy.

Apply this diff:

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/utils/conversations.py (3)

239-244: Clarify error message for non-UUID SUIDs

check_suid accepts resp‑* IDs; message should not claim “not a valid UUID”.

Apply this diff:

-            detail={
-                "response": "Invalid conversation ID format",
-                "cause": f"Conversation ID {conversation_id} is not a valid UUID",
-            },
+            detail={
+                "response": "Invalid conversation ID format",
+                "cause": f"Conversation ID {conversation_id} is not a valid conversation ID format",
+            },

345-346: Avoid blocking in async path: run local DB delete in threadpool

delete_conversation performs DB I/O; run it off the event loop.

Apply this diff:

-from fastapi import HTTPException, Request, status
+from fastapi import HTTPException, Request, status
+from starlette.concurrency import run_in_threadpool
@@
-        delete_conversation(conversation_id=conversation_id)
+        await run_in_threadpool(delete_conversation, conversation_id=conversation_id)

Also applies to: 8-12
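
A minimal sketch of the pattern, with delete_conversation_sync standing in for the repo's synchronous helper:

from starlette.concurrency import run_in_threadpool


def delete_conversation_sync(conversation_id: str) -> None:
    """Hypothetical synchronous helper that performs blocking database I/O."""


async def delete_conversation_endpoint(conversation_id: str) -> None:
    # Offload the blocking call so the event loop keeps serving other requests.
    await run_in_threadpool(delete_conversation_sync, conversation_id=conversation_id)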


28-28: Use module-standard logger name

Align with guideline to use __name__.

Apply this diff:

-logger = logging.getLogger("utils.conversations")
+logger = logging.getLogger(__name__)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61301f7 and d16d834.

📒 Files selected for processing (18)
  • src/app/endpoints/conversations.py (7 hunks)
  • src/app/endpoints/conversations_v2.py (1 hunks)
  • src/app/endpoints/query.py (5 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • src/constants.py (1 hunks)
  • src/utils/conversations.py (1 hunks)
  • src/utils/query.py (1 hunks)
  • src/utils/streaming_query.py (1 hunks)
  • src/utils/suid.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations.py (14 hunks)
  • tests/unit/app/endpoints/test_query.py (7 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (5 hunks)
  • tests/unit/conftest.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_query.py
  • src/app/routers.py
  • src/utils/suid.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • src/constants.py
  • tests/unit/conftest.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/utils/query.py
  • src/utils/streaming_query.py
  • src/utils/conversations.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/app/endpoints/test_conversations.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/utils/query.py
  • src/utils/streaming_query.py
  • src/utils/conversations.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_conversations.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_conversations.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
PR: lightspeed-core/lightspeed-stack#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Handle Llama Stack APIConnectionError when interacting with the Llama client

Applied to files:

  • tests/unit/app/endpoints/test_conversations.py
🧬 Code graph analysis (9)
src/app/endpoints/conversations_v2.py (5)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/config.py (2)
  • config (138-144)
  • Action (311-350)
src/models/responses.py (3)
  • ConversationDeleteResponse (443-476)
  • ConversationResponse (393-440)
  • ConversationsListResponse (540-597)
src/utils/conversations.py (4)
  • get_conversations_list_base (175-231)
  • delete_conversation_base (284-382)
  • validate_conversation_id (234-244)
  • validate_conversation_access (247-281)
src/app/endpoints/streaming_query.py (4)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (266-293)
src/utils/endpoints.py (5)
  • check_configuration_loaded (63-69)
  • get_agent (139-192)
  • get_system_prompt (72-112)
  • validate_model_provider_override (115-135)
  • validate_conversation_ownership (40-60)
src/utils/query.py (7)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • is_transcripts_enabled (55-61)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
  • persist_user_conversation_details (64-91)
  • evaluate_model_hints (94-133)
src/utils/streaming_query.py (3)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
  • stream_build_event (104-146)
src/app/endpoints/query_v2.py (9)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (1)
  • get_system_prompt (72-112)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/query.py (5)
  • evaluate_model_hints (94-133)
  • process_transcript_and_persist_conversation (357-388)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-340)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query.py (3)
src/client.py (1)
  • AsyncLlamaStackClientHolder (18-55)
src/utils/endpoints.py (1)
  • get_agent (139-192)
src/utils/query.py (8)
  • evaluate_model_hints (94-133)
  • select_model_and_provider_id (136-213)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-340)
  • handle_api_connection_error (343-354)
  • process_transcript_and_persist_conversation (357-388)
src/utils/query.py (9)
src/configuration.py (4)
  • configuration (65-69)
  • user_data_collection_configuration (86-90)
  • inference (126-130)
  • llama_stack_configuration (79-83)
src/app/database.py (1)
  • get_session (34-40)
src/models/config.py (2)
  • config (138-144)
  • Action (311-350)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • Attachment (15-69)
src/models/responses.py (2)
  • UnauthorizedResponse (361-375)
  • ForbiddenResponse (378-390)
src/utils/endpoints.py (3)
  • check_configuration_loaded (63-69)
  • validate_conversation_ownership (40-60)
  • validate_model_provider_override (115-135)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (1)
  • TurnSummary (59-78)
src/utils/streaming_query.py (3)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/utils/endpoints.py (1)
  • get_system_prompt (72-112)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/utils/conversations.py (7)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/app/database.py (1)
  • get_session (34-40)
src/models/config.py (2)
  • config (138-144)
  • Action (311-350)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/responses.py (4)
  • ConversationDeleteResponse (443-476)
  • ConversationsListResponse (540-597)
  • ConversationDetails (479-537)
  • UnauthorizedResponse (361-375)
src/utils/endpoints.py (3)
  • check_configuration_loaded (63-69)
  • delete_conversation (23-37)
  • validate_conversation_ownership (40-60)
src/utils/suid.py (1)
  • check_suid (15-44)
src/app/endpoints/streaming_query_v2.py (9)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (3)
  • configuration (65-69)
  • llama_stack_configuration (79-83)
  • mcp_servers (93-97)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/utils/endpoints.py (4)
  • check_configuration_loaded (63-69)
  • get_system_prompt (72-112)
  • validate_model_provider_override (115-135)
  • validate_conversation_ownership (40-60)
src/utils/query.py (5)
  • evaluate_model_hints (94-133)
  • is_transcripts_enabled (55-61)
  • persist_user_conversation_details (64-91)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/streaming_query.py (3)
  • format_stream_data (26-37)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (298-309)
  • get_mcp_tools (312-328)
src/app/endpoints/conversations.py (6)
src/configuration.py (1)
  • configuration (65-69)
src/models/config.py (2)
  • config (138-144)
  • Action (311-350)
src/models/responses.py (3)
  • ConversationResponse (393-440)
  • ConversationDeleteResponse (443-476)
  • ConversationsListResponse (540-597)
src/utils/endpoints.py (1)
  • check_configuration_loaded (63-69)
src/utils/conversations.py (5)
  • simplify_session_data (133-172)
  • get_conversations_list_base (175-231)
  • validate_conversation_id (234-244)
  • validate_conversation_access (247-281)
  • delete_conversation_base (284-382)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🔇 Additional comments (15)
src/app/endpoints/conversations_v2.py (3)

32-43: Resolved: helpers are correctly async now (previous sync suggestion is obsolete).

get_response_sessions aligns with async delete_conversation_base and is awaited at call site.


45-49: Resolved: deletion helper is correctly async.

Matches the awaited usage and async base signature.


95-101: Resolved: using await on async base is correct.

delete_conversation_base is async in utils; keeping await avoids TypeError.

src/utils/query.py (3)

307-308: Avoid logging full config; log safe fields and call accessor.

Prevents secrets leakage and fixes accessor usage.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )

136-214: Fix model/provider selection: wrong config access and wrong early return.

  • Call configuration.inference() accessor.
  • Return (llama_stack_model_id, model_id, provider_id) when defaulting to first LLM.
 def select_model_and_provider_id(
     models: ModelListResponse, model_id: str | None, provider_id: str | None
 ) -> tuple[str, str, str]:
@@
-    if not model_id or not provider_id:
+    if not model_id or not provider_id:
         logger.debug(
             "No model ID or provider ID specified in request, checking configuration"
         )
-        model_id = configuration.inference.default_model  # type: ignore[reportAttributeAccessIssue]
-        provider_id = (
-            configuration.inference.default_provider  # type: ignore[reportAttributeAccessIssue]
-        )
+        model_id = configuration.inference().default_model
+        provider_id = configuration.inference().default_provider
@@
-    if not model_id or not provider_id:
+    if not model_id or not provider_id:
         logger.debug(
             "No model ID or provider ID specified in request or configuration, "
             "using the first available LLM"
         )
         try:
             model = next(
                 m
                 for m in models
                 if m.model_type == "llm"  # pyright: ignore[reportAttributeAccessIssue]
             )
-            model_id = model.identifier
-            provider_id = model.provider_id
-            logger.info("Selected model: %s", model)
-            return model_id, model_id, provider_id
+            llama_stack_model_id = model.identifier
+            provider_id = model.provider_id
+            # derive plain model id if identifier is "provider/model"
+            model_id = (
+                llama_stack_model_id.split("/", 1)[1]
+                if "/" in llama_stack_model_id
+                else llama_stack_model_id
+            )
+            logger.info("Selected model: %s", llama_stack_model_id)
+            return llama_stack_model_id, model_id, provider_id
@@
-    llama_stack_model_id = f"{provider_id}/{model_id}"
+    llama_stack_model_id = f"{provider_id}/{model_id}"
@@
-    return llama_stack_model_id, model_id, provider_id
+    return llama_stack_model_id, model_id, provider_id

55-61: Bug: configuration accessor not called (runtime AttributeError).

user_data_collection_configuration is a method.

-def is_transcripts_enabled() -> bool:
+def is_transcripts_enabled() -> bool:
@@
-    return configuration.user_data_collection_configuration.transcripts_enabled
+    return configuration.user_data_collection_configuration().transcripts_enabled
src/app/endpoints/streaming_query_v2.py (4)

88-90: Avoid logging full config; log safe fields and call accessor.

Prevents secrets leakage.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )

306-309: Set proper SSE media type and headers.

Ensures clients process events without buffering.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )

116-119: Pass the resolved conversation to evaluate_model_hints.

Currently ignores validated conversation context.

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),

379-381: Call configuration accessor for MCP servers.

Fix attribute vs method usage.

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
src/utils/streaming_query.py (1)

123-126: Short-circuit on error chunks to avoid AttributeError.

Don’t access chunk.event after handling an error chunk.

-    if hasattr(chunk, "error"):
-        yield from _handle_error_event(chunk, chunk_id)
+    if hasattr(chunk, "error"):
+        yield from _handle_error_event(chunk, chunk_id)
+        return
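
The early return matters because yield from only delegates; the outer generator resumes afterwards. A toy illustration of the failure mode and the fix:

from collections.abc import Iterator


def handle_error(chunk: dict) -> Iterator[str]:
    yield f"error: {chunk['error']}"


def build_events(chunk: dict) -> Iterator[str]:
    if "error" in chunk:
        yield from handle_error(chunk)
        return  # without this, the line below would read the missing 'event' key
    yield f"event: {chunk['event']}"


assert list(build_events({"error": "boom"})) == ["error: boom"]
assert list(build_events({"event": "token"})) == ["event: token"]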
src/app/endpoints/query_v2.py (1)

194-195: Gate server-side persistence to transcripts setting.

Avoid unintended storage when transcripts are disabled.

-        stream=False,
-        store=True,
+        stream=False,
+        store=configuration.user_data_collection_configuration().transcripts_enabled,
src/app/endpoints/conversations.py (3)

103-105: Don’t call .get on possibly typed session objects; extract session_id safely

Agent SDK may return typed objects; .get will raise. Handle dict/attr and fail fast if missing.

Apply this diff:

-        session_id = str(agent_sessions[0].get("session_id"))
+        first = agent_sessions[0]
+        session_id = (
+            first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+        )
+        if not session_id:
+            raise HTTPException(
+                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                detail={"response": "session_id missing on session"},
+            )
+        session_id = str(session_id)

44-44: Run sync base in a threadpool to avoid blocking the event loop

get_conversations_list_base does DB I/O and is sync; calling it directly from an async endpoint will block.

Apply this diff:

-from fastapi import APIRouter, HTTPException, Request, status, Depends
+from fastapi import APIRouter, HTTPException, Request, status, Depends
+from starlette.concurrency import run_in_threadpool
@@
-    return get_conversations_list_base(request, auth)
+    return await run_in_threadpool(get_conversations_list_base, request, auth)

Also applies to: 8-8


152-156: Broaden helper typings and fix session_id extraction for delete path

Avoid assuming dicts; accept list[Any] and handle dict/attr for session_id.

Apply this diff:

-async def get_agent_sessions(client: Any, conversation_id: str) -> list[dict[str, Any]]:
+async def get_agent_sessions(client: Any, conversation_id: str) -> list[Any]:
@@
-async def delete_agent_sessions(
-    client: Any, conversation_id: str, sessions: list[dict[str, Any]]
-) -> None:
+async def delete_agent_sessions(
+    client: Any, conversation_id: str, sessions: list[Any]
+) -> None:
@@
-    session_id = str(sessions[0].get("session_id"))
+    first = sessions[0]
+    session_id = (
+        first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+    )
+    if not session_id:
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail={"response": "session_id missing on session"},
+        )
+    session_id = str(session_id)

Also applies to: 158-164

Comment on lines +177 to +209
mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
if mcp_tools:
tools.extend(mcp_tools)
Contributor

⚠️ Potential issue

Call configuration accessor for MCP servers.

mcp_servers is a method.

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
📝 Committable suggestion


Suggested change
mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
if mcp_tools:
tools.extend(mcp_tools)
mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
if mcp_tools:
tools.extend(mcp_tools)
🤖 Prompt for AI Agents
In src/app/endpoints/query_v2.py around lines 177 to 179, the code treats
configuration.mcp_servers as an attribute but it is a method; call
configuration.mcp_servers() and pass its return value to get_mcp_tools (i.e.,
replace configuration.mcp_servers with configuration.mcp_servers()), ensuring
any required arguments or async/await usage are respected if the accessor is
async.

Comment on lines +388 to +396
response = await client.responses.create(
input=query_request.query,
model=model_id,
instructions=system_prompt,
previous_response_id=query_request.conversation_id,
tools=(cast(Any, tools) if tools else cast(Any, None)),
stream=True,
store=True,
)
Contributor

⚠️ Potential issue

Gate server-side persistence to transcripts setting.

Avoid persisting when transcripts are disabled.

-    response = await client.responses.create(
+    response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=True,
-        store=True,
+        store=configuration.user_data_collection_configuration().transcripts_enabled,
     )
📝 Committable suggestion


Suggested change
response = await client.responses.create(
input=query_request.query,
model=model_id,
instructions=system_prompt,
previous_response_id=query_request.conversation_id,
tools=(cast(Any, tools) if tools else cast(Any, None)),
stream=True,
store=True,
)
response = await client.responses.create(
input=query_request.query,
model=model_id,
instructions=system_prompt,
previous_response_id=query_request.conversation_id,
tools=(cast(Any, tools) if tools else cast(Any, None)),
stream=True,
store=configuration.user_data_collection_configuration().transcripts_enabled,
)
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 388 to 396, the call to
client.responses.create always sets store=True which persists responses even
when server-side transcripts are disabled; change this to gate persistence
behind the transcripts/analytics setting (e.g. use the app config or request
flag that indicates transcripts are enabled) and pass store=(True if
transcripts_enabled else False) (or omit the store parameter when disabled) so
responses are only persisted when the transcripts feature is enabled.

Comment on lines 68 to 73
200: {
"conversation_id": "123e4567-e89b-12d3-a456-426614174000",
"success": True,
"message": "Conversation deleted successfully",
},
400: {
Contributor

⚠️ Potential issue

OpenAPI schema mismatch: use response field (not message)

ConversationDeleteResponse uses response; docs example shows message.

Apply this diff:

 conversation_delete_responses: dict[int | str, dict[str, Any]] = {
         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
         "success": True,
-        "message": "Conversation deleted successfully",
+        "response": "Conversation deleted successfully",
     },
📝 Committable suggestion


Suggested change
200: {
"conversation_id": "123e4567-e89b-12d3-a456-426614174000",
"success": True,
"message": "Conversation deleted successfully",
},
400: {
200: {
"conversation_id": "123e4567-e89b-12d3-a456-426614174000",
"success": True,
"response": "Conversation deleted successfully",
},
400: {
🤖 Prompt for AI Agents
In src/utils/conversations.py around lines 68 to 73, the OpenAPI example for the
200 response uses the key "message" but the ConversationDeleteResponse schema
expects "response"; update the example to replace the "message" field with
"response" (e.g., "response": "Conversation deleted successfully") so the docs
match the schema.

@luis5tb luis5tb force-pushed the responses_v2 branch 2 times, most recently from 387f7b0 to b9ae829 on September 29, 2025 08:16
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
src/utils/suid.py (2)

21-25: Docstring incorrect for Responses IDs.

States “follows a valid UUID string” but we should treat resp-* as opaque; update wording accordingly.

Apply:

-    Returns True if the string is a valid UUID or if it starts with resp-/resp_
-    and it follows a valid UUID string, False otherwise.
+    Returns True if the string is a valid UUID, or if it starts with resp-/resp_
+    and contains a non-empty opaque token, False otherwise.

26-33: Parameter type mismatch in docstring.

Docstring says str | bytes, function accepts str. Align docstring (or add bytes support by decoding).

Apply either:

  • Update docstring to str only, or
  • Accept bytes:
-def check_suid(suid: str) -> bool:
+def check_suid(suid: str | bytes) -> bool:
@@
-    if not isinstance(suid, str) or not suid:
+    if isinstance(suid, (bytes, bytearray)):
+        try:
+            suid = suid.decode("utf-8")
+        except Exception:
+            return False
+    if not isinstance(suid, str) or not suid:
         return False
src/app/routers.py (1)

32-41: Section label/order mismatch.

conversations_v2 is mounted under /v2 but appears in the “V1 endpoints” block. Move it under the V2 section or fix the comment to avoid confusion.

tests/unit/app/test_routers.py (2)

65-80: Reduce brittleness: avoid asserting exact router count.

Exact counts break when routes evolve. Prefer presence checks against an expected set.

Example:

-    assert len(app.routers) == 15
+    # keep flexible: assert presence instead of exact count

83-104: Add coverage for /v3 conversations mount.

conversations.router is also included under /v3; extend the test to validate that additional prefix to prevent regressions.

You can add a helper that returns all prefixes for a router and assert "/v1" in prefixes and "/v3" in prefixes.
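
One possible shape for that helper, assuming the test's include_router mock is extended to record (router, prefix) pairs (names here are illustrative):

def get_router_prefixes(app, router) -> list[str]:
    """Return every prefix the router was mounted under, per the mock's records."""
    return [prefix for mounted, prefix in app.routers if mounted is router]


def test_conversations_router_mounted_under_v1_and_v3(app):
    prefixes = get_router_prefixes(app, conversations.router)
    assert "/v1" in prefixes
    assert "/v3" in prefixes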

src/app/endpoints/streaming_query_v2.py (2)

52-53: Use module logger name for consistency.

Prefer getLogger(name) per guidelines.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

365-387: Optional: propagate per-server MCP headers.

You accept mcp_headers but don’t use them when building tools; consider merging headers for each server.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d16d834 and b9ae829.

📒 Files selected for processing (8)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • src/constants.py (1 hunks)
  • src/utils/suid.py (2 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
🧰 Additional context used
📓 Path-based instructions (8)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/routers.py
  • src/utils/suid.py
  • src/app/endpoints/streaming_query_v2.py
  • src/constants.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/routers.py
  • src/utils/suid.py
  • src/app/endpoints/streaming_query_v2.py
  • src/constants.py
  • tests/unit/app/test_routers.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query_v2.py
src/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep shared constants in a central src/constants.py with descriptive comments

Files:

  • src/constants.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/test_routers.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/test_routers.py
🧬 Code graph analysis (2)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (34-49)
src/app/endpoints/streaming_query_v2.py (8)
src/app/endpoints/query.py (5)
  • evaluate_model_hints (130-169)
  • is_transcripts_enabled (91-97)
  • persist_user_conversation_details (100-127)
  • select_model_and_provider_id (352-430)
  • validate_attachments_metadata (688-717)
src/utils/endpoints.py (4)
  • validate_conversation_ownership (41-61)
  • check_configuration_loaded (64-70)
  • get_system_prompt (73-113)
  • validate_model_provider_override (116-136)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (313-324)
  • get_mcp_tools (327-343)
src/app/endpoints/streaming_query.py (3)
  • format_stream_data (105-116)
  • stream_start_event (119-140)
  • stream_end_event (143-179)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🔇 Additional comments (8)
src/constants.py (1)

5-5: Align 0.2.22 bump across pins/docs/tests (or revert).

Make sure upstream 0.2.22 is available and all repo references match; otherwise gate checks and tests will drift. This was flagged earlier for 0.2.21.

Run:

#!/bin/bash
# Find version constants and hardcoded mentions
rg -nP -C2 'MINIMAL_SUPPORTED_LLAMA_STACK_VERSION|MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION|0\.2\.(1[7-9]|2[0-9])'

# Check common pins/mentions
fd -a '(pyproject\.toml|uv\.lock|README\.md|.*\.(md|toml|json))$' \
  | xargs rg -nP 'llama[-_ ]?stack|0\.2\.(1[7-9]|2[0-9])' -C2
src/app/endpoints/streaming_query_v2.py (5)

379-381: Remove incorrect accessor change
mcp_servers is defined as a property, so configuration.mcp_servers (not configuration.mcp_servers()) is correct and needs no change.

Likely an incorrect or invalid review comment.
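
The distinction is easy to trip over: a @property is read with attribute syntax, and adding parentheses would call the returned value instead. A toy sketch (AppConfig is a stand-in, not the project's class):

class AppConfig:
    """Toy stand-in for the project's configuration object."""

    def __init__(self) -> None:
        self._mcp_servers = ["server-a", "server-b"]

    @property
    def mcp_servers(self) -> list[str]:
        return self._mcp_servers


config = AppConfig()
print(config.mcp_servers)  # attribute access works: ['server-a', 'server-b']
# config.mcp_servers() would raise TypeError: 'list' object is not callable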


116-119: BUG: model selection ignores resolved conversation context.

Pass the validated user_conversation instead of None.

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),

388-396: Gate server-side persistence to transcripts setting.

Avoid persisting when transcripts are disabled.

-        store=True,
+        store=is_transcripts_enabled(),

308-309: SSE requires proper media type and no-buffer headers.

Without text/event-stream, many clients won’t stream tokens.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )

88-90: Avoid logging full configuration — risk of leaking secrets.

Log only safe fields (e.g., version). Don’t serialize the whole object.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    try:
+        cfg = configuration.llama_stack_configuration()  # if accessor; else treat as attr
+    except TypeError:
+        cfg = configuration.llama_stack_configuration  # fallback
+    logger.debug("Llama Stack version: %s", getattr(cfg, "version", "n/a"))
src/utils/suid.py (1)

37-49: Critical: Don’t coerce resp- IDs into UUIDs (remove zero‑padding/UUID parse).*

Response IDs are opaque; padding/truncation risks accepting malformed IDs and rejecting valid ones.

Apply:

-    # Handle Responses API IDs
-    if suid.startswith("resp-") or suid.startswith("resp_"):
-        token = suid[5:]
-        if not token:
-            return False
-        # If truncated (e.g., shell cut reduced length), pad to canonical UUID length
-        if len(token) < 36:
-            token = token + ("0" * (36 - len(token)))
-        try:
-            uuid.UUID(token)
-            return True
-        except (ValueError, TypeError):
-            return False
+    # Handle Responses API IDs – treat as opaque, minimally validated
+    if suid.startswith(("resp-", "resp_")):
+        token = suid[5:]
+        # non-empty token with safe charset
+        return bool(token) and all(ch.isalnum() or ch in "-_." for ch in token)
src/app/routers.py (1)

45-46: src/app/routers.py mounts conversations.router under both /v1 (line 39) and /v3 (line 45) — confirm this reuse is intentional and adjust operationIds or tags to prevent duplicate OpenAPI operations.

@luis5tb luis5tb changed the title from "[WIP] Responses v2" to "Responses v2 support" on Sep 29, 2025
@omertuc
Contributor

omertuc commented Sep 29, 2025

Let's hold it until we have a conversations API

@luis5tb
Contributor Author

luis5tb commented Oct 6, 2025

Let's hold it until we have a conversations API

Conversations (initial support) PR was merged! \o/

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
src/app/endpoints/streaming_query_v2.py (4)

88-90: Avoid logging entire config; log safe fields only

Do not log the full configuration object. Log non-sensitive fields via an accessor.

-    # log Llama Stack configuration
-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    # log safe subset of Llama Stack configuration
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )

As per coding guidelines


116-119: BUG: model selection ignores conversation context (passes None)

You pass user_conversation=None, breaking model/provider hinting.

-            await client.models.list(),
-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            await client.models.list(),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),

308-309: Set SSE media type and no-buffering headers

Many clients require text/event-stream and anti-buffering headers.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )

388-396: Gate Responses API persistence to transcripts setting

Avoid server-side storage when transcripts are disabled.

     response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=True,
-        store=True,
+        store=is_transcripts_enabled(),
     )
🧹 Nitpick comments (2)
src/app/endpoints/streaming_query_v2.py (2)

12-15: Use FastAPI’s status import (not Starlette) to follow project conventions

Import status from fastapi alongside other FastAPI primitives.

As per coding guidelines

-from fastapi import APIRouter, Depends, Request, HTTPException
+from fastapi import APIRouter, Depends, Request, HTTPException, status
-from fastapi.responses import StreamingResponse
-from starlette import status
+from fastapi.responses import StreamingResponse

52-52: Standardize logger to module name

Prefer logger = logging.getLogger(__name__) for correct module scoping.

As per coding guidelines

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb540e3 and d660592.

📒 Files selected for processing (4)
  • src/app/endpoints/query.py (2 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (1 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/app/endpoints/query_v2.py
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
🧬 Code graph analysis (2)
src/app/endpoints/streaming_query.py (1)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (766-793)
src/app/endpoints/streaming_query_v2.py (8)
src/app/endpoints/query.py (5)
  • evaluate_model_hints (135-174)
  • is_transcripts_enabled (91-97)
  • persist_user_conversation_details (100-132)
  • select_model_and_provider_id (396-474)
  • validate_attachments_metadata (734-763)
src/utils/endpoints.py (4)
  • validate_conversation_ownership (45-65)
  • check_configuration_loaded (68-80)
  • get_system_prompt (83-147)
  • validate_model_provider_override (164-184)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (329-340)
  • get_mcp_tools (343-359)
src/app/endpoints/streaming_query.py (3)
  • format_stream_data (120-131)
  • stream_start_event (134-155)
  • stream_end_event (158-214)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/transcripts.py (1)
  • store_transcript (33-90)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (5)
src/app/endpoints/streaming_query.py (2)

1018-1024: LGTM: vector_store_ids and toolgroups construction

Switch to vector_stores.list().data and propagate IDs into get_rag_toolgroups; preserves None when empty.


766-793: LGTM: get_rag_toolgroups now keyed by vector_store_ids

Signature/docs updated; args={"vector_store_ids": ...} matches upstream changes.

src/app/endpoints/query.py (2)

678-683: LGTM: vector_store_ids used for RAG toolgroups

Retrieval via vector_stores.list().data aligns with updated API.


767-792: LGTM: get_rag_toolgroups signature and argument name updates

Docs and implementation consistent with vector_store_ids change.

src/app/endpoints/streaming_query_v2.py (1)

379-386: Drop callable accessor check and header merge—configuration.mcp_servers is a property and there’s no mcp_headers in this scope.

Likely an incorrect or invalid review comment.

Comment on lines +169 to +175
persist_user_conversation_details(
user_id=user_id,
conversation_id=conv_id,
model=model_id,
provider_id=provider_id,
)
continue
Contributor

⚠️ Potential issue | 🔴 Critical

BUG: missing required argument topic_summary in persist_user_conversation_details call

Signature requires topic_summary; pass None at creation time.

-                        persist_user_conversation_details(
+                        persist_user_conversation_details(
                             user_id=user_id,
                             conversation_id=conv_id,
                             model=model_id,
                             provider_id=provider_id,
+                            topic_summary=None,
                         )
📝 Committable suggestion


Suggested change
persist_user_conversation_details(
user_id=user_id,
conversation_id=conv_id,
model=model_id,
provider_id=provider_id,
)
continue
persist_user_conversation_details(
user_id=user_id,
conversation_id=conv_id,
model=model_id,
provider_id=provider_id,
topic_summary=None,
)
continue
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 169 to 175, the call to
persist_user_conversation_details is missing the required topic_summary
parameter; update the call to include topic_summary=None (e.g.,
topic_summary=None) when creating the conversation so the function signature is
satisfied.

Comment on lines +282 to +283
yield stream_end_event(metadata_map)

Contributor

⚠️ Potential issue | 🔴 Critical

BUG: stream_end_event called with wrong arity; requires summary and token_usage

Pass a TokenCounter placeholder and current summary to match signature.

-            yield stream_end_event(metadata_map)
+            yield stream_end_event(metadata_map, summary, TokenCounter())

Add missing import at top:

+from utils.token_counter import TokenCounter
📝 Committable suggestion


Suggested change
yield stream_end_event(metadata_map)
# At the top of src/app/endpoints/streaming_query_v2.py, add the import
from fastapi import APIRouter, HTTPException, Request, status, Depends
from utils.token_counter import TokenCounter
# … later in the file, around line 282 …
yield stream_end_event(metadata_map, summary, TokenCounter())
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 282-283,
stream_end_event is being called with the wrong arity; update the call to pass
the current summary object and a TokenCounter placeholder instance (e.g.,
TokenCounter()) so it matches the required signature (summary, token_usage), and
add the missing import for TokenCounter at the top of the file (import
TokenCounter from the module where it’s defined).

Comment on lines +284 to +301
if not is_transcripts_enabled():
logger.debug("Transcript collection is disabled in the configuration")
else:
store_transcript(
user_id=user_id,
conversation_id=conv_id,
model_id=model_id,
provider_id=provider_id,
query_is_valid=True, # TODO(lucasagomes): implement as part of query validation
query=query_request.query,
query_request=query_request,
summary=summary,
rag_chunks=[], # TODO(lucasagomes): implement rag_chunks
truncated=False, # TODO(lucasagomes): implement truncation as part
# of quota work
attachments=query_request.attachments or [],
)

Contributor

⚠️ Potential issue | 🟠 Major

Avoid storing transcripts when conversation_id is empty

If conv_id is not available, skip storage to avoid invalid paths.

-            if not is_transcripts_enabled():
+            if not is_transcripts_enabled():
                 logger.debug("Transcript collection is disabled in the configuration")
-            else:
+            elif not conv_id:
+                logger.debug("Skipping transcript storage: missing conversation_id")
+            else:
                 store_transcript(
                     user_id=user_id,
                     conversation_id=conv_id,
                     model_id=model_id,
                     provider_id=provider_id,
                     query_is_valid=True,  # TODO(lucasagomes): implement as part of query validation
                     query=query_request.query,
                     query_request=query_request,
                     summary=summary,
                     rag_chunks=[],  # TODO(lucasagomes): implement rag_chunks
                     truncated=False,  # TODO(lucasagomes): implement truncation as part
                     # of quota work
                     attachments=query_request.attachments or [],
                 )
📝 Committable suggestion


Suggested change
if not is_transcripts_enabled():
logger.debug("Transcript collection is disabled in the configuration")
else:
store_transcript(
user_id=user_id,
conversation_id=conv_id,
model_id=model_id,
provider_id=provider_id,
query_is_valid=True, # TODO(lucasagomes): implement as part of query validation
query=query_request.query,
query_request=query_request,
summary=summary,
rag_chunks=[], # TODO(lucasagomes): implement rag_chunks
truncated=False, # TODO(lucasagomes): implement truncation as part
# of quota work
attachments=query_request.attachments or [],
)
if not is_transcripts_enabled():
logger.debug("Transcript collection is disabled in the configuration")
elif not conv_id:
logger.debug("Skipping transcript storage: missing conversation_id")
else:
store_transcript(
user_id=user_id,
conversation_id=conv_id,
model_id=model_id,
provider_id=provider_id,
query_is_valid=True, # TODO(lucasagomes): implement as part of query validation
query=query_request.query,
query_request=query_request,
summary=summary,
rag_chunks=[], # TODO(lucasagomes): implement rag_chunks
truncated=False, # TODO(lucasagomes): implement truncation as part
# of quota work
attachments=query_request.attachments or [],
)
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 284 to 301, the code
always calls store_transcript even when conv_id may be empty which can create
invalid storage paths; modify the flow to check that conv_id is truthy before
calling store_transcript and skip (and optionally log a debug/warn) when conv_id
is missing so transcripts are not stored for anonymous/unsaved conversations.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (13)
src/utils/suid.py (1)

37-49: Don't coerce opaque resp-* IDs into UUIDs; the zero-padding path is incorrect and unsafe.

Response API IDs are opaque identifiers—they may not follow UUID format at all. The padding logic (lines 42-44) is fundamentally flawed:

  1. Accepts invalid IDs: Padding arbitrary strings (e.g., resp-abc becomes resp-abc plus 33 zeros) to UUID length and validating them as UUIDs accepts malformed IDs.
  2. May reject valid IDs: If the actual Response API ID format differs from UUID, this will incorrectly reject them.
  3. Security risk: Accepting modified/padded IDs could lead to authentication or authorization bypasses if these IDs are used for access control.
  4. Logic error: Padding user input to make it "valid" is wrong—truncated IDs should be rejected, not transformed.

Treat Response API IDs as opaque: if the prefix is resp- or resp_ and the token is non-empty with a sane character set, accept it. Do not pad or validate as UUID.

Apply this diff to fix the validation:

-    # Handle Responses API IDs
-    if suid.startswith("resp-") or suid.startswith("resp_"):
+    # Handle Responses API IDs – treat as opaque, minimally validated
+    if suid.startswith(("resp-", "resp_")):
         token = suid[5:]
-        if not token:
-            return False
-        # If truncated (e.g., shell cut reduced length), pad to canonical UUID length
-        if len(token) < 36:
-            token = token + ("0" * (36 - len(token)))
-        try:
-            uuid.UUID(token)
-            return True
-        except (ValueError, TypeError):
-            return False
+        # Require a non-empty token with sane charset
+        return bool(token) and all(ch.isalnum() or ch in "-_." for ch in token)
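
For illustration, a few cases the opaque rule would accept or reject (hypothetical IDs; behavior per the diff above):

from utils.suid import check_suid

# Accepted: canonical UUID session IDs and opaque resp-/resp_ tokens
assert check_suid("123e4567-e89b-12d3-a456-426614174000")  # plain UUID
assert check_suid("resp-abc123")         # opaque token, alphanumeric
assert check_suid("resp_5f2a.b-c_d")     # '.', '-' and '_' are in the allowed charset

# Rejected: empty token or characters outside the allowed set
assert not check_suid("resp-")           # nothing after the prefix
assert not check_suid("resp-abc/123")    # '/' is not allowed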
src/app/endpoints/streaming_query_v2.py (8)

118-118: BUG: evaluate_model_hints ignores validated conversation context.

You pass user_conversation=None instead of the resolved and validated user_conversation variable from lines 93-97. This breaks model/provider resolution from conversation history.

Apply this diff:

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),

169-174: BUG: Missing required topic_summary parameter.

The function persist_user_conversation_details requires a topic_summary parameter (see src/app/endpoints/query.py lines 106-138), but the call omits it.

Apply this diff:

                     persist_user_conversation_details(
                         user_id=user_id,
                         conversation_id=conv_id,
                         model=model_id,
                         provider_id=provider_id,
+                        topic_summary=None,
                     )

282-282: BUG: stream_end_event called with wrong arity.

stream_end_event expects (metadata_map, summary, token_usage) arguments (see src/app/endpoints/streaming_query.py lines 161-217), but you're only passing metadata_map.

Apply this diff:

-            yield stream_end_event(metadata_map)
+            yield stream_end_event(metadata_map, summary, TokenCounter())

Add the missing import at the top:

+from utils.token_counter import TokenCounter

284-300: Avoid storing transcripts when conversation_id is empty.

If conv_id is not set (still empty string from line 153), calling store_transcript may create invalid storage paths or fail.

Apply this diff:

             if not is_transcripts_enabled():
                 logger.debug("Transcript collection is disabled in the configuration")
+            elif not conv_id:
+                logger.debug("Skipping transcript storage: missing conversation_id")
             else:
                 store_transcript(

308-308: SSE response is missing the correct content type and streaming headers.

Without the text/event-stream media type and no-cache headers, many clients will not process events correctly and may buffer the stream.

Apply this diff:

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
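
As a quick way to exercise this fix, a minimal SSE consumer sketch (the host, port, and payload are assumptions):

import httpx

# Stream the v2 endpoint and confirm the media type and event framing.
with httpx.stream(
    "POST",
    "http://localhost:8080/v2/streaming_query",
    json={"query": "hello"},
    timeout=None,
) as response:
    print(response.headers.get("content-type"))  # expect "text/event-stream" after the fix
    for line in response.iter_lines():
        if line.startswith("data: "):
            print(line)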

379-379: BUG: Call configuration.mcp_servers() accessor method.

mcp_servers is a method, not an attribute. Attribute access will fail at runtime.

Apply this diff:

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)

388-396: Gate server-side persistence on transcripts setting.

Setting store=True unconditionally persists data server-side in Llama Stack regardless of the local transcripts_enabled setting. Align with the transcript policy.

Apply this diff:

     response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=True,
-        store=True,
+        store=is_transcripts_enabled(),
     )

89-89: Avoid logging the entire Llama Stack configuration (may expose secrets).

Logging configuration.llama_stack_configuration at INFO level may expose sensitive fields such as API keys or internal URLs. Log only non-sensitive fields or use DEBUG level.

Apply this diff:

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    logger.debug(
+        "Llama Stack configured (mode=%s)",
+        "library" if configuration.llama_stack_configuration.use_as_library_client else "service",
+    )
src/app/endpoints/query_v2.py (4)

78-78: Avoid logging the entire Llama Stack configuration (may expose secrets).

Logging configuration.llama_stack_configuration at INFO level may expose sensitive fields such as API keys or internal URLs. Log only non-sensitive fields or use DEBUG level.

Apply this diff:

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    logger.debug(
+        "Llama Stack configured (mode=%s)",
+        "library" if configuration.llama_stack_configuration.use_as_library_client else "service",
+    )

82-86: Validate conversation ownership when conversation_id is provided.

The TODO comment indicates missing implementation. Without validation, users can chain to someone else's conversation via previous_response_id, leaking context across users.

Apply this diff:

     user_conversation: UserConversation | None = None
     if query_request.conversation_id:
-        # TODO: Implement conversation once Llama Stack supports its API
-        pass
+        user_conversation = validate_conversation_ownership(
+            user_id=user_id,
+            conversation_id=query_request.conversation_id,
+            others_allowed=(
+                Action.QUERY_OTHERS_CONVERSATIONS in request.state.authorized_actions
+            ),
+        )
+        if user_conversation is None:
+            raise HTTPException(
+                status_code=status.HTTP_404_NOT_FOUND,
+                detail={
+                    "response": "Conversation not found",
+                    "cause": "The requested conversation does not exist.",
+                },
+            )

Add the missing import:

 from app.endpoints.query import (
     evaluate_model_hints,
     is_transcripts_enabled,
     persist_user_conversation_details,
     query_response,
     select_model_and_provider_id,
     validate_attachments_metadata,
     get_topic_summary,
+    validate_conversation_ownership,
 )

207-207: BUG: Call configuration.mcp_servers() accessor method.

mcp_servers is a method, not an attribute. Attribute access will fail at runtime.

Apply this diff:

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)

217-225: Gate server-side persistence on transcripts setting.

Setting store=True unconditionally persists data server-side in Llama Stack regardless of the local transcripts_enabled setting. Align with the transcript policy.

Apply this diff:

     response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=False,
-        store=True,
+        store=is_transcripts_enabled(),
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d660592 and afe99e3.

📒 Files selected for processing (9)
  • src/app/endpoints/query.py (2 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (1 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • src/utils/suid.py (2 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/app/test_routers.py
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
  • src/utils/suid.py
  • src/app/endpoints/query_v2.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
  • src/utils/suid.py
  • src/app/endpoints/query_v2.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query.py
  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query_v2.py
🧬 Code graph analysis (3)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (37-52)
src/app/endpoints/streaming_query_v2.py (8)
src/app/endpoints/query.py (5)
  • evaluate_model_hints (142-181)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • select_model_and_provider_id (419-497)
  • validate_attachments_metadata (767-796)
src/utils/endpoints.py (3)
  • check_configuration_loaded (111-123)
  • get_system_prompt (126-190)
  • validate_model_provider_override (207-227)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (329-340)
  • get_mcp_tools (343-359)
src/app/endpoints/streaming_query.py (3)
  • format_stream_data (124-135)
  • stream_start_event (138-159)
  • stream_end_event (162-218)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
src/app/endpoints/query_v2.py (9)
src/app/endpoints/query.py (6)
  • evaluate_model_hints (142-181)
  • is_transcripts_enabled (98-104)
  • persist_user_conversation_details (107-139)
  • select_model_and_provider_id (419-497)
  • validate_attachments_metadata (767-796)
  • get_topic_summary (184-214)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (3)
  • configuration (73-77)
  • llama_stack_configuration (87-91)
  • mcp_servers (101-105)
src/app/database.py (1)
  • get_session (34-40)
src/models/responses.py (1)
  • QueryResponse (194-305)
src/utils/endpoints.py (3)
  • check_configuration_loaded (111-123)
  • get_system_prompt (126-190)
  • validate_model_provider_override (207-227)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/transcripts.py (1)
  • store_transcript (40-99)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (4)
src/app/routers.py (2)

16-24: LGTM: V2 endpoint imports are properly structured.

The imports for the new v2 endpoints are correctly added and well-documented with inline comments.


46-51: Clarify versioning: conversations_v2 under /v2 but conversations under /v3.

Lines 46 and 51 show conversations_v2.router registered under /v2 and conversations.router registered under /v3. This naming is potentially confusing—consider renaming the module to match the route version (e.g., conversations_v3.router if it should be v3) or documenting why the non-versioned module is registered under /v3.
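
For example, one consistent arrangement (a sketch; conversations_v3 is the hypothetical renamed module):

from fastapi import FastAPI

from app.endpoints import conversations_v2, conversations_v3  # hypothetical rename

app = FastAPI()
app.include_router(conversations_v2.router, prefix="/v2")
app.include_router(conversations_v3.router, prefix="/v3")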

src/app/endpoints/query.py (2)

701-706: LGTM: Refactor from vector_dbs to vector_stores is correct.

The change from vector_dbs.list() to vector_stores.list() and the corresponding variable rename to vector_store_ids is consistent with the updated terminology.
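
A condensed sketch of the renamed call path (the client type and the .id attribute are assumptions; adjust if the API returns a paginated object):

from llama_stack_client import AsyncLlamaStackClient


async def list_vector_store_ids(client: AsyncLlamaStackClient) -> list[str]:
    """Collect IDs via the renamed vector_stores.list() call (sketch)."""
    vector_stores = await client.vector_stores.list()
    return [vs.id for vs in vector_stores]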


799-825: LGTM: Function signature and documentation updated consistently.

The parameter rename from vector_db_ids to vector_store_ids is applied consistently throughout the function signature, docstring, and implementation.

Comment on lines +21 to +24
     Check if given string is a proper session ID or response ID.

-    Returns True if the string is a valid UUID, False otherwise.
+    Returns True if the string is a valid UUID or if it starts with resp-/resp_
+    and it follows a valid UUID string, False otherwise.

🛠️ Refactor suggestion | 🟠 Major

Update docstring to reflect opaque ID validation.

Once the padding logic is removed, update the docstring to accurately describe that Response API IDs are treated as opaque strings with minimal validation, not coerced into UUID format.

Apply this diff after fixing the validation logic:

     Check if given string is a proper session ID or response ID.
 
-    Returns True if the string is a valid UUID or if it starts with resp-/resp_
-    and it follows a valid UUID string, False otherwise.
+    Returns True if the string is a valid UUID (session ID) or if it is a
+    Response API ID (starts with resp-/resp_ and has a non-empty token),
+    False otherwise.
🤖 Prompt for AI Agents
In src/utils/suid.py around lines 21 to 24, the docstring and validation imply Response API IDs are coerced into UUIDs via the padding logic, but these IDs must be treated as opaque with only minimal checks. Update the docstring to say the function returns True for valid UUID strings, or for response IDs that begin with "resp-" or "resp_" and are treated as opaque (no UUID coercion). Change the validation code to stop padding or converting the suffix to a UUID; instead verify only the prefix and that the remaining suffix is a non-empty string (or passes a simple allowed-character check), returning False otherwise.

@luis5tb
Contributor Author

luis5tb commented Nov 4, 2025

Closing this PR in favor of the work in #753 (non-streaming) and #754 (streaming)

@luis5tb luis5tb closed this Nov 4, 2025