LCORE-178: add token usage tracking to LLM responses #635
Conversation
Walkthrough
Threads per-turn token usage through the query and streaming flows.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Client
participant QueryEP as Query Endpoint
participant Retriever as retrieve_response
participant TokenUtil as extract_and_update_token_metrics
participant LLM as LLM
participant Resp as QueryResponse
Client->>QueryEP: POST /query
QueryEP->>Retriever: retrieve_response(...)
Retriever->>LLM: LLM request
LLM-->>Retriever: turn + content
Retriever->>TokenUtil: extract_and_update_token_metrics(turn, model, provider)
TokenUtil-->>Retriever: TokenCounter
Retriever-->>QueryEP: (summary, conv_id, refs, TokenCounter)
QueryEP->>Resp: build response (truncated, input_tokens, output_tokens, available_quotas)
Resp-->>Client: JSON response
sequenceDiagram
autonumber
actor Client
participant StreamEP as Streaming Endpoint
participant Generator as response_generator
participant TokenUtil as extract_token_usage_from_turn
participant SSE as stream_end_event
Client->>StreamEP: GET /streaming_query (SSE)
StreamEP->>Generator: start generator
loop per chunk/turn
Generator->>TokenUtil: extract_token_usage_from_turn(turn, system_prompt)
TokenUtil-->>Generator: TokenCounter
Generator-->>Client: data: chunk
end
Generator->>SSE: stream_end_event(metadata_map, summary, TokenCounter, media_type)
SSE-->>Client: data: end event (truncated=false, input_tokens, output_tokens)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/app/endpoints/query.py (7 hunks)
src/app/endpoints/streaming_query.py (7 hunks)
src/models/responses.py (3 hunks)
src/utils/token_counter.py (1 hunks)
tests/unit/app/endpoints/test_query.py (20 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/models/responses.py, src/app/endpoints/streaming_query.py, src/utils/token_counter.py, src/app/endpoints/query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/models/responses.py, src/app/endpoints/streaming_query.py, src/utils/token_counter.py, src/app/endpoints/query.py, tests/unit/app/endpoints/test_query.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
src/models/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
src/models/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query.py (1)
  src/utils/token_counter.py (2): extract_token_usage_from_turn (38-57), TokenCounter (12-35)
src/app/endpoints/query.py (4)
  src/utils/token_counter.py (2): extract_token_usage_from_turn (38-57), TokenCounter (12-35)
  src/app/endpoints/streaming_query.py (1): retrieve_response (784-895)
  src/utils/types.py (1): TurnSummary (89-163)
  src/models/responses.py (1): ReferencedDocument (104-116)
tests/unit/app/endpoints/test_query.py (3)
  src/utils/token_counter.py (1): TokenCounter (12-35)
  src/app/endpoints/query.py (1): retrieve_response (585-735)
  src/app/endpoints/streaming_query.py (1): retrieve_response (784-895)
🪛 GitHub Actions: Python linter
src/utils/token_counter.py
[error] 38-38: pylint: Unused argument 'turn' (unused-argument).
[error] 38-38: pylint: Unused argument 'system_prompt' (unused-argument).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/app/endpoints/query.py (7 hunks)
src/app/endpoints/streaming_query.py (7 hunks)
src/models/responses.py (3 hunks)
src/utils/token_counter.py (1 hunks)
tests/unit/app/endpoints/test_query.py (20 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py, src/utils/token_counter.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py, tests/unit/app/endpoints/test_query.py, src/utils/token_counter.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query.py (2)
  src/utils/token_counter.py (2): extract_token_usage_from_turn (38-57), TokenCounter (12-35)
  src/utils/types.py (1): TurnSummary (89-163)
src/app/endpoints/query.py (1)
  src/utils/token_counter.py (2): extract_token_usage_from_turn (38-57), TokenCounter (12-35)
tests/unit/app/endpoints/test_query.py (2)
  src/utils/token_counter.py (1): TokenCounter (12-35)
  src/app/endpoints/query.py (1): retrieve_response (585-735)
🪛 GitHub Actions: Black
src/utils/token_counter.py
[error] 1-1: Black formatting check failed. 1 file would be reformatted by black: /home/runner/work/lightspeed-stack/lightspeed-stack/src/utils/token_counter.py. Run 'black .' to fix formatting.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (8)
src/app/endpoints/streaming_query.py (3)
61-61: LGTM! The import of token tracking utilities and the updated SSE example data correctly reflect the new token usage fields in the streaming response.
Also applies to: 79-80
148-150: LGTM! The stream_end_event signature correctly accepts the token_usage parameter, and the function properly extracts input_tokens and output_tokens from it for the SSE end event.
Also applies to: 187-189
678-678: LGTM! Token usage tracking is properly integrated into the streaming flow: initialized with defaults, extracted when the turn completes, and passed to the end event.
Also applies to: 696-699, 710-710
tests/unit/app/endpoints/test_query.py (1)
41-41: LGTM! All test updates correctly accommodate the expanded retrieve_response return type (now a 4-tuple including TokenCounter). Mock return values and unpacking statements are consistently updated across all test cases.
Also applies to: 199-199, 496-498, 532-534, 569-571, 612-614, 666-668, 723-725, 782-784, 843-845, 902-904, 1079-1081, 1153-1155, 1229-1235, 1318-1320, 1403-1408, 1464-1464, 1520-1520, 1580-1582, 1635-1637
src/utils/token_counter.py (1)
48-57: Stub implementation returns hardcoded token estimates. As noted in the PR description, extract_token_usage_from_turn currently returns placeholder values (100 input tokens, 50 output tokens, 1 LLM call) rather than computing actual token usage from the turn data. This is intentional to provide a working foundation, but ensure follow-up work is tracked to implement actual token counting logic.
The hardcoded estimates may not reflect real usage patterns. Consider whether these defaults should be adjusted based on typical query sizes, or if they should be set to 0 to more clearly indicate that real counting is not yet implemented.
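For reference, a minimal sketch of what such a stub looks like, assuming the field names mentioned in this review (input_tokens, output_tokens, llm_calls); the exact dataclass layout in src/utils/token_counter.py may differ:

```python
from dataclasses import dataclass


@dataclass
class TokenCounter:
    """Track token usage for a single LLM interaction (field names assumed)."""

    input_tokens: int = 0
    output_tokens: int = 0
    llm_calls: int = 0


def extract_token_usage_from_turn(_turn: object, _system_prompt: str = "") -> TokenCounter:
    """Stub: return fixed estimates until real token counting is implemented."""
    # Hardcoded placeholders (100 input / 50 output / 1 call), as described above.
    return TokenCounter(input_tokens=100, output_tokens=50, llm_calls=1)
```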
src/app/endpoints/query.py (3)
58-58: LGTM! The import, return type annotation, and docstring correctly reflect the addition of TokenCounter to the retrieve_response return value.
Also applies to: 593-593, 617-619
718-720: LGTM! Token usage is correctly extracted from the turn response and included in the return tuple.
Also applies to: 735-735
283-292: LGTM! The endpoint handler correctly unpacks the 4-tuple from retrieve_response and populates the QueryResponse with token usage data. The TODOs for truncated and available_quotas are appropriately noted for future implementation.
Also applies to: 377-380
src/utils/token_counter.py
Outdated
def extract_token_usage_from_turn(_turn: Turn, _system_prompt: str = "") -> TokenCounter:
    """Extract token usage information from a turn.

    Args:
        turn: The turn object containing token usage information
        system_prompt: The system prompt used for the turn

    Returns:
        TokenCounter: Token usage information
    """
Fix docstring parameter names to match implementation.
The function parameters are prefixed with _ (_turn, _system_prompt) to indicate they're intentionally unused in the stub implementation, but the docstring still refers to them by their unprefixed names (turn, system_prompt). Update the docstring to match the actual parameter names for consistency.
Apply this diff to fix the docstring:
"""Extract token usage information from a turn.
Args:
- turn: The turn object containing token usage information
- system_prompt: The system prompt used for the turn
+ _turn: The turn object containing token usage information (unused in stub)
+ _system_prompt: The system prompt used for the turn (unused in stub)
Returns:
TokenCounter: Token usage information
"""📝 Committable suggestion
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)
762-764: Avoid double-counting llm_calls_total
The llm_calls_total counter is already incremented inside extract_and_update_token_metrics (src/utils/token_counter.py:122). Remove the extra inc() calls in:
- src/app/endpoints/query.py:293
- src/app/endpoints/streaming_query.py:763
If you’d prefer counting at request start instead, remove the internal increment and keep a single inc() here—ensure consistency across both endpoints.
🧹 Nitpick comments (8)
src/utils/token_counter.py (1)
64-71: Safer construction of output message for encoding. Casting output_message to RawMessage can break if the types diverge. Prefer constructing a RawMessage defensively.
Example:
- raw_message = cast(RawMessage, turn.output_message)
+ om = turn.output_message
+ raw_message = RawMessage(
+     role=getattr(om, "role", "assistant"),
+     content=getattr(om, "content", ""),
+ )
This avoids relying on structural compatibility. If content is interleaved items, you can adapt later to stringify or convert as needed.
tests/unit/app/endpoints/test_streaming_query.py (1)
52-57: Prefer shared MOCK_AUTH constant across tests. Guidelines suggest using a shared MOCK_AUTH. Local duplicates risk divergence.
Extract a shared MOCK_AUTH in a common test helper and import it here (and in test_query.py) for consistency.
tests/unit/app/endpoints/test_query.py (2)
41-49: Use shared MOCK_AUTH constant. Mirror the streaming tests suggestion: centralize MOCK_AUTH to a shared helper to reduce duplication.
569-576: Nit: strengthen assertions to cover token usage propagation. Since retrieve_response now returns a TokenCounter, add minimal assertions that a TokenCounter instance is returned (even if mocked) to ensure unpacking remains correct across changes.
Example:
summary, conversation_id, _, token_usage = await retrieve_response(...)
assert isinstance(token_usage, TokenCounter)
Also applies to: 612-625, 666-679, 723-736, 782-807, 843-861, 902-924, 1079-1120, 1153-1179, 1229-1279, 1318-1333, 1580-1597, 1635-1662
src/models/responses.py (1)
147-151: Use default_factory for list defaults. Prefer default_factory=list to avoid mutable default pitfalls (even with Pydantic handling).
- rag_chunks: list[RAGChunk] = Field(
-     [],
+ rag_chunks: list[RAGChunk] = Field(
+     default_factory=list,
      description="List of RAG chunks used to generate the response",
  )
src/app/endpoints/streaming_query.py (3)
62-62: Use module logger per guidelines. Use logger = logging.getLogger(__name__) for consistency.
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
As per coding guidelines.
73-81: SSE example: place available_quotas inside "data" for consistency. The current example shows "available_quotas" as a sibling of "data". Align the structure by nesting it under "data".
- 'data: {"event": "end", "data": {"referenced_documents": [], '
- '"truncated": false, "input_tokens": 150, "output_tokens": 50}, '
- '"available_quotas": {}}\n\n'
+ 'data: {"event": "end", "data": {"referenced_documents": [], '
+ '"truncated": false, "input_tokens": 150, "output_tokens": 50, '
+ '"available_quotas": {}}}\n\n'
147-164: Nest available_quotas under 'data' and complete the docstring Args. Keep all response fields inside "data" and document all parameters.
-def stream_end_event(
-    metadata_map: dict, summary: TurnSummary, token_usage: TokenCounter
-) -> str:
+def stream_end_event(
+    metadata_map: dict, summary: TurnSummary, token_usage: TokenCounter
+) -> str:
     """
     Yield the end of the data stream.
@@
-    Parameters:
-        metadata_map (dict): A mapping containing metadata about
-            referenced documents.
+    Args:
+        metadata_map (dict): Mapping of referenced document metadata.
+        summary (TurnSummary): Final turn summary.
+        token_usage (TokenCounter): Count of input/output tokens and call count.
@@
-            "available_quotas": {},  # TODO(jboos): implement available quotas
+            "available_quotas": {},  # TODO(jboos): implement available quotas
         }
     )
Also applies to: 180-191
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/app/endpoints/query.py (7 hunks)
src/app/endpoints/streaming_query.py (7 hunks)
src/models/responses.py (3 hunks)
src/utils/token_counter.py (1 hunks)
tests/unit/app/endpoints/test_query.py (21 hunks)
tests/unit/app/endpoints/test_streaming_query.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py, src/utils/token_counter.py, src/app/endpoints/streaming_query.py, src/app/endpoints/query.py, src/models/responses.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/utils/token_counter.py, src/app/endpoints/streaming_query.py, src/app/endpoints/query.py, src/models/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
src/models/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
src/models/responses.py
🧬 Code graph analysis (4)
tests/unit/app/endpoints/test_query.py (3)
  src/utils/token_counter.py (1): TokenCounter (18-41)
  src/app/endpoints/query.py (1): retrieve_response (584-733)
  src/app/endpoints/streaming_query.py (1): retrieve_response (780-891)
tests/unit/app/endpoints/test_streaming_query.py (1)
  src/utils/token_counter.py (1): TokenCounter (18-41)
src/app/endpoints/streaming_query.py (1)
  src/utils/token_counter.py (2): extract_and_update_token_metrics (94-127), TokenCounter (18-41)
src/app/endpoints/query.py (3)
  src/utils/token_counter.py (2): extract_and_update_token_metrics (94-127), TokenCounter (18-41)
  src/app/endpoints/streaming_query.py (1): retrieve_response (784-895)
  src/utils/types.py (1): TurnSummary (89-163)
  src/models/responses.py (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
tests/unit/app/endpoints/test_streaming_query.py (1)
76-81: Good: mocks updated to new extract_and_update_token_metrics API. Tests now patch the correct symbol and return TokenCounter, aligning with the new flow.
Consider adding assertions that the final SSE "end" event includes the new fields (truncated, input_tokens, output_tokens, available_quotas) to lock behavior. I can draft that if helpful.
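If that assertion is added, it could look roughly like the sketch below; the exact event shape, the TokenCounter keyword arguments, and passing a dummy summary (the parameter is reportedly unused for now) are assumptions based on this review rather than the actual test suite:

```python
import json

from app.endpoints.streaming_query import stream_end_event
from utils.token_counter import TokenCounter


def test_stream_end_event_includes_token_usage() -> None:
    """The SSE end event should carry the new token usage fields (sketch)."""
    token_usage = TokenCounter(input_tokens=150, output_tokens=50)
    event = stream_end_event({}, None, token_usage, "text/event-stream")

    payload = json.loads(event.removeprefix("data: ").strip())
    assert payload["event"] == "end"
    assert payload["data"]["truncated"] is False
    assert payload["data"]["input_tokens"] == 150
    assert payload["data"]["output_tokens"] == 50
```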
tests/unit/app/endpoints/test_query.py (1)
66-71: Good: metrics utility correctly patched to TokenCounter path. Patching extract_and_update_token_metrics avoids coupling tests to metrics and matches the new return signature.
src/models/responses.py (1)
171-194: LGTM: schema extended for token usage and quotas. New fields (truncated, input_tokens, output_tokens, available_quotas) are documented and exemplified.
Also applies to: 223-227
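For readers skimming the review, the new response fields have roughly the following shape; this is an illustrative Pydantic sketch, and the defaults, descriptions, and field ordering are assumptions rather than a copy of src/models/responses.py:

```python
from pydantic import BaseModel, Field


class QueryResponse(BaseModel):
    """Partial sketch of the query response showing the new token usage fields."""

    response: str = Field(description="The LLM answer")
    truncated: bool = Field(False, description="Whether the conversation history was truncated")
    input_tokens: int = Field(0, description="Number of tokens sent to the LLM")
    output_tokens: int = Field(0, description="Number of tokens received from the LLM")
    available_quotas: dict[str, int] = Field(
        default_factory=dict, description="Remaining quotas per limiter, if configured"
    )
```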
src/app/endpoints/streaming_query.py (1)
692-696: Good: token usage computed once at turn completion and passed to end event. The extraction and propagation pattern is sound.
Also applies to: 706-707
src/app/endpoints/query.py (6)
57-57: LGTM! The imports are correct and follow the coding guidelines for absolute imports.
376-379: LGTM! The token usage fields are correctly populated from the token_usage object. The TODO comments are acknowledged placeholders for future enhancements as noted in the PR objectives.
592-592: LGTM! The return type annotation correctly reflects the addition of TokenCounter to the function's return value.
616-618: LGTM! The docstring correctly documents the new return value including token usage information, following the Google Python docstring style.
713-717: LGTM! The token extraction logic is correctly implemented. The model_label is properly derived from model_id, and extract_and_update_token_metrics is called with the correct parameters.
733-733: LGTM! The return statement correctly includes token_usage as the fourth element, consistent with the updated return type annotation.
LGTM
59ad2f1 to 3350a5a (Compare)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/streaming_query.py (2)
1-61: Add missing TokenCounter import. The TokenCounter type is used in the stream_end_event signature (line 160) but is never imported, causing pipeline failures across multiple checks (Ruff, Pyright, unit tests, type checks, and linter). Add the import at the top of the file:
  from utils.types import TurnSummary
+ from utils.token_counter import TokenCounter
821-821: Pass required summary and token_usage arguments. The call to stream_end_event is missing the newly added required parameters summary and token_usage, causing pipeline failures. These values are available in the response_generator scope and should be passed to populate the end event correctly. Apply this diff to fix the call:
- yield stream_end_event(metadata_map, media_type)
+ yield stream_end_event(metadata_map, summary, token_usage, media_type)
However, there's a problem: token_usage is not currently being extracted or tracked in the streaming flow. You'll need to add token extraction similar to how it's done in the non-streaming endpoint. Consider adding this before the end event:
# Extract token usage from the turn
system_prompt = get_system_prompt(query_request, configuration)
token_usage = extract_token_usage_from_turn(p.turn, system_prompt)
You'll also need to import extract_token_usage_from_turn from utils.token_counter.
src/utils/token_counter.py (1)
17-41: Consider making TokenCounter immutable. The TokenCounter dataclass is well-defined with clear documentation. However, consider adding frozen=True to prevent accidental mutation after creation, which would make the token counts more reliable:
-@dataclass
+@dataclass(frozen=True)
 class TokenCounter:
This is optional but improves safety when passing token counters between functions.
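A minimal sketch of what a frozen TokenCounter could look like; the field names (input_tokens, output_tokens, input_tokens_counted, llm_calls) are inferred from this review's comments and may not match the actual module exactly:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCounter:
    """Immutable record of token usage for one LLM interaction (fields assumed)."""

    input_tokens: int = 0
    output_tokens: int = 0
    input_tokens_counted: int = 0
    llm_calls: int = 0


usage = TokenCounter(input_tokens=150, output_tokens=50, llm_calls=1)
# usage.input_tokens = 0  # would raise dataclasses.FrozenInstanceError
```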
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/app/endpoints/query.py (7 hunks)
src/app/endpoints/streaming_query.py (2 hunks)
src/models/responses.py (3 hunks)
src/utils/token_counter.py (1 hunks)
tests/unit/app/endpoints/test_query.py (22 hunks)
tests/unit/app/endpoints/test_streaming_query.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/app/endpoints/test_streaming_query.py
🧰 Additional context used
📓 Path-based instructions (9)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
tests/unit/app/endpoints/test_query.py, src/models/responses.py, src/utils/token_counter.py, src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/models/responses.py, src/utils/token_counter.py, src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
src/models/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
src/models/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
🧬 Code graph analysis (3)
tests/unit/app/endpoints/test_query.py (2)
  src/utils/token_counter.py (1): TokenCounter (18-41)
  src/app/endpoints/query.py (1): retrieve_response (582-731)
src/app/endpoints/streaming_query.py (2)
  src/utils/types.py (1): TurnSummary (89-163)
  src/utils/token_counter.py (1): TokenCounter (18-41)
src/app/endpoints/query.py (1)
  src/utils/token_counter.py (2): extract_and_update_token_metrics (97-130), TokenCounter (18-41)
🪛 GitHub Actions: Pyright
src/app/endpoints/streaming_query.py
[error] 160-160: pyright: 'TokenCounter' is not defined.
[error] 821-821: pyright: Argument missing for parameter 'token_usage'.
🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py
[error] 160-160: Undefined variable 'TokenCounter'.
[warning] 159-159: Unused argument 'summary' (unused-argument).
[error] 821-821: No value for argument 'token_usage' in function call.
🪛 GitHub Actions: Ruff
src/app/endpoints/streaming_query.py
[error] 160-160: Ruff: Undefined name 'TokenCounter' (F821) in src/app/endpoints/streaming_query.py:160.
🪛 GitHub Actions: Type checks
src/app/endpoints/streaming_query.py
[error] 160-160: Mypy error during 'uv run mypy ...': Name 'TokenCounter' is not defined.
[error] 821-821: Mypy error during 'uv run mypy ...': Missing positional argument 'token_usage' in call to 'stream_end_event'.
🪛 GitHub Actions: Unit tests
tests/unit/app/endpoints/test_query.py
[error] 37-37: NameError: name 'TokenCounter' is not defined
src/models/responses.py
[warning] 125-125: Pydantic: Using extra keyword arguments on Field is deprecated and will be removed in a future version.
src/app/endpoints/streaming_query.py
[error] 160-160: NameError: name 'TokenCounter' is not defined.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
src/app/endpoints/streaming_query.py (1)
157-162: Unused summary parameter is expected. The linter warns about the unused summary parameter, but this is intentional. Line 205 has a TODO indicating that summary will be used for implementing RAG chunks in the end event. The parameter is included now to establish the correct signature for future enhancement.
src/models/responses.py (1)
231-253: LGTM! Clean data model extension. The new fields (truncated, input_tokens, output_tokens, available_quotas) are well-defined with appropriate defaults, clear descriptions, and helpful examples. The docstring and model examples are properly updated to reflect these additions.
tests/unit/app/endpoints/test_query.py (2)
41-41: LGTM! Proper test mocking for token metrics. The import and mock_metrics helper are correctly updated to mock extract_and_update_token_metrics and return a TokenCounter() instance. The helper also appropriately mocks the internal Prometheus metrics to ensure test isolation.
Also applies to: 65-75
498-500: LGTM! Consistent 4-tuple unpacking throughout tests. All test functions are properly updated to handle the new return signature from retrieve_response, which now returns (summary, conversation_id, referenced_documents, TokenCounter). The unpacking patterns are used consistently, and tests that don't need the token usage simply use _ placeholders.
Also applies to: 534-536, 571-573, 614-616, 668-670, 725-727, 784-786, 845-847, 904-906, 1081-1083, 1155-1157, 1231-1237, 1320-1322, 1582-1584, 1637-1639
src/utils/token_counter.py (2)
44-94: LGTM! Solid token counting implementation. The function correctly extracts token usage using the same tokenizer as the metrics system, ensuring consistency. The error handling with fallback default estimates (100 input, 50 output tokens) provides resilience when token counting fails. The system prompt is properly included with role="system" and the input_tokens_counted field is populated.
97-130: LGTM! Clean integration of extraction and metrics. The function properly combines token extraction with Prometheus metrics updates, ensuring both the API response and monitoring stay in sync. Error handling for metric updates is appropriate—failures are logged but don't prevent returning the token counts to the caller.
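For orientation, a hedged sketch of how extraction and Prometheus updates could be combined; the counter names come from the test mocks in this PR (metrics.llm_token_sent_total, metrics.llm_token_received_total, metrics.llm_calls_total), while the label keys and error-handling details are assumptions:

```python
import logging

import metrics  # project module exposing the Prometheus counters
from utils.token_counter import TokenCounter, extract_token_usage_from_turn

logger = logging.getLogger(__name__)


def extract_and_update_token_metrics(
    turn: object, model: str, provider: str, system_prompt: str = ""
) -> TokenCounter:
    """Extract token usage from a turn and record it in Prometheus counters (sketch)."""
    token_usage = extract_token_usage_from_turn(turn, system_prompt)
    try:
        labels = {"model": model, "provider": provider}  # label keys assumed
        metrics.llm_token_sent_total.labels(**labels).inc(token_usage.input_tokens)
        metrics.llm_token_received_total.labels(**labels).inc(token_usage.output_tokens)
        metrics.llm_calls_total.labels(**labels).inc()
    except Exception:  # pylint: disable=broad-except
        # Metric failures are logged but never block returning the counts.
        logger.warning("Failed to update token metrics", exc_info=True)
    return token_usage
```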
src/app/endpoints/query.py (4)
57-57: LGTM! Correct import for token tracking. The import properly brings in extract_and_update_token_metrics and TokenCounter from the new utils.token_counter module, replacing the previous metrics-only approach.
282-291: LGTM! Proper unpacking of token usage. The call to retrieve_response correctly unpacks the new 4-tuple return value, capturing token_usage for use in building the final response and storing transcripts.
368-378: LGTM! Token usage properly integrated into response. The QueryResponse construction correctly uses token_usage.input_tokens and token_usage.output_tokens from the extracted token counter. The TODOs for truncated and available_quotas appropriately mark future enhancements while providing sensible defaults now.
590-590: LGTM! Complete token usage integration in retrieve_response. The function signature, docstring, implementation, and return statement are all correctly updated to extract and return token usage. The call to extract_and_update_token_metrics properly passes the turn, model label, provider ID, and system prompt, ensuring accurate token counting and metrics updates.
Also applies to: 613-616, 711-715, 731-731
e0fc0f5 to 647a15c (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unit/app/endpoints/test_query.py (1)
65-74: Consider removing redundant metric mocks. Since extract_and_update_token_metrics is already mocked to return TokenCounter(), the individual metric mocks at lines 72-74 are unnecessary—the mocked function won't execute its internal metric updates. These lines can be safely removed to simplify the test helper.
Apply this diff to remove the redundant mocks:
  def mock_metrics(mocker):
      """Helper function to mock metrics operations for query endpoints."""
      mocker.patch(
          "app.endpoints.query.extract_and_update_token_metrics",
          return_value=TokenCounter(),
      )
-     # Mock the metrics that are called inside extract_and_update_token_metrics
-     mocker.patch("metrics.llm_token_sent_total")
-     mocker.patch("metrics.llm_token_received_total")
-     mocker.patch("metrics.llm_calls_total")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/app/endpoints/query.py (7 hunks)
src/app/endpoints/streaming_query.py (4 hunks)
src/models/responses.py (3 hunks)
src/utils/token_counter.py (1 hunks)
tests/unit/app/endpoints/test_query.py (22 hunks)
tests/unit/app/endpoints/test_streaming_query.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/unit/app/endpoints/test_streaming_query.py
- src/models/responses.py
- src/utils/token_counter.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py, tests/unit/app/endpoints/test_query.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (3)
src/app/endpoints/query.py (1)
  src/utils/token_counter.py (2): extract_and_update_token_metrics (97-130), TokenCounter (18-41)
src/app/endpoints/streaming_query.py (2)
  src/utils/token_counter.py (1): TokenCounter (18-41)
  src/utils/types.py (1): TurnSummary (89-163)
tests/unit/app/endpoints/test_query.py (2)
  src/utils/token_counter.py (1): TokenCounter (18-41)
  src/app/endpoints/query.py (1): retrieve_response (582-731)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (9)
tests/unit/app/endpoints/test_query.py (2)
41-41: LGTM: Import required for test mocking. The TokenCounter import is necessary for creating instances in test mocks and follows the absolute import guidelines.
202-202: LGTM: Test cases correctly updated for new return signature. All test cases have been consistently updated to handle the new 4-tuple return value from retrieve_response, correctly unpacking the token usage as the 4th element. Mocks properly return TokenCounter() instances.
Also applies to: 498-500, 534-536, 571-573, 614-616, 668-670, 725-727, 784-786, 845-847, 904-906, 1081-1083, 1155-1157, 1230-1237, 1320-1322, 1405-1410, 1466-1466, 1522-1522, 1582-1584, 1637-1639
src/app/endpoints/query.py (5)
57-57: LGTM: Import statements correct. The imports of extract_and_update_token_metrics and TokenCounter follow absolute import guidelines and are used appropriately in the code.
282-291: LGTM: Correct unpacking of extended return value. The unpacking of the 4-tuple from retrieve_response correctly captures the new token_usage field, which is subsequently used to populate token fields in the response.
368-378: LGTM: QueryResponse correctly populated with token fields. The new fields input_tokens and output_tokens are correctly populated from token_usage. The TODOs for truncated and available_quotas appropriately indicate future work scope.
590-590: LGTM: Function signature and docstring properly updated. The return type annotation and docstring correctly describe the extended 4-tuple return value including token usage information. As per coding guidelines, the docstring follows Google Python style.
Also applies to: 614-616
711-715: LGTM: Token usage extraction correctly integrated. The call to extract_and_update_token_metrics properly extracts token usage and updates Prometheus metrics in a single operation. The model label extraction logic correctly handles both "provider/model" and plain "model" formats. The return statement correctly includes the token usage as the 4th element.
Also applies to: 731-731
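To make the label-handling comment concrete, here is a hedged sketch of the behaviour described; derive_model_label is a hypothetical helper introduced only for illustration (the actual code likely inlines this logic before calling extract_and_update_token_metrics and returning the 4-tuple):

```python
def derive_model_label(model_id: str) -> str:
    """Return the model part of a "provider/model" identifier, or the id unchanged."""
    return model_id.split("/", 1)[1] if "/" in model_id else model_id


# Both identifier formats resolve to the same metric label.
assert derive_model_label("openai/gpt-4o-mini") == "gpt-4o-mini"
assert derive_model_label("gpt-4o-mini") == "gpt-4o-mini"
```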
src/app/endpoints/streaming_query.py (2)
158-163: Unused parameter is acceptable for future implementation. The summary parameter is currently unused (hence the pylint disable comment), but this is justified given the TODO comment at line 206 indicating it will be used for RAG chunks implementation. This approach maintains a consistent function signature in preparation for that enhancement.
209-210: LGTM! Token usage correctly integrated into end event. The end event payload now uses token_usage.input_tokens and token_usage.output_tokens to populate the response, replacing the previous placeholder values. The implementation is correct.
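A hedged sketch of what that end-event construction might look like; the payload layout follows the SSE example quoted earlier in this review (available_quotas as a sibling of "data", with a nitpick suggesting it be nested), and the handling of metadata_map, summary, and media_type is assumed:

```python
import json

from utils.token_counter import TokenCounter
from utils.types import TurnSummary


def stream_end_event(
    metadata_map: dict,
    summary: TurnSummary,
    token_usage: TokenCounter,
    media_type: str = "text/event-stream",
) -> str:
    """Build the SSE "end" event carrying token usage (illustrative sketch)."""
    _ = summary, media_type  # summary is reportedly reserved for RAG chunks later
    payload = {
        "event": "end",
        "data": {
            "referenced_documents": list(metadata_map.values()),  # assumption
            "truncated": False,  # TODO noted in the PR
            "input_tokens": token_usage.input_tokens,
            "output_tokens": token_usage.output_tokens,
        },
        "available_quotas": {},  # TODO noted in the PR
    }
    return f"data: {json.dumps(payload)}\n\n"
```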
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/app/endpoints/query.py (7 hunks)
src/app/endpoints/streaming_query.py (4 hunks)
src/models/responses.py (3 hunks)
src/utils/token_counter.py (1 hunks)
tests/unit/app/endpoints/test_query.py (22 hunks)
tests/unit/app/endpoints/test_streaming_query.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py, src/utils/token_counter.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @AbstractMethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py, tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py, src/utils/token_counter.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/streaming_query.py, src/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py
🧬 Code graph analysis (4)
src/app/endpoints/streaming_query.py (2)
  src/utils/token_counter.py (2): TokenCounter (18-41), extract_token_usage_from_turn (44-94)
  src/utils/types.py (1): TurnSummary (89-163)
src/app/endpoints/query.py (3)
  src/utils/token_counter.py (2): extract_and_update_token_metrics (97-130), TokenCounter (18-41)
  src/utils/types.py (1): TurnSummary (89-163)
  src/models/responses.py (1): ReferencedDocument (164-176)
tests/unit/app/endpoints/test_query.py (2)
  src/utils/token_counter.py (1): TokenCounter (18-41)
  src/app/endpoints/query.py (1): retrieve_response (582-731)
tests/unit/app/endpoints/test_streaming_query.py (4)
  src/models/responses.py (1): RAGChunk (148-153)
  src/utils/token_counter.py (1): TokenCounter (18-41)
  src/utils/types.py (1): TurnSummary (89-163)
  src/app/endpoints/streaming_query.py (1): stream_end_event (158-214)
🪛 GitHub Actions: Pyright
src/app/endpoints/streaming_query.py
[error] 823-823: "p" is possibly unbound (reportPossiblyUnboundVariable)
[error] 823-823: Cannot access attribute "turn" for class "AgentTurnResponseStepCompletePayload". Attribute "turn" is unknown (reportAttributeAccessIssue)
[error] 823-823: Cannot access attribute "turn" for class "AgentTurnResponseStepStartPayload". Attribute "turn" is unknown (reportAttributeAccessIssue)
[error] 823-823: Cannot access attribute "turn" for class "AgentTurnResponseStepProgressPayload". Attribute "turn" is unknown (reportAttributeAccessIssue)
[error] 823-823: Cannot access attribute "turn" for class "AgentTurnResponseTurnStartPayload". Attribute "turn" is unknown (reportAttributeAccessIssue)
🪛 GitHub Actions: Type checks
src/app/endpoints/streaming_query.py
[error] 823-823: mypy: Item "AgentTurnResponseStepStartPayload" of a union has no attribute "turn". (union-attr)
[error] 823-823: mypy: Item "AgentTurnResponseStepProgressPayload" of a union has no attribute "turn". (union-attr)
[error] 823-823: mypy: Item "AgentTurnResponseStepCompletePayload" of a union has no attribute "turn". (union-attr)
[error] 823-823: mypy: Item "AgentTurnResponseTurnStartPayload" of a union has no attribute "turn". (union-attr)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (ci)
- Add TokenCounter dataclass to track input/output tokens and LLM calls
- Update QueryResponse model with token usage fields (input_tokens, output_tokens, truncated, available_quotas)
- Implement extract_token_usage_from_turn() function for token counting
- Update /query and /streaming_query endpoints to include token usage in responses
- Modify retrieve_response() to return token usage information
- Update test cases to handle new return values and mock token usage
- Maintain backward compatibility with existing API structure

The implementation provides a foundation for token tracking that can be enhanced with more sophisticated counting logic in the future.
Description
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Streaming
Documentation
Tests