
Conversation


@gallettilance gallettilance commented Oct 7, 2025

  • Add TokenCounter dataclass to track input/output tokens and LLM calls
  • Update QueryResponse model with token usage fields (input_tokens, output_tokens, truncated, available_quotas)
  • Implement extract_token_usage_from_turn() function for token counting
  • Update /query and /streaming_query endpoints to include token usage in responses
  • Modify retrieve_response() to return token usage information
  • Update test cases to handle new return values and mock token usage
  • Maintain backward compatibility with existing API structure

The implementation provides a foundation for token tracking that can be enhanced with more sophisticated counting logic in the future.
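For reference, a minimal sketch of what the new dataclass might look like, assuming the field names used throughout this PR (the actual definition lives in src/utils/token_counter.py):

from dataclasses import dataclass

@dataclass
class TokenCounter:
    """Track token usage and LLM call count for a single turn."""

    input_tokens: int = 0          # tokens sent to the model (system prompt included)
    output_tokens: int = 0         # tokens received from the model
    input_tokens_counted: int = 0  # tokens actually counted rather than estimated
    llm_calls: int = 0             # number of LLM calls made for the turn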

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Responses now include token usage metrics (input_tokens, output_tokens), a truncation flag (truncated), and available_quotas; token usage is computed and surfaced in responses, transcripts, and logs (an illustrative payload follows this list).
  • Streaming

    • End-of-stream events now include turn summaries and real token usage counts for input/output tokens.
  • Documentation

    • API schema and examples updated to show the new response fields.
  • Tests

    • Unit tests updated to handle the extended response payload and token-usage flow.
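
Illustratively, a /query response built with the new fields could look roughly like this (values are placeholders, field names other than the four new ones are assumed, and other existing fields are omitted):

QueryResponse(
    conversation_id="123e4567-e89b-12d3-a456-426614174000",  # assumed existing field
    response="LLM answer ...",                                # assumed existing field
    truncated=False,          # TODO in this PR: real truncation detection
    input_tokens=150,
    output_tokens=50,
    available_quotas={},      # TODO in this PR: quota reporting
)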


coderabbitai bot commented Oct 7, 2025

Walkthrough

Threads per-turn token usage through query and streaming flows: adds TokenCounter and extraction utilities, extends retrieve_response and SSE end events to return/pass token metrics, updates QueryResponse with token fields, and adapts tests and metrics plumbing to the new return signatures.

Changes

  • Endpoints: Query (src/app/endpoints/query.py)
    retrieve_response now returns (summary, conversation_id, referenced_documents, TokenCounter); imports extract_and_update_token_metrics; query_endpoint_handler unpacks token usage and populates the QueryResponse fields truncated, input_tokens, output_tokens, and available_quotas (sketched after this list).
  • Endpoints: Streaming (src/app/endpoints/streaming_query.py)
    stream_end_event now accepts summary: TurnSummary and token_usage: TokenCounter; the SSE end payload uses token_usage.input_tokens / token_usage.output_tokens; response_generator tracks the latest turn and extracts token usage to pass to the end event.
  • Models (src/models/responses.py)
    QueryResponse extended with truncated: bool, input_tokens: int, output_tokens: int, and available_quotas: dict[str, int]; docstring and JSON examples updated.
  • Utilities: Token counting (src/utils/token_counter.py)
    New module adding the TokenCounter dataclass, extract_token_usage_from_turn, and extract_and_update_token_metrics; counts tokens per turn and updates Prometheus metrics with safe fallbacks and logging.
  • Tests (tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_streaming_query.py)
    Tests import/mock TokenCounter and extract_and_update_token_metrics, adapt unpacking to the new 4-tuple return, update stream_end_event tests to accept summary and token_usage, and adjust metric mocks for llm_token_sent_total, llm_token_received_total, and llm_calls_total.
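
A rough sketch of how the non-streaming endpoint threads the new value through, assuming the names above; variable and attribute names are assumptions, and the real code lives in src/app/endpoints/query.py:

# Inside retrieve_response (sketch): count tokens once the turn completes
token_usage = extract_and_update_token_metrics(
    turn, model_label, provider_id, system_prompt  # argument names are assumptions
)
return summary, conversation_id, referenced_documents, token_usage

# Inside query_endpoint_handler (sketch): unpack the 4-tuple and build the response
summary, conversation_id, referenced_documents, token_usage = await retrieve_response(...)
return QueryResponse(
    conversation_id=conversation_id,
    response=summary.llm_response,            # attribute name is an assumption
    truncated=False,                          # TODO in this PR
    input_tokens=token_usage.input_tokens,
    output_tokens=token_usage.output_tokens,
    available_quotas={},                      # TODO in this PR; other fields omitted
)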

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant QueryEP as Query Endpoint
  participant Retriever as retrieve_response
  participant TokenUtil as extract_and_update_token_metrics
  participant LLM as LLM
  participant Resp as QueryResponse

  Client->>QueryEP: POST /query
  QueryEP->>Retriever: retrieve_response(...)
  Retriever->>LLM: LLM request
  LLM-->>Retriever: turn + content
  Retriever->>TokenUtil: extract_and_update_token_metrics(turn, model, provider)
  TokenUtil-->>Retriever: TokenCounter
  Retriever-->>QueryEP: (summary, conv_id, refs, TokenCounter)
  QueryEP->>Resp: build response (truncated, input_tokens, output_tokens, available_quotas)
  Resp-->>Client: JSON response
sequenceDiagram
  autonumber
  actor Client
  participant StreamEP as Streaming Endpoint
  participant Generator as response_generator
  participant TokenUtil as extract_token_usage_from_turn
  participant SSE as stream_end_event

  Client->>StreamEP: GET /streaming_query (SSE)
  StreamEP->>Generator: start generator
  loop per chunk/turn
    Generator->>TokenUtil: extract_token_usage_from_turn(turn, system_prompt)
    TokenUtil-->>Generator: TokenCounter
    Generator-->>Client: data: chunk
  end
  Generator->>SSE: stream_end_event(metadata_map, summary, TokenCounter, media_type)
  SSE-->>Client: data: end event (truncated=false, input_tokens, output_tokens)
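
In the streaming flow the same counter is computed once at turn completion and handed to the end event; a compressed sketch, assuming the generator-local names used in this PR (the exact call site is in src/app/endpoints/streaming_query.py):

# In response_generator (sketch): latest_turn is tracked while streaming chunks
token_usage = extract_token_usage_from_turn(latest_turn, system_prompt)
yield stream_end_event(metadata_map, summary, token_usage, media_type)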

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • manstis
  • tisnik
  • umago

Poem

A hop, a tally, tokens in cheer,
I count each whisper far and near.
Inputs, outputs — carrots aligned,
Streams and queries neatly signed.
Rabbit-approved metrics — hop and bind! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the primary change, the addition of token usage tracking to LLM responses, and matches the PR objectives without unnecessary detail.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5eca3a2 and 958c626.

📒 Files selected for processing (6)
  • src/app/endpoints/query.py (7 hunks)
  • src/app/endpoints/streaming_query.py (6 hunks)
  • src/models/responses.py (3 hunks)
  • src/utils/token_counter.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (22 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/models/responses.py
  • src/utils/token_counter.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/app/endpoints/test_query.py
  • src/app/endpoints/query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/app/endpoints/test_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (4)
src/app/endpoints/streaming_query.py (2)
src/utils/token_counter.py (2)
  • TokenCounter (18-41)
  • extract_token_usage_from_turn (44-94)
src/utils/types.py (1)
  • TurnSummary (89-163)
tests/unit/app/endpoints/test_streaming_query.py (3)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/app/endpoints/streaming_query.py (1)
  • stream_end_event (158-214)
tests/unit/app/endpoints/test_query.py (2)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/app/endpoints/query.py (1)
  • retrieve_response (582-731)
src/app/endpoints/query.py (3)
src/utils/token_counter.py (2)
  • extract_and_update_token_metrics (97-130)
  • TokenCounter (18-41)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/models/responses.py (1)
  • ReferencedDocument (164-176)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 690a6bc and 5cc6860.

📒 Files selected for processing (5)
  • src/app/endpoints/query.py (7 hunks)
  • src/app/endpoints/streaming_query.py (7 hunks)
  • src/models/responses.py (3 hunks)
  • src/utils/token_counter.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (20 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/models/responses.py
  • src/app/endpoints/streaming_query.py
  • src/utils/token_counter.py
  • src/app/endpoints/query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/models/responses.py
  • src/app/endpoints/streaming_query.py
  • src/utils/token_counter.py
  • src/app/endpoints/query.py
  • tests/unit/app/endpoints/test_query.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/responses.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query.py (1)
src/utils/token_counter.py (2)
  • extract_token_usage_from_turn (38-57)
  • TokenCounter (12-35)
src/app/endpoints/query.py (4)
src/utils/token_counter.py (2)
  • extract_token_usage_from_turn (38-57)
  • TokenCounter (12-35)
src/app/endpoints/streaming_query.py (1)
  • retrieve_response (784-895)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/models/responses.py (1)
  • ReferencedDocument (104-116)
tests/unit/app/endpoints/test_query.py (3)
src/utils/token_counter.py (1)
  • TokenCounter (12-35)
src/app/endpoints/query.py (1)
  • retrieve_response (585-735)
src/app/endpoints/streaming_query.py (1)
  • retrieve_response (784-895)
🪛 GitHub Actions: Python linter
src/utils/token_counter.py

[error] 38-38: pylint: Unused argument 'turn' (unused-argument).


[error] 38-38: pylint: Unused argument 'system_prompt' (unused-argument).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cc6860 and 253ddd9.

📒 Files selected for processing (5)
  • src/app/endpoints/query.py (7 hunks)
  • src/app/endpoints/streaming_query.py (7 hunks)
  • src/models/responses.py (3 hunks)
  • src/utils/token_counter.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (20 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/utils/token_counter.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • tests/unit/app/endpoints/test_query.py
  • src/utils/token_counter.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query.py (2)
src/utils/token_counter.py (2)
  • extract_token_usage_from_turn (38-57)
  • TokenCounter (12-35)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/app/endpoints/query.py (1)
src/utils/token_counter.py (2)
  • extract_token_usage_from_turn (38-57)
  • TokenCounter (12-35)
tests/unit/app/endpoints/test_query.py (2)
src/utils/token_counter.py (1)
  • TokenCounter (12-35)
src/app/endpoints/query.py (1)
  • retrieve_response (585-735)
🪛 GitHub Actions: Black
src/utils/token_counter.py

[error] 1-1: Black formatting check failed. 1 file would be reformatted by black: /home/runner/work/lightspeed-stack/lightspeed-stack/src/utils/token_counter.py. Run 'black .' to fix formatting.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (8)
src/app/endpoints/streaming_query.py (3)

61-61: LGTM!

The import of token tracking utilities and the updated SSE example data correctly reflect the new token usage fields in the streaming response.

Also applies to: 79-80


148-150: LGTM!

The stream_end_event signature correctly accepts the token_usage parameter, and the function properly extracts input_tokens and output_tokens from it for the SSE end event.

Also applies to: 187-189


678-678: LGTM!

Token usage tracking is properly integrated into the streaming flow: initialized with defaults, extracted when the turn completes, and passed to the end event.

Also applies to: 696-699, 710-710

tests/unit/app/endpoints/test_query.py (1)

41-41: LGTM!

All test updates correctly accommodate the expanded retrieve_response return type (now a 4-tuple including TokenCounter). Mock return values and unpacking statements are consistently updated across all test cases.

Also applies to: 199-199, 496-498, 532-534, 569-571, 612-614, 666-668, 723-725, 782-784, 843-845, 902-904, 1079-1081, 1153-1155, 1229-1235, 1318-1320, 1403-1408, 1464-1464, 1520-1520, 1580-1582, 1635-1637

src/utils/token_counter.py (1)

48-57: Stub implementation returns hardcoded token estimates.

As noted in the PR description, extract_token_usage_from_turn currently returns placeholder values (100 input tokens, 50 output tokens, 1 LLM call) rather than computing actual token usage from the turn data. This is intentional to provide a working foundation, but ensure follow-up work is tracked to implement actual token counting logic.

The hardcoded estimates may not reflect real usage patterns. Consider whether these defaults should be adjusted based on typical query sizes, or if they should be set to 0 to more clearly indicate that real counting is not yet implemented.
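
For context, the stub described here amounts to roughly the following (values taken from this comment, not necessarily the final code):

def extract_token_usage_from_turn(_turn: Turn, _system_prompt: str = "") -> TokenCounter:
    """Extract token usage information from a turn (stub)."""
    # Placeholder estimates until real per-turn counting is implemented
    return TokenCounter(input_tokens=100, output_tokens=50, llm_calls=1)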

src/app/endpoints/query.py (3)

58-58: LGTM!

The import, return type annotation, and docstring correctly reflect the addition of TokenCounter to the retrieve_response return value.

Also applies to: 593-593, 617-619


718-720: LGTM!

Token usage is correctly extracted from the turn response and included in the return tuple.

Also applies to: 735-735


283-292: LGTM!

The endpoint handler correctly unpacks the 4-tuple from retrieve_response and populates the QueryResponse with token usage data. The TODOs for truncated and available_quotas are appropriately noted for future implementation.

Also applies to: 377-380

Comment on lines 38 to 47
def extract_token_usage_from_turn(_turn: Turn, _system_prompt: str = "") -> TokenCounter:
    """Extract token usage information from a turn.

    Args:
        turn: The turn object containing token usage information
        system_prompt: The system prompt used for the turn

    Returns:
        TokenCounter: Token usage information
    """

⚠️ Potential issue | 🟡 Minor

Fix docstring parameter names to match implementation.

The function parameters are prefixed with _ (_turn, _system_prompt) to indicate they're intentionally unused in the stub implementation, but the docstring still refers to them by their unprefixed names (turn, system_prompt). Update the docstring to match the actual parameter names for consistency.

Apply this diff to fix the docstring:

     """Extract token usage information from a turn.
 
     Args:
-        turn: The turn object containing token usage information
-        system_prompt: The system prompt used for the turn
+        _turn: The turn object containing token usage information (unused in stub)
+        _system_prompt: The system prompt used for the turn (unused in stub)
 
     Returns:
         TokenCounter: Token usage information
     """


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)

762-764: Avoid double-counting llm_calls_total
The llm_calls_total counter is already incremented inside extract_and_update_token_metrics (src/utils/token_counter.py:122). Remove the extra inc() calls in:

  • src/app/endpoints/query.py:293
  • src/app/endpoints/streaming_query.py:763

If you’d prefer counting at request start instead, remove the internal increment and keep a single inc() here—ensure consistency across both endpoints.

🧹 Nitpick comments (8)
src/utils/token_counter.py (1)

64-71: Safer construction of output message for encoding

Casting output_message to RawMessage can break if the types diverge. Prefer constructing a RawMessage defensively.

Example:

-            raw_message = cast(RawMessage, turn.output_message)
+            om = turn.output_message
+            raw_message = RawMessage(
+                role=getattr(om, "role", "assistant"),
+                content=getattr(om, "content", ""),
+            )

This avoids relying on structural compatibility. If content is interleaved items, you can adapt later to stringify or convert as needed.

tests/unit/app/endpoints/test_streaming_query.py (1)

52-57: Prefer shared MOCK_AUTH constant across tests

Guidelines suggest using a shared MOCK_AUTH. Local duplicates risk divergence.

Extract a shared MOCK_AUTH in a common test helper and import it here (and in test_query.py) for consistency.
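
One possible shape for that shared helper (module name and import path are illustrative only):

# tests/unit/helpers.py (illustrative module)
MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")

# in test_query.py / test_streaming_query.py
from tests.unit.helpers import MOCK_AUTH  # exact import path depends on the test package layout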

tests/unit/app/endpoints/test_query.py (2)

41-49: Use shared MOCK_AUTH constant

Mirror the streaming tests suggestion: centralize MOCK_AUTH to a shared helper to reduce duplication.


569-576: Nit: strengthen assertions to cover token usage propagation

Since retrieve_response now returns a TokenCounter, add minimal assertions that a TokenCounter instance is returned (even if mocked) to ensure unpacking remains correct across changes.

Example:

summary, conversation_id, _, token_usage = await retrieve_response(...)
assert isinstance(token_usage, TokenCounter)

Also applies to: 612-625, 666-679, 723-736, 782-807, 843-861, 902-924, 1079-1120, 1153-1179, 1229-1279, 1318-1333, 1580-1597, 1635-1662

src/models/responses.py (1)

147-151: Use default_factory for list defaults

Prefer default_factory=list to avoid mutable default pitfalls (even with Pydantic handling).

-    rag_chunks: list[RAGChunk] = Field(
-        [],
+    rag_chunks: list[RAGChunk] = Field(
+        default_factory=list,
         description="List of RAG chunks used to generate the response",
     )
src/app/endpoints/streaming_query.py (3)

62-62: Use module logger per guidelines

Use logger = logging.getLogger(__name__) for consistency.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

As per coding guidelines.


73-81: SSE example: place available_quotas inside "data" for consistency

Current example shows "available_quotas" sibling to "data". Align structure by nesting under "data".

-                        'data: {"event": "end", "data": {"referenced_documents": [], '
-                        '"truncated": false, "input_tokens": 150, "output_tokens": 50}, '
-                        '"available_quotas": {}}\n\n'
+                        'data: {"event": "end", "data": {"referenced_documents": [], '
+                        '"truncated": false, "input_tokens": 150, "output_tokens": 50, '
+                        '"available_quotas": {}}}\n\n'

147-164: Nest available_quotas under 'data' and complete docstring Args

Keep all response fields inside "data" and document all parameters.

-def stream_end_event(
-    metadata_map: dict, summary: TurnSummary, token_usage: TokenCounter
-) -> str:
+def stream_end_event(
+    metadata_map: dict, summary: TurnSummary, token_usage: TokenCounter
+) -> str:
     """
     Yield the end of the data stream.
@@
-    Parameters:
-        metadata_map (dict): A mapping containing metadata about
-        referenced documents.
+    Args:
+        metadata_map (dict): Mapping of referenced document metadata.
+        summary (TurnSummary): Final turn summary.
+        token_usage (TokenCounter): Count of input/output tokens and call count.
@@
-            "available_quotas": {},  # TODO(jboos): implement available quotas
+            "available_quotas": {},  # TODO(jboos): implement available quotas
         }
     )

Also applies to: 180-191

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 253ddd9 and fe55e3b.

📒 Files selected for processing (6)
  • src/app/endpoints/query.py (7 hunks)
  • src/app/endpoints/streaming_query.py (7 hunks)
  • src/models/responses.py (3 hunks)
  • src/utils/token_counter.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (21 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • src/utils/token_counter.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/models/responses.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/utils/token_counter.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/models/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/responses.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/responses.py
🧬 Code graph analysis (4)
tests/unit/app/endpoints/test_query.py (3)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/app/endpoints/query.py (1)
  • retrieve_response (584-733)
src/app/endpoints/streaming_query.py (1)
  • retrieve_response (780-891)
tests/unit/app/endpoints/test_streaming_query.py (1)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/app/endpoints/streaming_query.py (1)
src/utils/token_counter.py (2)
  • extract_and_update_token_metrics (94-127)
  • TokenCounter (18-41)
src/app/endpoints/query.py (3)
src/utils/token_counter.py (2)
  • extract_and_update_token_metrics (94-127)
  • TokenCounter (18-41)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/models/responses.py (1)
  • ReferencedDocument (104-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
tests/unit/app/endpoints/test_streaming_query.py (1)

76-81: Good: mocks updated to new extract_and_update_token_metrics API

Tests now patch the correct symbol and return TokenCounter, aligning with the new flow.

Consider adding assertions that the final SSE "end" event includes the new fields (truncated, input_tokens, output_tokens, available_quotas) to lock behavior. I can draft that if helpful.
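
A rough draft of such an assertion, assuming stream_end_event returns the SSE string directly and the default media type applies (adjust to the actual signature and fixtures):

import json

end_event = stream_end_event(
    metadata_map={},
    summary=summary,  # an existing TurnSummary fixture
    token_usage=TokenCounter(input_tokens=150, output_tokens=50),
)
payload = json.loads(end_event.removeprefix("data: "))
assert payload["event"] == "end"
assert payload["data"]["truncated"] is False
assert payload["data"]["input_tokens"] == 150
assert payload["data"]["output_tokens"] == 50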

tests/unit/app/endpoints/test_query.py (1)

66-71: Good: metrics utility correctly patched to TokenCounter path

Patching extract_and_update_token_metrics avoids coupling tests to metrics and matches new return signature.

src/models/responses.py (1)

171-194: LGTM: schema extended for token usage and quotas

New fields (truncated, input_tokens, output_tokens, available_quotas) are documented and exemplified.

Also applies to: 223-227

src/app/endpoints/streaming_query.py (1)

692-696: Good: token usage computed once at turn completion and passed to end event

The extraction and propagation pattern is sound.

Also applies to: 706-707

src/app/endpoints/query.py (6)

57-57: LGTM!

The imports are correct and follow the coding guidelines for absolute imports.


376-379: LGTM!

The token usage fields are correctly populated from the token_usage object. The TODO comments are acknowledged placeholders for future enhancements as noted in the PR objectives.


592-592: LGTM!

The return type annotation correctly reflects the addition of TokenCounter to the function's return value.


616-618: LGTM!

The docstring correctly documents the new return value including token usage information, following the Google Python docstring style.


713-717: LGTM!

The token extraction logic is correctly implemented. The model_label is properly derived from model_id, and extract_and_update_token_metrics is called with the correct parameters.


733-733: LGTM!

The return statement correctly includes token_usage as the fourth element, consistent with the updated return type annotation.


@tisnik tisnik left a comment


LGTM

@gallettilance gallettilance force-pushed the tokens branch 3 times, most recently from 59ad2f1 to 3350a5a (October 9, 2025 16:23)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/endpoints/streaming_query.py (2)

1-61: Add missing TokenCounter import.

The TokenCounter type is used in the stream_end_event signature (line 160) but is never imported, causing pipeline failures across multiple checks (Ruff, Pyright, unit tests, type checks, and linter).

Add the import at the top of the file:

 from utils.types import TurnSummary
+from utils.token_counter import TokenCounter

821-821: Pass required summary and token_usage arguments.

The call to stream_end_event is missing the newly added required parameters summary and token_usage, causing pipeline failures. These values are available in the response_generator scope and should be passed to populate the end event correctly.

Apply this diff to fix the call:

-            yield stream_end_event(metadata_map, media_type)
+            yield stream_end_event(metadata_map, summary, token_usage, media_type)

However, there's a problem: token_usage is not currently being extracted or tracked in the streaming flow. You'll need to add token extraction similar to how it's done in the non-streaming endpoint. Consider adding this before the end event:

# Extract token usage from the turn
system_prompt = get_system_prompt(query_request, configuration)
token_usage = extract_token_usage_from_turn(p.turn, system_prompt)

You'll also need to import extract_token_usage_from_turn from utils.token_counter.

🧹 Nitpick comments (1)
src/utils/token_counter.py (1)

17-41: Consider making TokenCounter immutable.

The TokenCounter dataclass is well-defined with clear documentation. However, consider adding frozen=True to prevent accidental mutation after creation, which would make the token counts more reliable:

-@dataclass
+@dataclass(frozen=True)
 class TokenCounter:

This is optional but improves safety when passing token counters between functions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2eaa17d and 3350a5a.

📒 Files selected for processing (6)
  • src/app/endpoints/query.py (7 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/models/responses.py (3 hunks)
  • src/utils/token_counter.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (22 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/app/endpoints/test_streaming_query.py
🧰 Additional context used
📓 Path-based instructions (9)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/unit/app/endpoints/test_query.py
  • src/models/responses.py
  • src/utils/token_counter.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/models/responses.py
  • src/utils/token_counter.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/responses.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
🧬 Code graph analysis (3)
tests/unit/app/endpoints/test_query.py (2)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/app/endpoints/query.py (1)
  • retrieve_response (582-731)
src/app/endpoints/streaming_query.py (2)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/app/endpoints/query.py (1)
src/utils/token_counter.py (2)
  • extract_and_update_token_metrics (97-130)
  • TokenCounter (18-41)
🪛 GitHub Actions: Pyright
src/app/endpoints/streaming_query.py

[error] 160-160: pyright: 'TokenCounter' is not defined.


[error] 821-821: pyright: Argument missing for parameter 'token_usage'.

🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py

[error] 160-160: Undefined variable 'TokenCounter'.


[warning] 159-159: Unused argument 'summary' (unused-argument).


[error] 821-821: No value for argument 'token_usage' in function call.

🪛 GitHub Actions: Ruff
src/app/endpoints/streaming_query.py

[error] 160-160: Ruff: Undefined name 'TokenCounter' (F821) in src/app/endpoints/streaming_query.py:160.

🪛 GitHub Actions: Type checks
src/app/endpoints/streaming_query.py

[error] 160-160: Mypy error during 'uv run mypy ...': Name 'TokenCounter' is not defined.


[error] 821-821: Mypy error during 'uv run mypy ...': Missing positional argument 'token_usage' in call to 'stream_end_event'.

🪛 GitHub Actions: Unit tests
tests/unit/app/endpoints/test_query.py

[error] 37-37: NameError: name 'TokenCounter' is not defined

src/models/responses.py

[warning] 125-125: Pydantic: Using extra keyword arguments on Field is deprecated and will be removed in a future version.

src/app/endpoints/streaming_query.py

[error] 160-160: NameError: name 'TokenCounter' is not defined.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
src/app/endpoints/streaming_query.py (1)

157-162: Unused summary parameter is expected.

The linter warns about the unused summary parameter, but this is intentional. Line 205 has a TODO indicating that summary will be used for implementing RAG chunks in the end event. The parameter is included now to establish the correct signature for future enhancement.

src/models/responses.py (1)

231-253: LGTM! Clean data model extension.

The new fields (truncated, input_tokens, output_tokens, available_quotas) are well-defined with appropriate defaults, clear descriptions, and helpful examples. The docstring and model examples are properly updated to reflect these additions.

tests/unit/app/endpoints/test_query.py (2)

41-41: LGTM! Proper test mocking for token metrics.

The import and mock_metrics helper are correctly updated to mock extract_and_update_token_metrics and return a TokenCounter() instance. The helper also appropriately mocks the internal Prometheus metrics to ensure test isolation.

Also applies to: 65-75


498-500: LGTM! Consistent 4-tuple unpacking throughout tests.

All test functions are properly updated to handle the new return signature from retrieve_response, which now returns (summary, conversation_id, referenced_documents, TokenCounter). The unpacking patterns are used consistently, and tests that don't need the token usage simply use _ placeholders.

Also applies to: 534-536, 571-573, 614-616, 668-670, 725-727, 784-786, 845-847, 904-906, 1081-1083, 1155-1157, 1231-1237, 1320-1322, 1582-1584, 1637-1639

src/utils/token_counter.py (2)

44-94: LGTM! Solid token counting implementation.

The function correctly extracts token usage using the same tokenizer as the metrics system, ensuring consistency. The error handling with fallback default estimates (100 input, 50 output tokens) provides resilience when token counting fails. The system prompt is properly included with role="system" and the input_tokens_counted field is populated.


97-130: LGTM! Clean integration of extraction and metrics.

The function properly combines token extraction with Prometheus metrics updates, ensuring both the API response and monitoring stay in sync. Error handling for metric updates is appropriate—failures are logged but don't prevent returning the token counts to the caller.
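
In outline, the function combines the two concerns roughly like this (the counter names come from the test mocks; label order and logging details are assumptions):

def extract_and_update_token_metrics(
    turn: Turn, model: str, provider: str, system_prompt: str = ""
) -> TokenCounter:
    """Extract token usage from a turn and update Prometheus counters."""
    token_usage = extract_token_usage_from_turn(turn, system_prompt)
    try:
        metrics.llm_calls_total.labels(provider, model).inc()
        metrics.llm_token_sent_total.labels(provider, model).inc(token_usage.input_tokens)
        metrics.llm_token_received_total.labels(provider, model).inc(token_usage.output_tokens)
    except Exception as exc:  # pylint: disable=broad-except
        # Metric failures are logged but never block returning the counts
        logger.warning("Failed to update token metrics: %s", exc)
    return token_usage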

src/app/endpoints/query.py (4)

57-57: LGTM! Correct import for token tracking.

The import properly brings in extract_and_update_token_metrics and TokenCounter from the new utils.token_counter module, replacing the previous metrics-only approach.


282-291: LGTM! Proper unpacking of token usage.

The call to retrieve_response correctly unpacks the new 4-tuple return value, capturing token_usage for use in building the final response and storing transcripts.


368-378: LGTM! Token usage properly integrated into response.

The QueryResponse construction correctly uses token_usage.input_tokens and token_usage.output_tokens from the extracted token counter. The TODOs for truncated and available_quotas appropriately mark future enhancements while providing sensible defaults now.


590-590: LGTM! Complete token usage integration in retrieve_response.

The function signature, docstring, implementation, and return statement are all correctly updated to extract and return token usage. The call to extract_and_update_token_metrics properly passes the turn, model label, provider ID, and system prompt, ensuring accurate token counting and metrics updates.

Also applies to: 613-616, 711-715, 731-731

@gallettilance gallettilance force-pushed the tokens branch 2 times, most recently from e0fc0f5 to 647a15c (October 9, 2025 16:31)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/unit/app/endpoints/test_query.py (1)

65-74: Consider removing redundant metric mocks.

Since extract_and_update_token_metrics is already mocked to return TokenCounter(), the individual metric mocks at lines 72-74 are unnecessary—the mocked function won't execute its internal metric updates. These lines can be safely removed to simplify the test helper.

Apply this diff to remove the redundant mocks:

 def mock_metrics(mocker):
     """Helper function to mock metrics operations for query endpoints."""
     mocker.patch(
         "app.endpoints.query.extract_and_update_token_metrics",
         return_value=TokenCounter(),
     )
-    # Mock the metrics that are called inside extract_and_update_token_metrics
-    mocker.patch("metrics.llm_token_sent_total")
-    mocker.patch("metrics.llm_token_received_total")
-    mocker.patch("metrics.llm_calls_total")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3350a5a and 647a15c.

📒 Files selected for processing (6)
  • src/app/endpoints/query.py (7 hunks)
  • src/app/endpoints/streaming_query.py (4 hunks)
  • src/models/responses.py (3 hunks)
  • src/utils/token_counter.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (22 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/unit/app/endpoints/test_streaming_query.py
  • src/models/responses.py
  • src/utils/token_counter.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/app/endpoints/test_query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (3)
src/app/endpoints/query.py (1)
src/utils/token_counter.py (2)
  • extract_and_update_token_metrics (97-130)
  • TokenCounter (18-41)
src/app/endpoints/streaming_query.py (2)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (1)
  • TurnSummary (89-163)
tests/unit/app/endpoints/test_query.py (2)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/app/endpoints/query.py (1)
  • retrieve_response (582-731)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (9)
tests/unit/app/endpoints/test_query.py (2)

41-41: LGTM: Import required for test mocking.

The TokenCounter import is necessary for creating instances in test mocks and follows the absolute import guidelines.


202-202: LGTM: Test cases correctly updated for new return signature.

All test cases have been consistently updated to handle the new 4-tuple return value from retrieve_response, correctly unpacking the token usage as the 4th element. Mocks properly return TokenCounter() instances.

Also applies to: 498-500, 534-536, 571-573, 614-616, 668-670, 725-727, 784-786, 845-847, 904-906, 1081-1083, 1155-1157, 1230-1237, 1320-1322, 1405-1410, 1466-1466, 1522-1522, 1582-1584, 1637-1639
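
For illustration, a minimal sketch of the mocking pattern (the import path, call arguments, and the TokenCounter defaults are assumptions, not the repository's exact code):

```python
import asyncio
from unittest.mock import AsyncMock

from utils.token_counter import TokenCounter  # assumed import path


def test_mocked_retrieve_response_returns_token_usage():
    """Sketch: an AsyncMock standing in for retrieve_response's new 4-tuple."""
    fake_retrieve_response = AsyncMock(
        return_value=("summary", "conv-123", [], TokenCounter())
    )
    summary, conversation_id, referenced_documents, token_usage = asyncio.run(
        fake_retrieve_response("client", "model-id", "query-request", "token")
    )
    assert isinstance(token_usage, TokenCounter)
```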

src/app/endpoints/query.py (5)

57-57: LGTM: Import statements correct.

The imports of extract_and_update_token_metrics and TokenCounter follow absolute import guidelines and are used appropriately in the code.


282-291: LGTM: Correct unpacking of extended return value.

The unpacking of the 4-tuple from retrieve_response correctly captures the new token_usage field, which is subsequently used to populate token fields in the response.


368-378: LGTM: QueryResponse correctly populated with token fields.

The new fields input_tokens and output_tokens are correctly populated from token_usage, and the TODOs for truncated and available_quotas clearly flag the remaining work.
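
Taken together with the unpacking above, the flow might look roughly like this sketch (import paths, the summary.llm_response attribute, and the handler shape are assumptions):

```python
from typing import Any

from app.endpoints.query import retrieve_response   # assumed import path
from models.responses import QueryResponse          # assumed import path


async def build_query_response(
    client: Any, model_id: str, query_request: Any, token: str
) -> QueryResponse:
    """Sketch of filling the new token fields from the 4-tuple."""
    summary, conversation_id, referenced_documents, token_usage = await retrieve_response(
        client, model_id, query_request, token
    )
    return QueryResponse(
        conversation_id=conversation_id,
        response=summary.llm_response,        # assumed attribute on TurnSummary
        referenced_documents=referenced_documents,
        truncated=False,                       # TODO in the PR: real truncation detection
        input_tokens=token_usage.input_tokens,
        output_tokens=token_usage.output_tokens,
        available_quotas={},                   # TODO in the PR: quota lookup
    )
```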


590-590: LGTM: Function signature and docstring properly updated.

The return type annotation and docstring correctly describe the extended 4-tuple return value including token usage information. As per coding guidelines, the docstring follows Google Python style.

Also applies to: 614-616
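
For reference, the updated signature could be annotated along these lines (parameter names, the client type, and import paths are assumptions):

```python
from typing import Any

from models.responses import ReferencedDocument   # assumed import path
from utils.token_counter import TokenCounter      # assumed import path
from utils.types import TurnSummary               # assumed import path


async def retrieve_response(
    client: Any,
    model_id: str,
    query_request: Any,
    token: str,
) -> tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]:
    """Retrieve the agent's answer for a query.

    Returns:
        A 4-tuple of (summary, conversation_id, referenced_documents, token_usage).
    """
    raise NotImplementedError  # signature sketch only
```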


711-715: LGTM: Token usage extraction correctly integrated.

The call to extract_and_update_token_metrics properly extracts token usage and updates Prometheus metrics in a single operation. The model label extraction logic correctly handles both "provider/model" and plain "model" formats. The return statement correctly includes the token usage as the 4th element.

Also applies to: 731-731
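
A small, self-contained sketch of the label handling this comment describes (the helper name here is hypothetical; the real extraction lives in src/utils/token_counter.py):

```python
def split_model_label(model_id: str) -> tuple[str, str]:
    """Split "provider/model" into (provider, model); bare model ids get an empty provider."""
    if "/" in model_id:
        provider, model = model_id.split("/", 1)
        return provider, model
    return "", model_id


assert split_model_label("openai/gpt-4o-mini") == ("openai", "gpt-4o-mini")
assert split_model_label("gpt-4o-mini") == ("", "gpt-4o-mini")
```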

src/app/endpoints/streaming_query.py (2)

158-163: Unused parameter is acceptable for future implementation.

The summary parameter is currently unused (hence the pylint disable comment), but this is justified by the TODO at line 206 indicating it will be used for the RAG-chunks implementation. Keeping it now preserves a consistent function signature in preparation for that enhancement.


209-210: LGTM! Token usage correctly integrated into end event.

The end event payload now uses token_usage.input_tokens and token_usage.output_tokens to populate the response, replacing the previous placeholder values. The implementation is correct.
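
Putting both points together, the end event might be shaped roughly as below; everything except the input_tokens/output_tokens read from TokenCounter is a placeholder or assumption:

```python
import json

from utils.token_counter import TokenCounter   # assumed import path
from utils.types import TurnSummary            # assumed import path


def stream_end_event(
    metadata_map: dict,
    # `summary` is kept for the future RAG-chunks work mentioned in the TODO.
    summary: TurnSummary,  # pylint: disable=unused-argument
    token_usage: TokenCounter,
) -> str:
    """Sketch of an SSE end event carrying real token counts."""
    payload = {
        "event": "end",
        "data": {
            "referenced_documents": [],        # placeholder
            "truncated": None,                 # placeholder
            "input_tokens": token_usage.input_tokens,
            "output_tokens": token_usage.output_tokens,
        },
        "available_quotas": {},                # placeholder
    }
    return f"data: {json.dumps(payload)}\n\n"
```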

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 647a15c and 5eca3a2.

📒 Files selected for processing (6)
  • src/app/endpoints/query.py (7 hunks)
  • src/app/endpoints/streaming_query.py (4 hunks)
  • src/models/responses.py (3 hunks)
  • src/utils/token_counter.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (22 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/utils/token_counter.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • src/utils/token_counter.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_streaming_query.py
🧬 Code graph analysis (4)
src/app/endpoints/streaming_query.py (2)
src/utils/token_counter.py (2)
  • TokenCounter (18-41)
  • extract_token_usage_from_turn (44-94)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/app/endpoints/query.py (3)
src/utils/token_counter.py (2)
  • extract_and_update_token_metrics (97-130)
  • TokenCounter (18-41)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/models/responses.py (1)
  • ReferencedDocument (164-176)
tests/unit/app/endpoints/test_query.py (2)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/app/endpoints/query.py (1)
  • retrieve_response (582-731)
tests/unit/app/endpoints/test_streaming_query.py (4)
src/models/responses.py (1)
  • RAGChunk (148-153)
src/utils/token_counter.py (1)
  • TokenCounter (18-41)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/app/endpoints/streaming_query.py (1)
  • stream_end_event (158-214)
🪛 GitHub Actions: Pyright
src/app/endpoints/streaming_query.py

[error] 823-823: "p" is possibly unbound (reportPossiblyUnboundVariable)


[error] 823-823: Cannot access attribute "turn" for class "AgentTurnResponseStepCompletePayload". Attribute "turn" is unknown (reportAttributeAccessIssue)


[error] 823-823: Cannot access attribute "turn" for class "AgentTurnResponseStepStartPayload". Attribute "turn" is unknown (reportAttributeAccessIssue)


[error] 823-823: Cannot access attribute "turn" for class "AgentTurnResponseStepProgressPayload". Attribute "turn" is unknown (reportAttributeAccessIssue)


[error] 823-823: Cannot access attribute "turn" for class "AgentTurnResponseTurnStartPayload". Attribute "turn" is unknown (reportAttributeAccessIssue)

🪛 GitHub Actions: Type checks
src/app/endpoints/streaming_query.py

[error] 823-823: mypy: Item "AgentTurnResponseStepStartPayload" of a union has no attribute "turn". (union-attr)


[error] 823-823: mypy: Item "AgentTurnResponseStepProgressPayload" of a union has no attribute "turn". (union-attr)


[error] 823-823: mypy: Item "AgentTurnResponseStepCompletePayload" of a union has no attribute "turn". (union-attr)


[error] 823-823: mypy: Item "AgentTurnResponseTurnStartPayload" of a union has no attribute "turn". (union-attr)
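
One hedged way to clear both the "possibly unbound" and the union-attribute diagnostics is to initialize the variable before the loop and read .turn only when the payload actually carries it; the extract_token_usage_from_turn argument list below is an assumption:

```python
from collections.abc import AsyncIterator
from typing import Any

from utils.token_counter import TokenCounter, extract_token_usage_from_turn  # assumed path


async def token_usage_from_stream(stream: AsyncIterator[Any]) -> TokenCounter:
    """Sketch: remember the last completed turn and derive token usage from it."""
    latest_turn = None
    async for chunk in stream:
        payload = chunk.event.payload
        # Only the turn-complete payload exposes .turn, so guard the access instead
        # of dereferencing the union member directly.
        turn = getattr(payload, "turn", None)
        if turn is not None:
            latest_turn = turn
    if latest_turn is None:
        return TokenCounter()
    return extract_token_usage_from_turn(latest_turn)  # argument list is an assumption
```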

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)

@tisnik tisnik merged commit 5da4f31 into lightspeed-core:main Oct 9, 2025
18 of 19 checks passed