[WIP] Add OpenAI Responses API endpoint with MVP functionality #749
base: main
Conversation
Implements a /v1/responses endpoint providing an OpenAI-compatible API interface while leveraging the existing Lightspeed RAG and LLM integration.

- Add CreateResponseRequest and OpenAIResponse models following the OpenAI spec
- Implement responses endpoint handler with proper auth and error handling
- Add OpenAI to Lightspeed request/response mapping utilities
- Add RESPONSES action to authorization system
- Include comprehensive unit test coverage (100% for new code)
- Maintain full compatibility with existing authentication patterns
- Support referenced documents via metadata field for RAG integration

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Walkthrough

This pull request introduces a new OpenAI-compatible /v1/responses endpoint that maps OpenAI-format requests onto the existing Lightspeed query pipeline.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
participant Client
participant Endpoint as /responses Endpoint
participant Mapping as OpenAI Mapping
participant LlamaStack as Llama Stack Client
participant Metrics as Metrics
Client->>Endpoint: POST /v1/responses (OpenAI format)
Endpoint->>Endpoint: Validate config & auth
Endpoint->>Mapping: map_openai_to_query_request()
Mapping->>Mapping: Convert to internal format
Mapping-->>Endpoint: QueryRequest
Endpoint->>LlamaStack: retrieve_response(QueryRequest)
alt Success
LlamaStack-->>Endpoint: QueryResponse
Endpoint->>Mapping: map_query_to_openai_response()
Mapping->>Mapping: Generate ID, timestamp, usage
Mapping-->>Endpoint: OpenAIResponse
Endpoint-->>Client: 200 OpenAIResponse
else APIConnectionError
LlamaStack-->>Endpoint: APIConnectionError
Endpoint->>Metrics: Increment LLM failure metric
Endpoint-->>Client: 500 Error Detail
else Validation Error
Endpoint->>Endpoint: ValueError/AttributeError/TypeError
Endpoint-->>Client: 422 Invalid Input
end
```
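Condensed into code, the happy path plus the two error branches look roughly like this. This is a minimal sketch only: the helper names come from this review's code graph, while the import paths, signatures, and the metrics call are assumptions rather than the PR's actual implementation.

```python
"""Sketch of the /v1/responses handler flow shown above (illustrative, not the PR's code)."""
import logging

from fastapi import APIRouter, HTTPException, status
from llama_stack_client import APIConnectionError  # assumed export

from app.endpoints.query import retrieve_response  # assumed location (query.py)
from client import AsyncLlamaStackClientHolder
from models.requests import CreateResponseRequest
from models.responses import OpenAIResponse
from utils.openai_mapping import map_openai_to_query_request, map_query_to_openai_response

logger = logging.getLogger("app.endpoints.handlers")  # shared endpoint logger name
router = APIRouter(tags=["responses"])


@router.post("/responses", response_model=OpenAIResponse)
async def responses_endpoint_handler(request: CreateResponseRequest) -> OpenAIResponse:
    """Translate an OpenAI-style request into the internal query path and back."""
    query_request = map_openai_to_query_request(request)  # OpenAI -> internal format
    client = AsyncLlamaStackClientHolder().get_client()   # holder API per the code graph
    try:
        query_response = await retrieve_response(client, query_request)  # assumed signature
    except APIConnectionError as exc:
        logger.error("Unable to connect to Llama Stack: %s", exc)
        # the real handler also increments the LLM-failure metric here
        raise HTTPException(status.HTTP_500_INTERNAL_SERVER_ERROR, detail=str(exc)) from exc
    except (ValueError, AttributeError, TypeError) as exc:
        raise HTTPException(status.HTTP_422_UNPROCESSABLE_ENTITY, detail=str(exc)) from exc
    return map_query_to_openai_response(query_response)   # internal -> OpenAI format
```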
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/unit/test_openai_response_models.py (1)
104-113: Consider adding total_tokens validation.

The test acknowledges that total_tokens validation may be added later, but currently allows mismatches between the sum of prompt_tokens + completion_tokens and total_tokens. While this is documented as intentional, consider adding a @model_validator to ensure data integrity.
If you decide to add this validation, you could add this validator to the ResponseUsage model in src/models/responses.py:
```python
@model_validator(mode="after")
def validate_total_tokens(self) -> "ResponseUsage":
    """Validate that total_tokens matches the sum of prompt and completion tokens."""
    expected_total = self.prompt_tokens + self.completion_tokens
    if self.total_tokens != expected_total:
        raise ValueError(
            f"total_tokens ({self.total_tokens}) must equal "
            f"prompt_tokens + completion_tokens ({expected_total})"
        )
    return self
```
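With such a validator in place, a mismatched usage object would fail fast at construction time, roughly like this (assuming Pydantic v2, which wraps validator errors in ValidationError, and the repo's absolute-import layout):

```python
import pytest
from pydantic import ValidationError

from models.responses import ResponseUsage

# 10 + 5 != 99, so the validator above would reject this instance
with pytest.raises(ValidationError):
    ResponseUsage(prompt_tokens=10, completion_tokens=5, total_tokens=99)
```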
tests/unit/app/endpoints/test_responses.py (1)

24-30: Consider using the shared MOCK_AUTH constant.

The coding guidelines specify: "Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests". While your current MOCK_AUTH uses a UUID format which may be intentional for this specific test, consider whether the shared constant should be used for consistency across the test suite.
As per coding guidelines
src/utils/openai_mapping.py (1)
23-73: Consider handling empty instructions string.

Line 55 directly assigns `openai_request.instructions` to `system_prompt`, which could be an empty string if provided. Depending on downstream behavior, an empty string might be treated differently than `None`. Consider normalizing empty strings to `None` for consistency.

Apply this diff if empty instructions should be treated as absent:
```diff
 # Map OpenAI instructions to Lightspeed system_prompt
-system_prompt = openai_request.instructions
+system_prompt = openai_request.instructions if openai_request.instructions else None
```
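If the diff above is applied, the observable effect would be along these lines (a hypothetical call; it assumes `model` and `input` are the only required request fields and that `QueryRequest` exposes `system_prompt`):

```python
from models.requests import CreateResponseRequest
from utils.openai_mapping import map_openai_to_query_request

# An empty instructions string no longer leaks through as an empty system prompt.
request = CreateResponseRequest(model="gpt-4", input="Hello", instructions="")
query_request = map_openai_to_query_request(request)
assert query_request.system_prompt is None  # previously would have been ""
```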
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- src/app/endpoints/responses.py (1 hunks)
- src/app/routers.py (2 hunks)
- src/models/config.py (1 hunks)
- src/models/requests.py (1 hunks)
- src/models/responses.py (2 hunks)
- src/utils/openai_mapping.py (1 hunks)
- tests/unit/app/endpoints/test_responses.py (1 hunks)
- tests/unit/app/test_routers.py (5 hunks)
- tests/unit/authorization/test_resolvers.py (1 hunks)
- tests/unit/test_openai_mapping.py (1 hunks)
- tests/unit/test_openai_requests.py (1 hunks)
- tests/unit/test_openai_response_models.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (10)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
- src/utils/openai_mapping.py
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
- src/app/endpoints/responses.py
- src/app/routers.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use `logger = logging.getLogger(__name__)` for module logging after `import logging`
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
- src/utils/openai_mapping.py
- src/models/config.py
- tests/unit/authorization/test_resolvers.py
- src/models/requests.py
- tests/unit/test_openai_mapping.py
- tests/unit/test_openai_response_models.py
- tests/unit/test_openai_requests.py
- src/models/responses.py
- tests/unit/app/test_routers.py
- src/app/endpoints/responses.py
- tests/unit/app/endpoints/test_responses.py
- src/app/routers.py
src/{models/config.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/config.py,configuration.py}: All configuration uses Pydantic models extending ConfigurationBase
Configuration base models must set model_config with extra="forbid" to reject unknown fields
Files:
src/models/config.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
- src/models/config.py
- src/models/requests.py
- src/models/responses.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
- tests/unit/authorization/test_resolvers.py
- tests/unit/test_openai_mapping.py
- tests/unit/test_openai_response_models.py
- tests/unit/test_openai_requests.py
- tests/unit/app/test_routers.py
- tests/unit/app/endpoints/test_responses.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
- tests/unit/authorization/test_resolvers.py
- tests/unit/test_openai_mapping.py
- tests/unit/test_openai_response_models.py
- tests/unit/test_openai_requests.py
- tests/unit/app/test_routers.py
- tests/unit/app/endpoints/test_responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
- src/app/endpoints/responses.py
- src/app/routers.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
- src/app/endpoints/responses.py
- src/app/routers.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/responses.py
🧠 Learnings (3)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/models/**/*.py : Use model_validator and field_validator for Pydantic model validation
Applied to files:
src/models/responses.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/{models/**/*.py,configuration.py} : Use field_validator and model_validator for custom validation in Pydantic models
Applied to files:
src/models/responses.py
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/app/**/*.py : Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Applied to files:
src/app/routers.py
🧬 Code graph analysis (8)
src/utils/openai_mapping.py (2)
- src/models/requests.py (2): CreateResponseRequest (418-512), QueryRequest (73-225)
- src/models/responses.py (5): QueryResponse (194-305), ResponseContent (1177-1224), ResponseMessage (1227-1271), ResponseOutput (1274-1308), ResponseUsage (1311-1347)

tests/unit/authorization/test_resolvers.py (2)
- src/models/config.py (2): AccessRule (381-385), Action (329-378)
- src/authorization/resolvers.py (7): GenericAccessResolver (146-202), check_access (123-124), check_access (134-138), check_access (171-188), get_actions (127-128), get_actions (140-143), get_actions (190-202)

tests/unit/test_openai_mapping.py (3)
- src/models/requests.py (2): CreateResponseRequest (418-512), QueryRequest (73-225)
- src/models/responses.py (7): QueryResponse (194-305), OpenAIResponse (1350-1544), ReferencedDocument (179-191), ResponseContent (1177-1224), ResponseMessage (1227-1271), ResponseOutput (1274-1308), ResponseUsage (1311-1347)
- src/utils/openai_mapping.py (2): map_openai_to_query_request (23-73), map_query_to_openai_response (76-154)

tests/unit/test_openai_response_models.py (1)
- src/models/responses.py (5): OpenAIResponse (1350-1544), ResponseOutput (1274-1308), ResponseMessage (1227-1271), ResponseContent (1177-1224), ResponseUsage (1311-1347)

tests/unit/test_openai_requests.py (1)
- src/models/requests.py (1): CreateResponseRequest (418-512)

src/app/endpoints/responses.py (8)
- src/authentication/__init__.py (1): get_auth_dependency (14-52)
- src/authorization/middleware.py (1): authorize (111-122)
- src/client.py (2): AsyncLlamaStackClientHolder (18-55), get_client (49-55)
- src/configuration.py (1): configuration (73-77)
- src/models/requests.py (1): CreateResponseRequest (418-512)
- src/models/responses.py (4): OpenAIResponse (1350-1544), ForbiddenResponse (1120-1142), UnauthorizedResponse (1094-1117), QueryResponse (194-305)
- src/utils/endpoints.py (1): check_configuration_loaded (111-123)
- src/utils/openai_mapping.py (2): map_openai_to_query_request (23-73), map_query_to_openai_response (76-154)

tests/unit/app/endpoints/test_responses.py (7)
- src/app/endpoints/responses.py (1): responses_endpoint_handler (75-192)
- src/models/config.py (2): config (140-146), Action (329-378)
- src/models/requests.py (1): CreateResponseRequest (418-512)
- src/models/responses.py (7): OpenAIResponse (1350-1544), QueryResponse (194-305), ReferencedDocument (179-191), ResponseContent (1177-1224), ResponseMessage (1227-1271), ResponseOutput (1274-1308), ResponseUsage (1311-1347)
- src/utils/types.py (1): TurnSummary (89-163)
- src/utils/token_counter.py (1): TokenCounter (18-41)
- src/client.py (1): get_client (49-55)

src/app/routers.py (1)
- tests/unit/app/test_routers.py (1): include_router (36-51)
🪛 GitHub Actions: Python linter
tests/unit/app/test_routers.py
[error] 43-43: Pylint: W0621: Redefining name 'responses' from outer scope (line 9) (redefined-outer-name).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: e2e_tests (azure)
- GitHub Check: e2e_tests (ci)
🔇 Additional comments (10)
src/app/routers.py (1)
21-21: LGTM!

The import and registration of the `responses` router follows the existing patterns and is correctly integrated at the `/v1` prefix alongside other API endpoints.

Also applies to: 39-39
tests/unit/authorization/test_resolvers.py (1)
344-362: LGTM!

The test properly validates that the RESPONSES action integrates with the authorization system. It confirms access control, permission checks, and action enumeration work as expected.
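Conceptually, the check looks something like this (a sketch; the resolver and model names come from the code graph below, while the rule construction and exact signatures are assumptions):

```python
from authorization.resolvers import GenericAccessResolver
from models.config import AccessRule, Action

# Hypothetical rule wiring: a role that may call /v1/responses
rules = [AccessRule(role="user", actions=[Action.RESPONSES])]
resolver = GenericAccessResolver(rules)

assert resolver.check_access(Action.RESPONSES, {"user"})
assert Action.RESPONSES in resolver.get_actions({"user"})
```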
tests/unit/app/test_routers.py (1)
68-68: LGTM!

The router count and prefix assertions are correctly updated to account for the new `responses` router.

Also applies to: 77-77, 93-93, 102-102
src/models/config.py (1)
353-354: LGTM!

The RESPONSES action enum member is correctly added with appropriate documentation and value, following the established pattern for action definitions.
tests/unit/test_openai_requests.py (1)
1-140: LGTM!

The test suite comprehensively validates the CreateResponseRequest model, covering:
- Valid minimal and full configurations
- Required field validation
- Boundary conditions for temperature and max_output_tokens
- Extra field rejection (forbid mode)
- Input validation for both string and array types
- Model configuration structure
Test coverage appears complete for the new request model.
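A representative boundary-and-forbid case, sketched with pytest (the 0-2 temperature bound and the field names are assumptions, not confirmed values from the PR):

```python
import pytest
from pydantic import ValidationError

from models.requests import CreateResponseRequest


def test_out_of_range_temperature_is_rejected():
    """Values outside the assumed 0-2 range should fail validation."""
    with pytest.raises(ValidationError):
        CreateResponseRequest(model="gpt-4", input="Hello", temperature=5.0)


def test_extra_fields_are_rejected():
    """Unknown fields fail because the model is configured with extra='forbid'."""
    with pytest.raises(ValidationError):
        CreateResponseRequest(model="gpt-4", input="Hello", not_a_field=True)
```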
tests/unit/test_openai_response_models.py (1)
1-266: LGTM!

The test suite comprehensively validates all OpenAI response models, covering:
- Valid and invalid type/role/finish_reason/status/object values
- Empty array/string validation
- Required field validation
- Nested model validation
- Metadata handling
Test coverage appears complete for the new response models.
tests/unit/app/endpoints/test_responses.py (1)
140-370: LGTM!

The test suite comprehensively covers the /responses endpoint behavior:
- Successful response generation with proper OpenAI response structure
- Authorization enforcement (via signature inspection)
- API connection error handling with metrics
- Request and response mapping invocation
- Validation error handling (ValueError, AttributeError, TypeError)
The tests properly mock dependencies and verify error paths, status codes, and metric updates.
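The AsyncMock pattern called for by the test guidelines looks roughly like this in isolation (a generic sketch using pytest-asyncio, not the PR's actual tests):

```python
import pytest
from unittest.mock import AsyncMock

# Shared auth tuple from the project's test guidelines
MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")


@pytest.mark.asyncio
async def test_failing_llm_call_raises():
    """An AsyncMock with side_effect raises when awaited, driving the error paths."""
    failing_llm_call = AsyncMock(side_effect=ConnectionError("llama stack unreachable"))
    with pytest.raises(ConnectionError):
        await failing_llm_call()
    failing_llm_call.assert_awaited_once()
```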
src/utils/openai_mapping.py (2)
76-154: LGTM!

The mapping function correctly converts internal QueryResponse to OpenAI format:
- Generates unique IDs and timestamps appropriately
- Maps token usage correctly with calculated totals
- Preserves referenced documents in metadata
- Creates proper nested response structure with content, message, and output
The implementation follows the OpenAI specification and handles the MVP requirements well.
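In outline, that conversion has this shape (a sketch assembled from the model names in the code graph; the constructor fields and attribute names shown are assumptions, not the PR's exact code):

```python
import time
import uuid

from models.responses import OpenAIResponse, QueryResponse, ResponseUsage


def sketch_map_query_to_openai_response(query_response: QueryResponse) -> OpenAIResponse:
    """Illustrative QueryResponse -> OpenAIResponse conversion."""
    prompt_tokens = query_response.input_tokens      # assumed attribute names
    completion_tokens = query_response.output_tokens
    usage = ResponseUsage(
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,  # calculated total
    )
    return OpenAIResponse(
        id=f"resp_{uuid.uuid4().hex}",  # unique per response
        created_at=int(time.time()),    # Unix timestamp
        usage=usage,
        metadata={  # referenced documents preserved for RAG consumers
            "referenced_documents": [
                doc.model_dump() for doc in query_response.referenced_documents
            ]
        },
        # ...nested output/message/content structure omitted for brevity
    )
```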
48-49: MVP limitation properly documented; external ticket verification required.

The code change at lines 48-49 correctly implements the MVP limitation for array input with clear error messaging. The codebase contains comments indicating Phase 2 enhancements are planned (including array input support and explicit model mapping). However, LCORE-901 is not referenced anywhere in the repository—it appears to be tracked in an external or internal ticket system outside the codebase. Verify that Phase 2 plans for array input support are properly tracked in your internal issue management system.
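The guard described here is presumably a small early check, along these lines (illustrative; the exact message text is assumed):

```python
# Inside map_openai_to_query_request -- illustrative, not the exact PR code
if isinstance(openai_request.input, list):
    raise ValueError(
        "Array input is not supported in the MVP; pass a plain string. "
        "Array support is planned for Phase 2."
    )
```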
src/app/endpoints/responses.py (1)
31-31: Use module-qualified logger name.

The project's Python guidelines call for `logger = logging.getLogger(__name__)` so log records inherit the module path correctly. Please switch to the `__name__`-based logger. As per coding guidelines.

⛔ Skipped due to learnings

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to **/*.py : Use `logger = logging.getLogger(__name__)` for module logging after `import logging`

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to **/*.py : Use appropriate logging levels: debug, info, warning, error with clear messages

Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
```python
    authorized,
    metrics,
    tools,
    responses,
```
Fix the naming collision between imported module and parameter name.
The Pylint error indicates that the `responses` parameter on line 43 shadows the imported `responses` module from line 25. While this is in a mock class and may not cause runtime issues, it violates best practices and triggers linting failures.
Apply this diff to rename the parameter:
```diff
 def include_router(  # pylint: disable=too-many-arguments
     self,
     router: Any,
     *,
     prefix: str = "",
     tags=None,
     dependencies=None,
-    responses=None,
+    response_models=None,
     deprecated=None,
     include_in_schema=None,
     default_response_class=None,
     callbacks=None,
     generate_unique_id_function=None,
 ) -> None:
```

Also applies to: 43-43
🤖 Prompt for AI Agents
In tests/unit/app/test_routers.py around lines 25 and 43, the imported module
name responses (line 25) is being shadowed by a parameter named responses in the
mock class at line 43; rename the parameter (e.g., to resp or mock_responses)
and update all references to that parameter inside the class (constructor and
any methods) so they refer to the new name, leaving the module import unchanged;
run the tests/linter to confirm the shadowing warning is resolved.
```python
# For MVP simplicity, use default model/provider selection logic from query.py
# This will be enhanced in Phase 2 to support explicit model mapping
summary, conversation_id, referenced_documents, token_usage = (
    await retrieve_response(
```
note this is not using the responses API from llamastack
Wouldn't this PR require moving Llama Stack to 0.3.x to use the new Llama Stack Responses API (https://llamastack.github.io/docs/api/agents)? The previous Llama Stack Agent APIs are deprecated (https://llamastack.github.io/docs/api-deprecated/agents). That is, do we need an explicit LCORE /responses endpoint if we switch to Llama Stack 0.3.x?
Depends on what level of completeness LCORE is willing to live with, @maysunfaisal. The responses API was introduced a few months ago in 0.2.x but was labeled work in progress, with some known bugs and missing pieces.
So there could be some staging in play.
But the end goal should be a roadmap that, after some intermediate stages, has LCORE leveraging the Llama Stack OpenAI-API-compatible endpoint rather than the deprecated agent APIs or some other responses API endpoint.
Description
Implements /v1/responses endpoint providing OpenAI-compatible API interface
while leveraging existing Lightspeed RAG and LLM integration.
🤖 Generated with Claude Code
Co-Authored-By: Claude [email protected]
Type of change
Related Tickets & Documents
https://issues.redhat.com/browse/LCORE-901
Checklist before requesting a review
Testing
Summary by CodeRabbit
Release Notes
New Features
Improvements