Conversation

@cdoern (Contributor) commented Nov 6, 2025

What does this PR do?

Delete ~2,000 lines of dead code from the old bespoke inference API that was replaced by the OpenAI-only API. This includes removing unused type conversion functions, dead provider methods, and event_logger.py.

Clean up imports across the codebase to remove references to deleted types. This eliminates unnecessary code and dependencies, helping isolate the API package as a self-contained module.

This is the last interdependency between the .api package and "exterior" packages, meaning that now every other package in llama stack imports the API, not the other way around.

Test Plan

This is a structural change; no tests are needed.

@ashwinb (Contributor) commented Nov 6, 2025

These are relics of the older inference API... who's needing these still? I remember the eval API still using SamplingParams -- maybe that's the only reference? If that's the case, could we migrate that eval API to a more OpenAI compatible format? It is marked v1alpha now anyway.

@cdoern (Contributor, Author) commented Nov 7, 2025

@ashwinb, you're right, most of these are actually being used in "dead" types. Let me see how much dead code I can remove and if it makes this functionally a no-op. Eval might be a tricky case but let me see what I can do.

@cdoern cdoern changed the title from "feat: isolate model response types" to "refactor: remove dead inference API code and clean up imports" on Nov 7, 2025
@cdoern cdoern force-pushed the models-dep branch 2 times, most recently from 4b1d0bb to a5d912a on November 7, 2025 22:05
@cdoern (Contributor, Author) commented Nov 7, 2025

@ashwinb let me know if this makes sense! It kills two birds with one stone: no more .models imports into .api, and, as you suggested, a lot of this code was dead inference code, so I took the liberty of getting rid of a lot of it!

@cdoern (Contributor, Author) commented Nov 7, 2025

There were also a bunch of duplicate classes in .models and .apis.inference, so I consolidated those a bit.

@ashwinb (Contributor) left a comment:

yeah this seems correct to do

Delete ~1,300 lines of dead code from the old bespoke inference API that was replaced by the OpenAI-only API. This includes removing unused type conversion functions, dead provider methods, and event_logger.py. Clean up imports across the codebase to remove references to deleted types. This eliminates unnecessary code and dependencies, helping isolate the API package as a self-contained module.
This is the last interdependency between the .api package and "exterior" packages, meaning that now every other package in llama stack imports the API, not the other way around.

The API is now self-contained and can be moved into its own package.

Signed-off-by: Charlie Doern <[email protected]>
- Add chat_completion() method to LlamaGenerator supporting OpenAI request format
- Implement openai_chat_completion() in MetaReferenceInferenceImpl
- Fix ModelRunner task dispatch to handle chat_completion tasks
- Add convert_openai_message_to_raw_message() utility for message conversion
- Add unit tests for message conversion and model-parallel dispatch
- Remove unused CompletionRequestWithRawContent references

Signed-off-by: Charlie Doern <[email protected]>
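
As a rough illustration of the message-conversion utility added in this commit, a minimal sketch follows. The import path, RawMessage's exact shape, and the attributes on OpenAI content parts are assumptions, not the actual implementation:

```python
# Sketch only: import path and type shapes are assumptions based on the
# type names referenced in this PR.
from llama_stack.models.llama.datatypes import RawMessage, RawTextItem


def convert_openai_message_to_raw_message(message) -> RawMessage:
    """Convert an OpenAI-style chat message into a RawMessage (sketch)."""
    content = message.content
    if isinstance(content, list):
        # OpenAI allows a list of content parts; this sketch keeps text parts
        # only and assumes pydantic part objects with .type/.text attributes.
        content = [
            RawTextItem(text=part.text)
            for part in content
            if getattr(part, "type", None) == "text"
        ]
    return RawMessage(role=message.role, content=content or "")
```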
@cdoern (Contributor, Author) commented Nov 10, 2025

CI is hanging; trying to see if it is a flake. Tests pass locally with record and replay.


```python
# Get logits processor for response format
logits_processor = None
if request.response_format:
```
Contributor:

request.response_format is a concrete OpenAIResponseFormatParam, not a dict, so this branch never runs and JSON schema requests silently fall back to unconstrained sampling. Could we inspect the actual variant (e.g., OpenAIResponseFormatJSONSchema) and pass its schema through to JsonSchemaResponseFormat like before?
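
For example, a minimal sketch of the suggested dispatch (the import path and the nested "schema" key access are assumptions based on the type names in this thread):

```python
from llama_stack.apis.inference import (
    JsonSchemaResponseFormat,
    OpenAIResponseFormatJSONSchema,
)


def get_response_format(request) -> JsonSchemaResponseFormat | None:
    """Map the concrete OpenAI response_format variant to the internal type."""
    fmt = request.response_format
    if isinstance(fmt, OpenAIResponseFormatJSONSchema):
        # OpenAIJSONSchema is a TypedDict; the actual schema is nested
        # under its "schema" key.
        schema = fmt.json_schema.get("schema")
        if schema is not None:
            return JsonSchemaResponseFormat(json_schema=schema)
    return None
```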

Contributor (Author) replied:

yes, you're right, sorry about that. Fixing this up now.

```python
# Prepare sampling params
sampling_params = SamplingParams()
if request.temperature is not None or request.top_p is not None:
    sampling_params.strategy = TopPSamplingStrategy(
```
Contributor:

temperature=request.temperature or 1.0 (and the same for top_p) treats 0.0 as “unset”, so callers can’t ask for deterministic decoding—the code always bumps it back to 1. Likewise, when both fields are omitted we never inherit OpenAI’s defaults. Could we distinguish None vs real numbers so 0.0 and other values survive untouched?
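
A sketch of the distinction being asked for (the import path is an assumption, and TopPSamplingStrategy typically rejects temperature=0, so an explicit 0.0 maps to greedy decoding here):

```python
# Sketch only: import path is an assumption; strategy types come from
# the diff above.
from llama_stack.models.llama.datatypes import (
    GreedySamplingStrategy,
    SamplingParams,
    TopPSamplingStrategy,
)


def build_sampling_params(temperature: float | None, top_p: float | None) -> SamplingParams:
    """Distinguish None ("unset") from explicit values such as 0.0."""
    params = SamplingParams()
    if temperature == 0.0:
        # An explicit 0.0 asks for deterministic decoding.
        params.strategy = GreedySamplingStrategy()
    elif temperature is not None or top_p is not None:
        params.strategy = TopPSamplingStrategy(
            temperature=temperature if temperature is not None else 1.0,
            top_p=top_p if top_p is not None else 1.0,
        )
    # If both are None, keep the default strategy so OpenAI defaults apply.
    return params
```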

Contributor (Author) replied:

yes, this is a great point. thanks!

```python
created=created,
model=params.model,
choices=[
    OpenAIChoice(
```
@ashwinb (Contributor) commented on Nov 10, 2025:

openai_chat_completion always runs exactly one generation, flattens it into a single string, and returns one choice with a hard-coded finish_reason. That means streaming responses and tool-call outputs all silently stop working after this refactor.
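
The direction the follow-up commit takes is roughly this (a sketch; the attribute names on the decoded message are assumptions):

```python
def pick_finish_reason(decoded_message) -> str:
    """Derive finish_reason from the decoded assistant message (sketch).

    Assumes the decoded message carries a tool_calls list when the model
    emitted tool invocations; attribute names are illustrative.
    """
    if getattr(decoded_message, "tool_calls", None):
        return "tool_calls"
    return "stop"
```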

Contributor (Author) replied:

I can fix this! However, I believe the meta_reference provider was not working before this PR: all methods in inference.py raised NotImplementedError. @leseb raised a similar topic above.

@ashwinb (Contributor) left a comment:

See inline comments

This commit addresses review comments regarding the OpenAI chat completion implementation in the meta_reference provider.

Tool Augmentation
- Add `augment_raw_messages_for_tools()` to properly inject tool definitions into prompts
- Support model-family-specific tool formats:
  * Llama 3.1/3.2 multimodal: JsonCustomToolGenerator with JSON format
  * Llama 3.2/3.3/4: PythonListCustomToolGenerator with Python list format
- Handle tool_choice hints (auto/required/specific tool)
- Preserve existing system messages while adding tool context

Streaming & Tool Call Detection
- Implement streaming support via `params.stream` with `_stream_chat_completion()`
- Add tool call detection by decoding assistant messages after generation
- Set proper `finish_reason` based on content ("stop" vs "tool_calls")
- Convert internal ToolCall format to OpenAI-compatible types
- Stream chunks incrementally with proper delta formatting

Type Corrections
- Fix response_format handling in generators.py to properly extract the schema from the OpenAIJSONSchema TypedDict and use the correct ResponseFormatType enum
- Use correct OpenAI types: OpenAIChatCompletionToolCall, OpenAIChunkChoice, OpenAIChoiceDelta, OpenAIChatCompletionToolCallFunction

Signed-off-by: Charlie Doern <[email protected]>
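
For the streaming path described in this commit, a minimal sketch of one incremental chunk using the OpenAI types the commit names (the import path, id generation, timestamps, and the optionality of finish_reason are assumptions):

```python
import time
import uuid

# Types named in the commit above; import path and exact field signatures
# are assumptions in this sketch.
from llama_stack.apis.inference import (
    OpenAIChatCompletionChunk,
    OpenAIChoiceDelta,
    OpenAIChunkChoice,
)


def make_text_chunk(model: str, text: str, finish_reason: str | None = None) -> OpenAIChatCompletionChunk:
    """Build one streaming chunk carrying a text delta (sketch)."""
    return OpenAIChatCompletionChunk(
        id=f"chatcmpl-{uuid.uuid4()}",
        created=int(time.time()),
        model=model,
        object="chat.completion.chunk",
        choices=[
            OpenAIChunkChoice(
                index=0,
                delta=OpenAIChoiceDelta(content=text),
                # Intermediate chunks typically leave this unset; the final
                # chunk carries "stop" or "tool_calls".
                finish_reason=finish_reason,
            )
        ],
    )
```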
@cdoern (Contributor, Author) commented Nov 10, 2025

I believe the most recent commit adds a significant amount of functionality to the meta_reference inference provider. Previously, this provider was not implemented from what I can tell: all methods raised NotImplementedError (or maybe it was implemented at one point but lost functionality once the old inference API was deprecated). The other files in the directory, model_parallel.py and generators.py, had implementations, but I think that was dead code since nothing in the provider itself called into them.

So, the functionality in the last two commits is net-new, but satisfies reviewer comments and general usability of the provider as a whole.

I would be wary of adding more functionality in this PR, though, since I believe it goes beyond its scope. Please let me know if that makes sense @ashwinb @leseb! Thanks.

@ashwinb (Contributor) commented Nov 10, 2025

@cdoern that is completely fair! would you mind filing an issue to say "ensure meta-reference works" or something to that effect?

@cdoern (Contributor, Author) commented Nov 10, 2025

@ashwinb here you go: #4117

@cdoern (Contributor, Author) commented Nov 10, 2025

also, I don't know why the server GPT tests keep hanging, but this is something I have observed on other PRs I have open as well.

@ashwinb (Contributor) commented Nov 10, 2025

> also, I don't know why the server GPT tests keep hanging, but this is something I have observed on other PRs I have open as well.

I think the conversations tests > sqlite store is the root cause. There are race conditions there we need to resolve carefully.

@ashwinb (Contributor) commented Nov 10, 2025

@cdoern can you try running the failing test suite locally once?

@cdoern (Contributor, Author) commented Nov 10, 2025

yup @ashwinb here are the results (all green):

```
╰─ ./scripts/integration-tests.sh --inference-mode replay --suite responses --setup gpt --stack-config server:ci-tests
=== Llama Stack Integration Test Runner ===
Stack Config: server:ci-tests
Setup: gpt
Inference Mode: replay
Test Suite: responses
Test Subdirs:
Test Pattern:

Checking llama packages
Using Python 3.12.11 environment at: venv
llama-stack                              0.4.0.dev0  /Users/charliedoern/projects/Documents/llama-stack
llama-stack-client                       0.4.0a5     /Users/charliedoern/projects/Documents/llama-stack-client-python
ollama                                   0.6.0
=== Applying Setup Environment Variables ===
Setting SQLITE_STORE_DIR: /var/folders/1d/nf8gv8497qx6m3y8d9dpzwww0000gn/T/tmp.zsgnbYQY7O
Setting stack config type: server
Setting up environment variables:


Will use port: 8321
=== Starting Llama Stack Server ===
Waiting for Llama Stack Server to start on port 8321...
✅ Llama Stack Server started successfully

=== Running Integration Tests ===
Test subdirs to run:
+ STACK_CONFIG_ARG=
+ [[ -n server:ci-tests ]]
+ STACK_CONFIG_ARG=--stack-config=server:ci-tests
+ pytest -s -v tests/integration/ --stack-config=server:ci-tests --inference-mode=replay -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag )' --setup=gpt --suite=responses --color=yes --embedding-model=sentence-transformers/nomic-ai/nomic-embed-text-v1.5 --color=yes --setup=gpt --suite=responses --capture=tee-sys
INFO:tests.integration.conftest:Applying setup 'gpt' for suite responses
INFO:tests.integration.conftest:Test stack config type: server (stack_config=server:ci-tests)
=================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/charliedoern/projects/Documents/llama-stack/venv/bin/python3.12
cachedir: .pytest_cache
rootdir: /Users/charliedoern/projects/Documents/llama-stack
configfile: pyproject.toml
plugins: anyio-4.11.0
collected 73 items

tests/integration/responses/test_basic_responses.py::test_response_non_streaming_basic[openai_client-txt=openai/gpt-4o-earth]
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.020s
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/providers "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/models "HTTP/1.1 200 OK"
PASSED                                                [  1%]
tests/integration/responses/test_basic_responses.py::test_response_streaming_basic[openai_client-txt=openai/gpt-4o-earth] PASSED                                                    [  2%]
tests/integration/responses/test_basic_responses.py::test_response_streaming_incremental_content[openai_client-txt=openai/gpt-4o-earth] PASSED                                      [  4%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_multi_turn[openai_client-txt=openai/gpt-4o-earth] PASSED                                           [  5%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_image[openai_client-txt=openai/gpt-4o-llama_image] PASSED                                          [  6%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_multi_turn_image[openai_client-txt=openai/gpt-4o-llama_image_understanding] PASSED                 [  8%]
tests/integration/responses/test_file_search.py::test_response_text_format[openai_client-txt=openai/gpt-4o-text_format0] PASSED                                                     [  9%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_by_region[openai_client-txt=openai/gpt-4o] PASSED                                                 [ 10%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_by_category[openai_client-txt=openai/gpt-4o] PASSED                                               [ 12%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_by_date_range[openai_client-txt=openai/gpt-4o] PASSED                                             [ 13%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_compound_and[openai_client-txt=openai/gpt-4o] PASSED                                              [ 15%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_compound_or[openai_client-txt=openai/gpt-4o] PASSED                                               [ 16%]
tests/integration/responses/test_file_search.py::test_response_file_search_streaming_events[openai_client-txt=openai/gpt-4o] PASSED                                                 [ 17%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_file_search[openai_client-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768-llama_experts] PASSED [ 19%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_file_search_empty_vector_store[openai_client-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768] PASSED [ 20%]
tests/integration/responses/test_tool_responses.py::test_response_sequential_file_search[openai_client-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768] PASSED [ 21%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_web_search[openai_client-txt=openai/gpt-4o-llama_experts] PASSED                                    [ 23%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_mcp_tool[openai_client-txt=openai/gpt-4o-boiling_point_tool] SKIPPED (in-process MCP server is ...) [ 24%]
tests/integration/responses/test_tool_responses.py::test_response_sequential_mcp_tool[openai_client-txt=openai/gpt-4o-boiling_point_tool] SKIPPED (in-process MCP server is onl...) [ 26%]
tests/integration/responses/test_tool_responses.py::test_response_mcp_tool_approval[openai_client-txt=openai/gpt-4o-True-boiling_point_tool] SKIPPED (in-process MCP server is ...) [ 27%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_custom_tool[openai_client-txt=openai/gpt-4o-sf_weather] PASSED                                      [ 28%]
tests/integration/responses/test_tool_responses.py::test_response_function_call_ordering_1[openai_client-txt=openai/gpt-4o-sf_weather] PASSED                                       [ 30%]
tests/integration/responses/test_tool_responses.py::test_response_function_call_ordering_2[openai_client-txt=openai/gpt-4o] PASSED                                                  [ 31%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_multi_turn_tool_execution[openai_client-txt=openai/gpt-4o-user_file_access_check] SKIPPED (in-p...) [ 32%]
tests/integration/responses/test_tool_responses.py::test_response_streaming_multi_turn_tool_execution[openai_client-txt=openai/gpt-4o-user_permissions_workflow] SKIPPED (in-pr...) [ 34%]
tests/integration/responses/test_conversation_responses.py::TestConversationResponses::test_conversation_basic_workflow[txt=openai/gpt-4o] PASSED                                   [ 35%]
tests/integration/responses/test_conversation_responses.py::TestConversationResponses::test_conversation_multi_turn_and_streaming[txt=openai/gpt-4o] PASSED                         [ 36%]
tests/integration/responses/test_conversation_responses.py::TestConversationResponses::test_conversation_context_loading[txt=openai/gpt-4o] PASSED                                  [ 38%]
tests/integration/responses/test_conversation_responses.py::TestConversationResponses::test_conversation_error_handling[txt=openai/gpt-4o] PASSED                                   [ 39%]
tests/integration/responses/test_conversation_responses.py::TestConversationResponses::test_conversation_backward_compatibility[txt=openai/gpt-4o] PASSED                           [ 41%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_basic[openai_client-txt=openai/gpt-4o-saturn] PASSED                                               [ 42%]
tests/integration/responses/test_basic_responses.py::test_response_streaming_basic[openai_client-txt=openai/gpt-4o-saturn] PASSED                                                   [ 43%]
tests/integration/responses/test_basic_responses.py::test_response_streaming_incremental_content[openai_client-txt=openai/gpt-4o-saturn] PASSED                                     [ 45%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_multi_turn[client_with_models-txt=openai/gpt-4o-earth] PASSED                                      [ 46%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_image[client_with_models-txt=openai/gpt-4o-llama_image] PASSED                                     [ 47%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_multi_turn_image[client_with_models-txt=openai/gpt-4o-llama_image_understanding] PASSED            [ 49%]
tests/integration/responses/test_file_search.py::test_response_text_format[openai_client-txt=openai/gpt-4o-text_format1] PASSED                                                     [ 50%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_by_region[client_with_models-txt=openai/gpt-4o] PASSED                                            [ 52%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_by_category[client_with_models-txt=openai/gpt-4o] PASSED                                          [ 53%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_by_date_range[client_with_models-txt=openai/gpt-4o] PASSED                                        [ 54%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_compound_and[client_with_models-txt=openai/gpt-4o] PASSED                                         [ 56%]
tests/integration/responses/test_file_search.py::test_response_file_search_filter_compound_or[client_with_models-txt=openai/gpt-4o] PASSED                                          [ 57%]
tests/integration/responses/test_file_search.py::test_response_file_search_streaming_events[client_with_models-txt=openai/gpt-4o] PASSED                                            [ 58%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_file_search[openai_client-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768-llama_experts_pdf] PASSED [ 60%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_file_search_empty_vector_store[client_with_models-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768] PASSED [ 61%]
tests/integration/responses/test_tool_responses.py::test_response_sequential_file_search[client_with_models-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768] PASSED [ 63%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_web_search[openai_client-txt=openai/gpt-4o-web_search_2025_08_26_type] PASSED                       [ 64%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_mcp_tool[client_with_models-txt=openai/gpt-4o-boiling_point_tool] SKIPPED (in-process MCP serve...) [ 65%]
tests/integration/responses/test_tool_responses.py::test_response_sequential_mcp_tool[client_with_models-txt=openai/gpt-4o-boiling_point_tool] SKIPPED (in-process MCP server i...) [ 67%]
tests/integration/responses/test_tool_responses.py::test_response_mcp_tool_approval[openai_client-txt=openai/gpt-4o-False-boiling_point_tool] SKIPPED (in-process MCP server is...) [ 68%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_custom_tool[client_with_models-txt=openai/gpt-4o-sf_weather] PASSED                                 [ 69%]
tests/integration/responses/test_tool_responses.py::test_response_function_call_ordering_1[client_with_models-txt=openai/gpt-4o-sf_weather] PASSED                                  [ 71%]
tests/integration/responses/test_tool_responses.py::test_response_function_call_ordering_2[client_with_models-txt=openai/gpt-4o] PASSED                                             [ 72%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_multi_turn_tool_execution[openai_client-txt=openai/gpt-4o-experiment_results_lookup] SKIPPED (i...) [ 73%]
tests/integration/responses/test_tool_responses.py::test_response_streaming_multi_turn_tool_execution[openai_client-txt=openai/gpt-4o-experiment_analysis_streaming] SKIPPED (i...) [ 75%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_basic[client_with_models-txt=openai/gpt-4o-earth] PASSED                                           [ 76%]
tests/integration/responses/test_basic_responses.py::test_response_streaming_basic[client_with_models-txt=openai/gpt-4o-earth] PASSED                                               [ 78%]
tests/integration/responses/test_basic_responses.py::test_response_streaming_incremental_content[client_with_models-txt=openai/gpt-4o-earth] PASSED                                 [ 79%]
tests/integration/responses/test_file_search.py::test_response_text_format[client_with_models-txt=openai/gpt-4o-text_format0] PASSED                                                [ 80%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_web_search[client_with_models-txt=openai/gpt-4o-llama_experts] PASSED                               [ 82%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_file_search[client_with_models-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768-llama_experts] PASSED [ 83%]
tests/integration/responses/test_tool_responses.py::test_response_mcp_tool_approval[client_with_models-txt=openai/gpt-4o-True-boiling_point_tool] SKIPPED (in-process MCP serve...) [ 84%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_multi_turn_tool_execution[client_with_models-txt=openai/gpt-4o-user_file_access_check] SKIPPED      [ 86%]
tests/integration/responses/test_tool_responses.py::test_response_streaming_multi_turn_tool_execution[client_with_models-txt=openai/gpt-4o-user_permissions_workflow] SKIPPED (...) [ 87%]
tests/integration/responses/test_basic_responses.py::test_response_non_streaming_basic[client_with_models-txt=openai/gpt-4o-saturn] PASSED                                          [ 89%]
tests/integration/responses/test_basic_responses.py::test_response_streaming_basic[client_with_models-txt=openai/gpt-4o-saturn] PASSED                                              [ 90%]
tests/integration/responses/test_basic_responses.py::test_response_streaming_incremental_content[client_with_models-txt=openai/gpt-4o-saturn] PASSED                                [ 91%]
tests/integration/responses/test_file_search.py::test_response_text_format[client_with_models-txt=openai/gpt-4o-text_format1] PASSED                                                [ 93%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_web_search[client_with_models-txt=openai/gpt-4o-web_search_2025_08_26_type] PASSED                  [ 94%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_file_search[client_with_models-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768-llama_experts_pdf] PASSED [ 95%]
tests/integration/responses/test_tool_responses.py::test_response_mcp_tool_approval[client_with_models-txt=openai/gpt-4o-False-boiling_point_tool] SKIPPED (in-process MCP serv...) [ 97%]
tests/integration/responses/test_tool_responses.py::test_response_non_streaming_multi_turn_tool_execution[client_with_models-txt=openai/gpt-4o-experiment_results_lookup] SKIPPED   [ 98%]
tests/integration/responses/test_tool_responses.py::test_response_streaming_multi_turn_tool_execution[client_with_models-txt=openai/gpt-4o-experiment_analysis_streaming] SKIPPED   [100%]

================================================================================== slowest 10 durations ===================================================================================
6.29s setup    tests/integration/responses/test_file_search.py::test_response_file_search_filter_by_region[openai_client-txt=openai/gpt-4o]
0.18s call     tests/integration/responses/test_tool_responses.py::test_response_non_streaming_file_search[openai_client-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768-llama_experts_pdf]
0.15s call     tests/integration/responses/test_tool_responses.py::test_response_non_streaming_file_search[client_with_models-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768-llama_experts_pdf]
0.12s call     tests/integration/responses/test_basic_responses.py::test_response_non_streaming_basic[openai_client-txt=openai/gpt-4o-earth]
0.09s setup    tests/integration/responses/test_file_search.py::test_response_file_search_filter_by_category[openai_client-txt=openai/gpt-4o]
0.09s setup    tests/integration/responses/test_file_search.py::test_response_file_search_filter_compound_and[client_with_models-txt=openai/gpt-4o]
0.09s setup    tests/integration/responses/test_file_search.py::test_response_file_search_streaming_events[client_with_models-txt=openai/gpt-4o]
0.09s call     tests/integration/responses/test_tool_responses.py::test_response_sequential_file_search[openai_client-txt=openai/gpt-4o:emb=sentence-transformers/nomic-ai/nomic-embed-text-v1.5:dim=768]
0.09s setup    tests/integration/responses/test_file_search.py::test_response_file_search_filter_compound_or[openai_client-txt=openai/gpt-4o]
0.09s setup    tests/integration/responses/test_basic_responses.py::test_response_non_streaming_basic[openai_client-txt=openai/gpt-4o-earth]
======================================================================= 57 passed, 16 skipped, 2 warnings in 9.21s ========================================================================
+ exit_code=0
+ set +x
✅ All tests completed successfully

=== Integration Tests Complete ===
Stopping Llama Stack Server...
Killing Llama Stack Server processes: 4597
Llama Stack Server stopped
```

@ashwinb ashwinb merged commit 43adc23 into llamastack:main Nov 10, 2025
42 of 44 checks passed
ashwinb pushed a commit that referenced this pull request Nov 13, 2025
…3895)

# What does this PR do?

Extract API definitions and provider specifications into a standalone
llama-stack-api package that can be published to PyPI independently of
the main llama-stack server.


see: #2978 and
#2978 (comment)

## Motivation

External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:

- Install only the type definitions they need without server dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.

## Changes

- Created llama-stack-api package with minimal dependencies (pydantic, jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.* (see the sketch below)
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages
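
Roughly, the import update for downstream code looks like this (the exact module layout inside llama_stack_api is an assumption here):

```python
# Before: external providers imported API types from the full server package,
# overriding the installed llama-stack and causing dependency conflicts.
from llama_stack.apis.inference import OpenAIChatCompletion

# After: the same types come from the standalone package
# (exact module path inside llama_stack_api is an assumption):
# from llama_stack_api.apis.inference import OpenAIChatCompletion
```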

## Next Steps

- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests


Precursor PRs to this one:

- #4093 
- #3954 
- #4064 

These PRs moved key pieces _out_ of the Api pkg, limiting the scope of
change here.


relates to #3237 

## Test Plan

Package builds successfully and can be imported independently. All
pre-commit hooks pass with expected exclusions maintained.

---------

Signed-off-by: Charlie Doern <[email protected]>