refactor: remove dead inference API code and clean up imports #4093
Conversation
Force-pushed from eb81355 to e919b53
These are relics of the older inference API... who still needs these? I remember the eval API still using SamplingParams -- maybe that's the only reference? If that's the case, could we migrate that eval API to a more OpenAI-compatible format? It is marked
@ashwinb, you're right, most of these are actually being used in "dead" types. Let me see how much dead code I can remove and if it makes this functionally a no-op. Eval might be a tricky case but let me see what I can do.
Force-pushed from 4b1d0bb to a5d912a
@ashwinb let me know if this makes sense! It kills two birds with one stone: no more
There were also a bunch of duplicate classes in
ashwinb
left a comment
yeah this seems correct to do
src/llama_stack/providers/inline/inference/meta_reference/model_parallel.py (resolved)
src/llama_stack/providers/utils/inference/litellm_openai_mixin.py (resolved)
Delete ~1,300 lines of dead code from the old bespoke inference API that was replaced by the OpenAI-only API. This includes removing unused type conversion functions, dead provider methods, and event_logger.py. Clean up imports across the codebase to remove references to deleted types. This eliminates unnecessary code and dependencies, helping isolate the API package as a self-contained module. This is the last interdependency between the .api package and "exterior" packages, meaning that now every other package in llama stack imports the API, not the other way around. The API is now self-contained and can be moved into its own package.

Signed-off-by: Charlie Doern <[email protected]>
- Add chat_completion() method to LlamaGenerator supporting OpenAI request format
- Implement openai_chat_completion() in MetaReferenceInferenceImpl
- Fix ModelRunner task dispatch to handle chat_completion tasks
- Add convert_openai_message_to_raw_message() utility for message conversion
- Add unit tests for message conversion and model-parallel dispatch
- Remove unused CompletionRequestWithRawContent references

Signed-off-by: Charlie Doern <[email protected]>
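The convert_openai_message_to_raw_message() utility mentioned in the commit above is not shown in this thread. As a rough illustration only, a conversion along these lines might map OpenAI-style chat messages onto an internal raw-message structure; the RawMessage dataclass and its fields below are simplified placeholders, not the actual llama-stack types.

```python
from dataclasses import dataclass, field

# Simplified stand-in for the provider-internal raw message type; the real
# llama-stack type and its fields may differ.
@dataclass
class RawMessage:
    role: str
    content: str
    stop_reason: str | None = None
    tool_calls: list = field(default_factory=list)


def convert_openai_message_to_raw_message(message: dict) -> RawMessage:
    """Map an OpenAI-style chat message dict onto the internal raw format.

    Assumes a message shaped like {"role": "user", "content": "..."} or one
    whose content is a list of parts such as {"type": "text", "text": "..."}.
    """
    content = message.get("content") or ""
    if isinstance(content, list):
        # Flatten structured content parts into a single text string.
        content = "".join(part.get("text", "") for part in content if part.get("type") == "text")
    return RawMessage(role=message["role"], content=content)


# Example: convert_openai_message_to_raw_message({"role": "user", "content": "hi"})
```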
CI is hanging; trying to see if it is a flake. Tests pass locally with record and replay.
# Get logits processor for response format
logits_processor = None
if request.response_format:
request.response_format is a concrete OpenAIResponseFormatParam, not a dict, so this branch never runs and JSON schema requests silently fall back to unconstrained sampling. Could we inspect the actual variant (e.g., OpenAIResponseFormatJSONSchema) and pass its schema through to JsonSchemaResponseFormat like before?
Yes, you're right, sorry about that. Fixing this up now.
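For illustration, the fix being discussed might extract the schema from the concrete response_format variant along these lines. This is a hypothetical sketch that mirrors the OpenAI response_format wire shape ({"type": "json_schema", "json_schema": {"name": ..., "schema": {...}}}), not the actual generators.py change; the extracted schema would then be wrapped in JsonSchemaResponseFormat as the comment above suggests.

```python
def extract_json_schema(response_format) -> dict | None:
    """Return the user-supplied JSON schema from an OpenAI-style response_format.

    Sketch only: assumes the JSON-schema variant exposes type == "json_schema"
    and a json_schema mapping containing a "schema" entry, mirroring the OpenAI
    API shape. Other variants (e.g. plain "text") return None, so the caller
    falls back to unconstrained sampling.
    """
    if response_format is None:
        return None
    if getattr(response_format, "type", None) == "json_schema":
        json_schema = getattr(response_format, "json_schema", None) or {}
        # json_schema may be a TypedDict/dict with the actual schema under "schema".
        if isinstance(json_schema, dict):
            return json_schema.get("schema")
    return None
```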
# Prepare sampling params
sampling_params = SamplingParams()
if request.temperature is not None or request.top_p is not None:
    sampling_params.strategy = TopPSamplingStrategy(
temperature=request.temperature or 1.0 (and the same for top_p) treats 0.0 as “unset”, so callers can’t ask for deterministic decoding—the code always bumps it back to 1. Likewise, when both fields are omitted we never inherit OpenAI’s defaults. Could we distinguish None vs real numbers so 0.0 and other values survive untouched?
Yes, this is a great point. Thanks!
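A minimal sketch of the None-aware handling being suggested, reusing the SamplingParams and TopPSamplingStrategy names from the snippet above and assuming a GreedySamplingStrategy counterpart exists; the 1.0 defaults mirror the OpenAI API, and the greedy branch for an explicit 0.0 temperature is one possible interpretation rather than the actual fix.

```python
# Sketch only: distinguish "not provided" (None) from explicit values such as
# 0.0, so requests for deterministic decoding survive untouched.
sampling_params = SamplingParams()
if request.temperature is not None or request.top_p is not None:
    temperature = request.temperature if request.temperature is not None else 1.0
    top_p = request.top_p if request.top_p is not None else 1.0
    if temperature == 0.0:
        # An explicit temperature of 0.0 means greedy/deterministic decoding.
        sampling_params.strategy = GreedySamplingStrategy()
    else:
        sampling_params.strategy = TopPSamplingStrategy(temperature=temperature, top_p=top_p)
# When both fields are None, the SamplingParams() defaults are kept instead of forcing 1.0.
```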
created=created,
model=params.model,
choices=[
    OpenAIChoice(
openai_chat_completion always runs exactly one generation, flattens it into a single string, and returns one choice with a hard-coded finish_reason. This means streaming responses and tool-call outputs silently stop working after this refactor.
I can fix this! However, I believe the meta_reference provider was not working before this PR: all methods in inference.py raised NotImplementedError. @leseb raised a similar topic above.
ashwinb
left a comment
See inline comments
This commit addresses comments regarding the OpenAI chat completion implementation in the meta_reference provider.
Tool Augmentation
- Add `augment_raw_messages_for_tools()` to properly inject tool definitions into prompts
- Support model-family-specific tool formats:
* Llama 3.1/3.2 multimodal: JsonCustomToolGenerator with JSON format
* Llama 3.2/3.3/4: PythonListCustomToolGenerator with Python list format
- Handle tool_choice hints (auto/required/specific tool)
- Preserve existing system messages while adding tool context
Streaming & Tool Call Detection
- Implement streaming support via `params.stream` with `_stream_chat_completion()`
- Add tool call detection by decoding assistant messages after generation
- Set proper `finish_reason` based on content ("stop" vs "tool_calls")
- Convert internal ToolCall format to OpenAI-compatible types
- Stream chunks incrementally with proper delta formatting
Type Corrections
- Fix response_format handling in generators.py to properly extract schema from
OpenAIJSONSchema TypedDict and use correct ResponseFormatType enum
- Use correct OpenAI types: OpenAIChatCompletionToolCall, OpenAIChunkChoice,
OpenAIChoiceDelta, OpenAIChatCompletionToolCallFunction
Signed-off-by: Charlie Doern <[email protected]>
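As a rough illustration of the finish_reason selection and streaming behaviour described above (a simplified sketch, not the provider's actual code; chunks are shown as plain dicts in the OpenAI wire format rather than the OpenAIChunkChoice/OpenAIChoiceDelta types the commit uses):

```python
def pick_finish_reason(tool_calls: list | None) -> str:
    """Map decoded generation output to an OpenAI-compatible finish_reason."""
    return "tool_calls" if tool_calls else "stop"


def stream_chunks(text_pieces: list[str], model: str, completion_id: str):
    """Yield OpenAI-style streaming chunks for incrementally generated text.

    Sketch only: each piece becomes a content delta, and a final chunk carries
    the finish_reason with an empty delta, matching the OpenAI wire format.
    """
    for piece in text_pieces:
        yield {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"index": 0, "delta": {"content": piece}, "finish_reason": None}],
        }
    yield {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
```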
I believe the most recent commit adds a significant amount of functionality to the meta_reference inference provider. Previously this provider was not implemented from what I can tell: all methods raised NotImplementedError. So, the functionality in the last two commits is net-new, but it satisfies reviewer comments and improves the general usability of the provider as a whole. I would be wary of adding more functionality in this PR, though, since I believe it goes out of scope. Please let me know if that makes sense @ashwinb @leseb! Thanks
@cdoern that is completely fair! Would you mind filing an issue to say "ensure meta-reference works" or something to that effect?
Also, I don't know why the server GPT tests keep hanging, but this is something I have observed on other PRs I have open as well.
I think the conversations tests > sqlite store is the root cause. There are race conditions there that we need to resolve carefully.
@cdoern can you try running the failing test suite locally once?
Yup @ashwinb, here are the results (all green):
…3895)

# What does this PR do?
Extract API definitions and provider specifications into a standalone llama-stack-api package that can be published to PyPI independently of the main llama-stack server.

see: #2978 and #2978 (comment)

Motivation

External providers currently import from llama-stack, which overrides the installed version and causes dependency conflicts. This separation allows external providers to:
- Install only the type definitions they need without server dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were previously blocked by these import conflicts.

Changes
- Created llama-stack-api package with minimal dependencies (pydantic, jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages

Next Steps
- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests

Pre-cursor PRs to this one:
- #4093
- #3954
- #4064

These PRs moved key pieces _out_ of the Api pkg, limiting the scope of change here.

relates to #3237

## Test Plan
Package builds successfully and can be imported independently. All pre-commit hooks pass with expected exclusions maintained.

---------

Signed-off-by: Charlie Doern <[email protected]>
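For example, an external provider that previously pulled type definitions from the server package would switch to the standalone package. The submodule path below is illustrative only; the PR itself only establishes the top-level rename from llama_stack.* to llama_stack_api.*.

```python
# Before (illustrative path): types were imported from the llama-stack server package.
# from llama_stack.apis.inference import OpenAIChatCompletion

# After: the same definitions come from the standalone llama-stack-api package.
# from llama_stack_api.inference import OpenAIChatCompletion
```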
What does this PR do?
Delete ~2,000 lines of dead code from the old bespoke inference API that was replaced by the OpenAI-only API. This includes removing unused type conversion functions, dead provider methods, and event_logger.py.
Clean up imports across the codebase to remove references to deleted types. This eliminates unnecessary code and dependencies, helping isolate the API package as a self-contained module.
This is the last interdependency between the .api package and "exterior" packages, meaning that now every other package in llama stack imports the API, not the other way around.
Test Plan
This is a structural change; no tests needed.