README.md (15 additions, 0 deletions)

…

* [K8s based authentication](#k8s-based-authentication)
* [JSON Web Keyset based authentication](#json-web-keyset-based-authentication)
* [No-op authentication](#no-op-authentication)
* [RAG Configuration](#rag-configuration)
* [Usage](#usage)
* [Make targets](#make-targets)
* [Running Linux container image](#running-linux-container-image)

…

Credentials are not allowed with wildcard origins per the CORS/Fetch spec.
See https://fastapi.tiangolo.com/tutorial/cors/

# RAG Configuration

The [guide to RAG setup](docs/rag_guide.md) provides guidance on setting up RAG and includes tested examples for both inference and vector store integration.

## Example configurations for inference

The following configurations are llama-stack config examples from production deployments (a generic sketch of the inference section appears after the note below):

- [Granite on vLLM example](examples/vllm-granite-run.yaml)
- [Qwen3 on vLLM example](examples/vllm-qwen3-run.yaml)
- [Gemini example](examples/gemini-run.yaml)
- [VertexAI example](examples/vertexai-run.yaml)

> [!NOTE]
> RAG functionality is **not tested** for these configurations.
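
For orientation, here is a minimal sketch of what the inference portion of such a `run.yaml` can look like for a vLLM-served model. This is an illustration rather than one of the tested examples above: the URL, token, and model ID are placeholders, and the layout follows common `remote::vllm` provider configurations.

```yaml
providers:
  inference:
    - provider_id: vllm
      provider_type: remote::vllm
      config:
        url: http://localhost:8000/v1      # vLLM's OpenAI-compatible endpoint (placeholder)
        api_token: ${env.VLLM_API_TOKEN}   # placeholder; omit if the server needs no auth
models:
  - model_id: my-granite-model             # placeholder model ID
    provider_id: vllm
    model_type: llm
```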

docs/rag_guide.md (122 additions, 1 deletion)

…

Update the `run.yaml` file used by Llama Stack to point to:

* Your downloaded **embedding model**
* Your generated **vector database**

### FAISS example

```yaml
models:
  # … (the rest of this example is elided here; see the full config linked below)
```

Where:

- `db_path` is the path to the vector index (a `.db` file in this case)
- `vector_db_id` is the index ID used to generate the database

See the full working [config example](examples/openai-faiss-run.yaml) for more details; a rough sketch of how these fields fit together follows.
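
The sketch below shows where the two fields typically sit. The field names are assumed from common llama-stack FAISS setups, and the path, index ID, and embedding model are placeholders.

```yaml
providers:
  vector_io:
    - provider_id: faiss
      provider_type: inline::faiss
      config:
        kvstore:
          type: sqlite
          db_path: /path/to/faiss_store.db   # path to the vector index
vector_dbs:
  - vector_db_id: my-docs-index              # index ID used to generate the database
    provider_id: faiss
    embedding_model: all-mpnet-base-v2       # placeholder embedding model
    embedding_dimension: 768
```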
### pgvector example

This example shows how to configure a remote PostgreSQL database with the [pgvector](https://github.com/pgvector/pgvector) extension for storing embeddings.

> You will need to install a PostgreSQL version compatible with pgvector, then log in with `psql` and enable the extension:
> ```sql
> CREATE EXTENSION IF NOT EXISTS vector;
> ```

Update the connection details (`host`, `port`, `db`, `user`, `password`) to match your PostgreSQL setup, as in the sketch below.
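
A minimal sketch of the relevant provider entry; the five connection fields are the ones listed above, while the values and the surrounding layout are placeholders:

```yaml
providers:
  vector_io:
    - provider_id: pgvector
      provider_type: remote::pgvector
      config:
        host: localhost                      # placeholder connection details
        port: 5432
        db: rag
        user: postgres
        password: ${env.PGVECTOR_PASSWORD}   # avoid hard-coding secrets
```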

Each pgvector-backed table follows this schema:

- `id` (`text`): UUID identifier of the chunk
- `document` (`jsonb`): JSON containing the content and metadata associated with the embedding
- `embedding` (`vector(n)`): the embedding vector, where `n` is the embedding dimension and matches the model's output size (e.g. 768 for `all-mpnet-base-v2`)

> [!NOTE]
> The `vector_db_id` (e.g. `rhdocs`) points to the table named `vector_store_rhdocs` in the specified database, which stores the vector embeddings (see the sketch below).
> When experimenting with different `models`, `providers`, and `vector_dbs`, you might need to manually unregister the old ones with the Llama Stack client CLI (e.g. `llama-stack-client vector_dbs list`).
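
For illustration, a registration along these lines would store its embeddings in the `vector_store_rhdocs` table. Only the `rhdocs` ID and the table-name convention come from the note above; the remaining fields are assumed placeholders.

```yaml
vector_dbs:
  - vector_db_id: rhdocs               # backing table: vector_store_rhdocs
    provider_id: pgvector
    embedding_model: all-mpnet-base-v2
    embedding_dimension: 768           # must match the embedding model's output size
```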

…

See the full working [config example](examples/openai-faiss-run.yaml) for more details.
### Azure OpenAI
Not yet supported.
### Ollama
The `remote::ollama` provider can be used for inference. However, it does not support tool calling, including RAG.
While Ollama also exposes an OpenAI-compatible endpoint that supports tool calling, it cannot be used with `llama-stack` due to current limitations in the `remote::openai` provider.

There is an [ongoing discussion](https://github.com/meta-llama/llama-stack/discussions/3034) about enabling tool calling with Ollama.
Currently, tool calling is not supported out of the box. Some experimental patches exist (including internal workarounds), but these are not officially released.

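
For inference-only use, a sketch along these lines should work; the URL and model tag are placeholders, the layout follows common `remote::ollama` provider configurations, and, as noted above, RAG tool calls will not function with it.

```yaml
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434   # placeholder Ollama server URL
models:
  - model_id: llama3.2:3b             # placeholder Ollama model tag
    provider_id: ollama
    model_type: llm
```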
### vLLM Mistral

RAG tool calls were not working properly when experimenting with `mistralai/Mistral-7B-Instruct-v0.3` on vLLM.