LCORE-651: using rhoai lls image #517
Conversation
Walkthrough

Adds platform linux/amd64 and two empty env vars to the llama-stack service in docker-compose, moves the mounted run.yaml path, replaces the test.containerfile base image and installs faiss-cpu as root before switching to UID 1001, and disables uv installation in the Containerfile final stage.

Changes
Sequence Diagram(s)

    sequenceDiagram
        autonumber
        participant Dev as Developer
        participant DC as docker-compose
        participant BuilderImg as test.containerfile build
        participant FinalImg as final Containerfile
        participant LS as llama-stack container
        Dev->>DC: docker-compose up llama-stack
        DC->>LS: start service (platform: linux/amd64)
        DC->>LS: mount /opt/app-root/run.yaml:Z
        DC->>LS: set env BRAVE_SEARCH_API_KEY, TAVILY_SEARCH_API_KEY
        Note over BuilderImg: Base image -> quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
        BuilderImg->>BuilderImg: pip install faiss-cpu (as root)
        BuilderImg->>LS: image runs as USER 1001
        Note over FinalImg: final-stage uv install commented out (no active pip install in final image)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
depends on opendatahub-io/llama-stack-distribution#22
Actionable comments posted: 3
🧹 Nitpick comments (5)
docker-compose.yaml (5)
3-16: GPU reservation missing for vLLM.

If you intend GPU inference, add an NVIDIA device reservation (or document CPU mode explicitly). The official docs show running with --gpus all; the compose equivalent is shown below. (docs.vllm.ai)
  vllm:
@@
    networks:
      - lightspeednet
+   deploy:
+     resources:
+       reservations:
+         devices:
+           - driver: nvidia
+             count: all
+             capabilities: [gpu]

Please confirm your target environment (Docker Compose v2 on a non-Swarm host) supports device reservations; otherwise we can switch to an alternative (e.g., runtime: nvidia or explicit /dev mappings).
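If device reservations turn out not to be supported there, a minimal sketch of the runtime-based alternative mentioned above (assumes the NVIDIA Container Toolkit is installed and the nvidia runtime is registered with the Docker daemon; service name follows this compose file):

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    # Legacy-style GPU access via the registered nvidia runtime.
    runtime: nvidia
    environment:
      # Expose all host GPUs to the container (nvidia-container-runtime convention).
      - NVIDIA_VISIBLE_DEVICES=all
```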
5-5: Avoid forcing linux/amd64 platform.

Hard-coding amd64 will trigger emulation on ARM hosts (e.g., Apple Silicon), hurting performance. Let Docker choose, or gate via env.
- platform: linux/amd64
+ # platform: linux/amd64 # uncomment only if you must force x86_64

Also applies to: 27-27
47-51: Healthcheck OK; consider curl fallback and slightly longer start period.

Minor robustness tweak mirroring the vLLM change; optional.
  healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-   interval: 10s
-   timeout: 5s
-   retries: 3
-   start_period: 15s
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || wget -qO- http://localhost:8321/v1/health >/dev/null"]
+   interval: 10s
+   timeout: 5s
+   retries: 6
+   start_period: 30s
36-36: Trailing whitespace.

Lint error at line 36; drop trailing spaces.
- +
6-6: container_name is optional and can cause cross-project collisions.

Recommend omitting unless you rely on fixed names.
- container_name: vllm
+ # container_name: vllm

- container_name: llama-stack
+ # container_name: llama-stack

Also applies to: 28-28
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docker-compose.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
docker-compose.yaml
[error] 36-36: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (2)
docker-compose.yaml (2)
32-41: Env wiring for Llama Stack ↔ vLLM looks correct.

VLLM_URL and INFERENCE_MODEL match the remote vLLM distribution’s documented envs. Keep defaults consistent with the vLLM service. Health endpoint is under /v1/health. (llama-stack.readthedocs.io, infohub.delltechnologies.com)
If you want, I can add LLAMA_STACK_PORT with a default to align with docs:
    environment:
      # Standalone mode configuration
      - STANDALONE=true
      - VLLM_URL=http://vllm:8000/v1
      - INFERENCE_MODEL=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
+     - LLAMA_STACK_PORT=${LLAMA_STACK_PORT:-8321}
13-16: Good call on caching HF models with a named volume.

This will avoid re-downloading weights on container restarts.
Also applies to: 76-78
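For readers skimming the thread, a minimal sketch of the pattern being praised; the cache path is an assumption based on vLLM's conventional /root/.cache/huggingface location, so adjust it if the image sets HF_HOME elsewhere:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    volumes:
      # Named volume keeps downloaded model weights across container restarts.
      - hf-cache:/root/.cache/huggingface

volumes:
  hf-cache:
```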
docker-compose.yaml (outdated)

    services:
      # vLLM service for Red Hat distribution
      vllm:
        image: vllm/vllm-openai:latest
🛠️ Refactor suggestion
Pin images to versions/digests instead of floating tags.
Using latest/odh introduces non-reproducible builds and potential breakages. Pin to a known tag or, preferably, a digest.
- image: vllm/vllm-openai:latest
+ image: vllm/vllm-openai:v0.9.2 # or pin by digest

- image: quay.io/opendatahub/llama-stack:odh
+ image: quay.io/opendatahub/llama-stack:odh-<version> # or pin by digest

Also applies to: 26-26
🤖 Prompt for AI Agents
In docker-compose.yaml around lines 4 and 26, the image: vllm/vllm-openai:latest
uses a floating tag; replace it with a pinned version or, preferably, an
immutable digest (e.g., image: vllm/vllm-openai:<semver> or image:
vllm/vllm-openai@sha256:<digest>) to ensure reproducible builds and update both
occurrences at lines 4 and 26 accordingly.
docker-compose.yaml (outdated)

    environment:
      - MODEL_NAME=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
      - HOST=0.0.0.0
      - PORT=8000
vLLM does not read MODEL_NAME/HOST/PORT envs; pass CLI args instead.
The vllm/vllm-openai image expects --model/--host/--port as arguments (envs like HOST/PORT are not used by the API server). Replace these envs with command; optionally add shm_size/ipc for PyTorch shared memory. (docs.vllm.ai)
environment:
- - MODEL_NAME=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
- - HOST=0.0.0.0
- - PORT=8000
+ - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN:-}
+ command:
+ - --host
+ - 0.0.0.0
+ - --port
+ - "8000"
+ - --model
+ - ${INFERENCE_MODEL:-microsoft/DialoGPT-small}
+ shm_size: "2g"
+ # or: ipc: host

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN:-}
    command:
      - --host
      - 0.0.0.0
      - --port
      - "8000"
      - --model
      - ${INFERENCE_MODEL:-microsoft/DialoGPT-small}
    shm_size: "2g"
    # or: ipc: host
🤖 Prompt for AI Agents
In docker-compose.yaml around lines 9 to 12, the compose currently sets
MODEL_NAME/HOST/PORT as environment variables but vLLM ignores those and
requires CLI args; replace the environment entries with a command array that
passes --model ${INFERENCE_MODEL:-microsoft/DialoGPT-small} --host 0.0.0.0
--port 8000 to the container, remove the HOST/PORT envs, and optionally add
docker-compose-level settings like shm_size: "1gb" and ipc: host to support
PyTorch shared memory.
docker-compose.yaml (outdated)

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
🛠️ Refactor suggestion
Hardening healthcheck and allowing longer model load time.
- curl may be absent; use CMD-SHELL with -fsS.
- Big models often need more than 30s; bump start_period.
vLLM’s health endpoint is /health. (docs.vllm.ai)
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
- interval: 10s
- timeout: 5s
- retries: 3
- start_period: 30s
+ healthcheck:
+ test: ["CMD-SHELL", "curl -fsS http://localhost:8000/health || wget -qO- http://localhost:8000/health >/dev/null"]
+ interval: 10s
+ timeout: 5s
+ retries: 6
+ start_period: 120s

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://localhost:8000/health || wget -qO- http://localhost:8000/health >/dev/null"]
      interval: 10s
      timeout: 5s
      retries: 6
      start_period: 120s
🤖 Prompt for AI Agents
In docker-compose.yaml around lines 17 to 22, the healthcheck uses the exec form
with plain curl and a 30s start_period which can fail if curl is missing or
model loading takes longer; change to the shell form using CMD-SHELL and curl
-fsS (so failure is detected and curl absence is handled by the shell) and
increase start_period (for example to 120s or an appropriate longer value) to
allow big models more time to load; keep the endpoint /health and optionally
adjust interval/timeout/retries if needed for your environment.
Force-pushed from bcf3056 to 658ab1e (compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
docker-compose.yaml (1)
4-5: Pin image to immutable digest and verify platform constraint necessity.

Apply this diff:
- image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
+ image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest@sha256:f8acb6064989d298d5ac83e639e69532378e4a3320191b85ca42f8f421d758cd

Verify whether the platform: linux/amd64 line is required; if the image is multi-arch, remove it to allow Docker to auto-select the host architecture.
🧹 Nitpick comments (3)
docker-compose.yaml (3)
17-22: Consider hardening healthcheck for robustness.

The 30s start_period is a reasonable improvement. For additional robustness, consider using CMD-SHELL to handle cases where curl might be unavailable:

  healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || exit 1"]
    interval: 10s
    timeout: 5s
    retries: 3
    start_period: 30s

The -s flag silences progress, -S shows errors, and -f fails on HTTP errors. This provides clearer failure signals in logs.
49-49: Optional: Remove redundant network driver specification.

The driver: bridge setting is the default for Docker Compose networks, so this line is redundant unless there's a specific reason to make it explicit for documentation purposes.

  networks:
    lightspeednet:
-     driver: bridge

If the explicit specification is intentional for clarity, consider adding a comment explaining why.
13-14: Document optional nature and usage of new API keys.

Add inline comments in docker-compose.yaml to clarify that these keys are only required when their provider is enabled:

    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
+     # Optional: Brave Search API key (required if using brave-search provider)
      - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
+     # Optional: Tavily Search API key (required if using tavily-search provider)
      - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}

And update your documentation to note that BRAVE_SEARCH_API_KEY and TAVILY_SEARCH_API_KEY are plain-string API tokens used only when their respective providers are selected; otherwise they may be omitted.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- README.md (1 hunks)
- docker-compose.yaml (2 hunks)
- docs/deployment_guide.md (1 hunks)
- docs/getting_started.md (1 hunks)
- examples/pyproject.llamastack.toml (1 hunks)
- pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: ALWAYS check pyproject.toml for existing dependencies before adding new ones
ALWAYS verify current library versions in pyproject.toml rather than assuming versions
Prefer reading supported Python versions and tool configs from pyproject.toml
Files:
pyproject.toml
🔇 Additional comments (6)
docs/deployment_guide.md (1)
679-679: LGTM - Documentation version updated consistently.

The llama-stack version bump to 0.2.23 aligns with the changes in pyproject.toml and other documentation files.
docs/getting_started.md (1)
27-27: LGTM - Installation documentation updated consistently.

The version bump to llama-stack 0.2.23 in the installation instructions matches the dependency changes in pyproject.toml.
examples/pyproject.llamastack.toml (1)
7-7: LGTM - Example configuration updated consistently.

The llama-stack version in the example configuration correctly matches the version bump to 0.2.23 across the project.
README.md (1)
248-248: LGTM - README example updated consistently.

The llama-stack version in the README example configuration correctly reflects the version bump to 0.2.23.
pyproject.toml (1)
31-32: Versions verified; bump approved.

Both llama-stack==0.2.23 and llama-stack-client==0.2.23 exist on PyPI and are not affected by any known security advisories (the sole advisory targeted <0.2.20, patched in 0.2.20).

docker-compose.yaml (1)
10-10: Confirm image config mount path and run.yaml compatibility
Volume mount updated to /opt/app-root/run.yaml. Please verify that the llama-stack image (quay.io/opendatahub/llama-stack:rhoai-v2.25-latest) actually loads its configuration from /opt/app-root/run.yaml (e.g. via ENV or ENTRYPOINT), and confirm that the run.yaml in the repo root adheres to the schema expected by llama-stack v0.2.23 to prevent runtime failures.
Force-pushed from 658ab1e to b5cad94 (compare)
Actionable comments posted: 0
🧹 Nitpick comments (5)
run.yaml (2)
15-15: New files API is fine; ensure client/docs reflect it.

Addition looks correct. Please add a short note in docs and any client code paths that this API is now available, and include minimal usage instructions.
109-116: LocalFS provider: pathing, concurrency, and portability nits.
- Use an env-driven path so containers and RHOAI mounts align; e.g., storage_dir: ${env.LLAMA_FILES_DIR:-.llama/distributions/ollama/files}.
- SQLite under heavy concurrent writes can lock on NFS/overlay volumes; if you expect write load, consider WAL/journal config or a server DB later.
- Verify docker-compose/k8s volumes create these dirs with correct UID/GID.
Proposed tweak:
- storage_dir: .llama/distributions/ollama/files
+ storage_dir: ${env.LLAMA_FILES_DIR:-.llama/distributions/ollama/files}

- db_path: .llama/distributions/ollama/files_metadata.db
+ db_path: ${env.LLAMA_FILES_META_DB:-.llama/distributions/ollama/files_metadata.db}

docs/getting_started.md (2)
27-27: Version bump LGTM; add FAISS note and align with new Files API.
- Good to pin llama-stack==0.2.23. Given PR title mentions FAISS blocking, add a short “If FAISS not available” note with a command to skip vector_io deps or feature-gate that path.
- Consider adding the new files API/provider example to match run.yaml so readers don’t get mismatched configs.
64-64: Stale example output.

The sample shows lightspeed-stack==0.2.0. Replace with a placeholder or the current version to avoid confusion.
Suggested change:
- ~ lightspeed-stack==0.2.0 (from file:///tmp/ramdisk/lightspeed-stack)
+ + lightspeed-stack==<current-version> (from file:///…/lightspeed-stack)

examples/pyproject.llamastack.toml (1)
7-7: Dependency bump LGTM; align Python version with docs.

Keep 0.2.23. Docs say Python 3.12 or 3.13, but this example pins requires-python = "==3.12.*". Either relax to ">=3.12,<3.14" or update docs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- README.md (1 hunks)
- docker-compose.yaml (2 hunks)
- docs/deployment_guide.md (1 hunks)
- docs/getting_started.md (1 hunks)
- examples/pyproject.llamastack.toml (1 hunks)
- pyproject.toml (1 hunks)
- run.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- pyproject.toml
- docs/deployment_guide.md
- README.md
- docker-compose.yaml
Force-pushed from b5cad94 to 48b902f (compare)
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- README.md (1 hunks)
- docker-compose.yaml (2 hunks)
- docs/deployment_guide.md (1 hunks)
- docs/getting_started.md (1 hunks)
- examples/pyproject.llamastack.toml (1 hunks)
- pyproject.toml (1 hunks)
- run.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- docker-compose.yaml
- docs/getting_started.md
- docs/deployment_guide.md
- pyproject.toml
🔇 Additional comments (1)
run.yaml (1)
15-15: Files API wiring looks solid.

Adding the files API and backing it with the inline localfs provider (storage directory plus SQLite metadata store) mirrors the upstream starter distribution template, so this should just work. Nice alignment with the reference config. (git.kvant.cloud)

Also applies to: 109-116
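For context, a rough sketch of the kind of run.yaml wiring being reviewed here; only storage_dir and db_path are taken from the diff quoted earlier, while the surrounding key names (provider_id, provider_type, metadata_store) are assumptions modeled on the starter-template shape rather than a copy of this repo's file:

```yaml
# Hypothetical run.yaml fragment; verify key names against the llama-stack 0.2.x schema.
apis:
  - files
providers:
  files:
    - provider_id: localfs
      provider_type: inline::localfs
      config:
        storage_dir: ${env.LLAMA_FILES_DIR:-.llama/distributions/ollama/files}
        metadata_store:
          type: sqlite
          db_path: ${env.LLAMA_FILES_META_DB:-.llama/distributions/ollama/files_metadata.db}
```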
README.md (outdated)

    authors = []
    dependencies = [
-       "llama-stack==0.2.21",
+       "llama-stack==0.2.23",
Documentation now points to an unavailable release.
The README tells users to install llama-stack==0.2.23, but PyPI’s most recent release is still 0.2.22 (published September 16, 2025). Following the doc will fail. Please keep the doc aligned with the latest released version until 0.2.23 is actually published. (pypi.org)
🤖 Prompt for AI Agents
README.md around line 248: the README pins "llama-stack==0.2.23" which is not
yet available on PyPI; update the documented requirement to the latest released
version "llama-stack==0.2.22" (or change to a permissive range like
"llama-stack>=0.2.22,<0.3.0") so installs succeed until 0.2.23 is published, and
keep the comment or changelog note to revert to 0.2.23 when it is officially
released.
Force-pushed from b79613c to b4db84f (compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
README.md (1)
248-265: Doc points to unreleased llama-stack 0.2.23; pin to 0.2.22 or a range.

PyPI latest is 0.2.22 (uploaded Sep 16, 2025). Update the example so installs don’t fail until 0.2.23 is published.
- "llama-stack==0.2.23", + "llama-stack==0.2.22", +# or, to allow forward updates when available: +# "llama-stack>=0.2.22,<0.3.0",Source: PyPI release listing shows 0.2.22 as the most recent. (pypi.org)
🧹 Nitpick comments (3)
Dockerfile.rh-llama-stack (1)
2-2: Pin the base image (avoid floating tag).

rhoai-v2.25-latest is mutable; pin to a version or digest for reproducible builds and safer rollbacks.
- FROM quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
+ # Example (replace with the actual digest you validated):
+ FROM quay.io/opendatahub/llama-stack@sha256:<digest-for-rhoai-v2.25>

docker-compose.yaml (2)
7-7: Platform forced to amd64; verify need on Apple Silicon.

Hard-coding linux/amd64 forces QEMU emulation on arm64 hosts (slow). If not strictly required (e.g., faiss wheel/BLAS constraints), consider removing or gating via build args.
20-24: Harden healthcheck and extend warmup window.

Use CMD-SHELL with a curl|wget fallback and lengthen start_period for model/provider init. If /v1/health is not stable, consider /openapi.json or /v1/models.
- healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-   interval: 10s
-   timeout: 5s
-   retries: 3
-   start_period: 30s
+ healthcheck:
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || curl -fsS http://localhost:8321/openapi.json >/dev/null || wget -qO- http://localhost:8321/v1/health >/dev/null || wget -qO- http://localhost:8321/openapi.json >/dev/null"]
+   interval: 10s
+   timeout: 5s
+   retries: 6
+   start_period: 120s

General references: Llama Stack exposes OpenAI-compatible APIs under /v1 (e.g., /v1/models), and readiness endpoints vary by stack; allowing a longer warmup avoids false negatives. (llamastack.github.io)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- Dockerfile.rh-llama-stack (1 hunks)
- README.md (1 hunks)
- docker-compose.yaml (2 hunks)
- docs/deployment_guide.md (1 hunks)
- docs/getting_started.md (1 hunks)
- examples/pyproject.llamastack.toml (1 hunks)
- pyproject.toml (1 hunks)
- run.yaml (2 hunks)
- test.containerfile (0 hunks)
💤 Files with no reviewable changes (1)
- test.containerfile
🚧 Files skipped from review as they are similar to previous changes (5)
- examples/pyproject.llamastack.toml
- pyproject.toml
- run.yaml
- docs/getting_started.md
- docs/deployment_guide.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (1)
docker-compose.yaml (1)
12-12: Confirm llama-stack reads /opt/app-root/run.yaml.

Previous examples often referenced /app-root/run.yaml; ensure the base image’s entrypoint uses /opt/app-root/run.yaml or pass an explicit command/arg to point at the mounted path.
If unsure, you can override command to be explicit:
command: ["llama", "stack", "run", "/opt/app-root/run.yaml"]Relevant docs discuss run.yaml as the server config but don’t mandate a fixed path. (llama-stack.readthedocs.io)
    USER root
    RUN pip install faiss-cpu==1.11.0

    # Switch back to the original user
    USER 1001
faiss-cpu version: prefer 1.12.0 (cp312 wheels available) and add no-cache.
1.11.0 will work, but 1.12.0 is the current release with manylinux cp312 wheels; consider upgrading or at least using a compatible range. Also add --no-cache-dir to shrink layers.
-USER root
-RUN pip install faiss-cpu==1.11.0
+USER root
+# Smaller, up-to-date; adjust if you must stay on a fixed version
+RUN pip install --no-cache-dir 'faiss-cpu==1.12.0'PyPI shows llama_stack at 0.2.22 (latest as of Sep 16, 2025) and faiss-cpu 1.12.0 wheels for CPython 3.12 uploaded Aug 13, 2025. (pypi.org)
🤖 Prompt for AI Agents
In Dockerfile.rh-llama-stack around lines 5 to 9, the Dockerfile pins faiss-cpu
to 1.11.0 and does not use pip's --no-cache-dir; update the pip install to use a
modern compatible version (prefer faiss-cpu==1.12.0 or a range like
faiss-cpu>=1.12.0,<1.13.0) and add --no-cache-dir to the pip install command to
avoid caching wheel files in the image, then keep the USER 1001 switch as-is.
Force-pushed from 2516762 to d237ce4 (compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
docker-compose.yaml (1)
20-24: Harden healthcheck and allow longer startup for model load.

Use CMD-SHELL with a curl/wget fallback and increase start_period/retries to avoid flapping on cold starts.
- healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-   interval: 10s     # how often to run the check
-   timeout: 5s       # how long to wait before considering it failed
-   retries: 3        # how many times to retry before marking as unhealthy
-   start_period: 15s # time to wait before starting checks
+ healthcheck:
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || wget -qO- http://localhost:8321/v1/health >/dev/null"]
+   interval: 10s
+   timeout: 5s
+   retries: 6
+   start_period: 120s
🧹 Nitpick comments (4)
docker-compose.yaml (4)
7-7: Make platform overrideable to support ARM hosts and CI.

Hardcoding linux/amd64 forces emulation on Apple Silicon and some CI runners. Allow override via env.
- platform: linux/amd64
+ platform: ${PLATFORM:-linux/amd64}
12-12: Confirm run.yaml path; mount read-only if not mutated.

Verify llama-stack reads /opt/app-root/run.yaml. If the file isn’t written by the container, mount it ro for safety.
- - ./run.yaml:/opt/app-root/run.yaml:Z
+ - ./run.yaml:/opt/app-root/run.yaml:ro,Z
15-16: Handle API keys via .env/secrets; fail fast if missing.

Empty defaults can hide misconfig. Prefer .env with required expansion or Docker secrets.
Option A (fail fast):
- - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
- - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}
+ - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:?set in .env}
+ - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:?set in .env}

Option B: keep defaults but document .env.sample.
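A minimal sketch of Option B, assuming a .env.sample is committed and copied to .env locally (Compose also reads .env automatically; env_file is shown here only to make the dependency explicit):

```yaml
services:
  llama-stack:
    # Load keys from a local .env copied from a committed .env.sample.
    env_file:
      - .env
    environment:
      # Empty defaults keep the stack bootable when the search providers are unused.
      - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
      - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}
```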
43-47: Align healthcheck style for consistency and robustness.

Optional: switch to CMD-SHELL with -fsS and a slightly higher start_period to reduce false negatives at boot.
- healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8080/liveness"]
-   interval: 10s    # how often to run the check
-   timeout: 5s      # how long to wait before considering it failed
-   retries: 3       # how many times to retry before marking as unhealthy
-   start_period: 5s # time to wait before starting checks
+ healthcheck:
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8080/liveness || wget -qO- http://localhost:8080/liveness >/dev/null"]
+   interval: 10s
+   timeout: 5s
+   retries: 3
+   start_period: 15s
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docker-compose.yaml (2 hunks)
- test.containerfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test.containerfile
- Add test.containerfile to extend Red Hat image with FAISS
- Update docker-compose.yaml to use Red Hat llama-stack image
- Configure platform and volume paths for Red Hat image compatibility
- Fix Containerfile build issue by temporarily disabling uv installation in final stage
Force-pushed from d237ce4 to a523095 (compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
test.containerfile (1)
5-9: Address the previous review comment: upgrade faiss-cpu and add --no-cache-dir.

The previous review comment regarding faiss-cpu version and --no-cache-dir flag remains unaddressed. Consider upgrading to faiss-cpu 1.12.0 (which has cp312 wheels available) and adding --no-cache-dir to reduce image size.
Apply this diff:
  USER root
- RUN pip install faiss-cpu==1.11.0 && \
+ RUN pip install --no-cache-dir faiss-cpu==1.12.0 && \
      mkdir -p /opt/app-root/.llama/distributions/ollama && \
      chown -R 1001:0 /opt/app-root/.llama && \
      chmod -R g+rwx /opt/app-root/.llama
🧹 Nitpick comments (1)
test.containerfile (1)
1-2: Pin base image to specific digest
Replace the :latest tag in the FROM quay.io/opendatahub/llama-stack:rhoai-v2.25-latest line with @sha256:f8acb6064989d298d5ac83e639e69532378e4a3320191b85ca42f8f421d758cd to ensure reproducible builds.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- Containerfile (1 hunks)
- docker-compose.yaml (2 hunks)
- test.containerfile (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- Containerfile
🚧 Files skipped from review as they are similar to previous changes (1)
- docker-compose.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (2)
test.containerfile (2)
11-12: LGTM! The working directory is correctly set to /opt/app-root, which is consistent with OpenShift conventions.
14-15: LGTM! Correctly switches back to the non-root user (UID 1001) after completing privileged operations, following container security best practices.
radofuchs left a comment
LGTM
tisnik left a comment
LGTM
Summary by CodeRabbit