
Conversation

@gallettilance (Contributor) commented Sep 9, 2025

  • Upgraded to Llama Stack 0.2.22
  • Using the rhoai LLS image

Summary by CodeRabbit

  • Chores
    • Declared runtime platform (linux/amd64) and adjusted a runtime mount path.
    • Added BRAVE_SEARCH_API_KEY and TAVILY_SEARCH_API_KEY environment variables (empty by default).
    • Switched to a prebuilt base image and removed several local build/setup steps.
    • Installed faiss-cpu and performed package installation as root, then returned to a non-root runtime.
    • Temporarily disabled a final-stage installation step for a utility due to environment issues.

@coderabbitai bot commented Sep 9, 2025

Walkthrough

Adds platform linux/amd64 and two empty env vars to the llama-stack service in docker-compose, moves the mounted run.yaml path, replaces the test.containerfile base image and installs faiss-cpu as root before switching to UID 1001, and disables uv installation in the Containerfile final stage.
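
For orientation, a rough sketch of the affected llama-stack service fragment in docker-compose.yaml, reconstructed from the hunks reviewed below (the image tag, mount path, and variable names come from this PR; surrounding keys are abbreviated):

    llama-stack:
      image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
      platform: linux/amd64
      volumes:
        - ./run.yaml:/opt/app-root/run.yaml:Z
      environment:
        - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
        - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}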

Changes

  • Compose: llama-stack service (docker-compose.yaml)
    Adds platform: linux/amd64; changes the volume mapping for run.yaml to /opt/app-root/run.yaml:Z; adds BRAVE_SEARCH_API_KEY and TAVILY_SEARCH_API_KEY env vars (empty defaults); adds a header comment.
  • Test image: base and deps (test.containerfile)
    Replaces the base image with quay.io/opendatahub/llama-stack:rhoai-v2.25-latest; installs faiss-cpu==1.11.0 via pip as root; creates /opt/app-root/.llama and adjusts its ownership and permissions; sets USER 1001 for runtime; removes the prior multi-stage setup steps.
  • Final image: uv install disabled (Containerfile)
    Comments out the pip3.12 install "uv==0.8.15" step in the final stage with a note about temporary temp-directory issues (the builder-stage installation remains unchanged).
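
Put together, a minimal sketch of the resulting test.containerfile, assuming the layout shown in the review diffs further down (base image tag, faiss pin, and ownership commands are taken from this PR):

    FROM quay.io/opendatahub/llama-stack:rhoai-v2.25-latest

    # Install FAISS and prepare writable state dirs as root, then drop privileges
    USER root
    RUN pip install faiss-cpu==1.11.0 && \
        mkdir -p /opt/app-root/.llama/distributions/ollama && \
        chown -R 1001:0 /opt/app-root/.llama && \
        chmod -R g+rwx /opt/app-root/.llama

    WORKDIR /opt/app-root

    # Run as the non-root OpenShift user
    USER 1001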

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer
  participant DC as docker-compose
  participant BuilderImg as test.containerfile build
  participant FinalImg as final Containerfile
  participant LS as llama-stack container

  Dev->>DC: docker-compose up llama-stack
  DC->>LS: start service (platform: linux/amd64)
  DC->>LS: mount `/opt/app-root/run.yaml:Z`
  DC->>LS: set env BRAVE_SEARCH_API_KEY, TAVILY_SEARCH_API_KEY

  Note over BuilderImg: Base image -> quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
  BuilderImg->>BuilderImg: pip install faiss-cpu (as root)
  BuilderImg->>LS: image runs as USER 1001

  Note over FinalImg: final-stage `uv` install commented out\n(no active pip install in final image)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • tisnik

Poem

I hop through Docker, light and spry,
Moved a yaml and left uv shy.
Faiss snug in my little pack,
Keys blank, permissions set back.
Rabbit nudge — containers fly! 🐇✨

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title succinctly identifies the ticket number and the primary change of switching to the rhoai LLS image, matching the main modifications in the pull request without extraneous detail.
  • Docstring Coverage (✅ Passed): no functions found in the changes; docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 168921a and be6f10d.

📒 Files selected for processing (1)
  • test.containerfile (1 hunks)
🔇 Additional comments (2)
test.containerfile (2)

2-2: Verify the base image tag strategy for reproducibility.

The base image tag rhoai-v2.25-latest combines a version identifier with latest, which may lead to non-deterministic builds if the image is updated. Consider using a digest-based reference (e.g., @sha256:...) or a fully-qualified version tag to ensure build reproducibility.

If this is the standard tagging convention for Red Hat OpenShift AI images and updates are managed externally, this can be acceptable. However, documenting this decision would be helpful.
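
One way to resolve and pin a digest, assuming the tag has been validated locally (a sketch; the digest that gets printed, not a hand-typed one, should end up in the FROM line):

    # Resolve the immutable digest behind the floating tag
    docker pull quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
    docker inspect --format '{{index .RepoDigests 0}}' \
      quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
    # Then reference it in the containerfile, e.g.:
    # FROM quay.io/opendatahub/llama-stack@sha256:<digest-printed-above>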


7-12: LGTM: Directory setup and user switch follow OpenShift best practices.

The directory creation with ownership 1001:0 and permissions 775, along with switching back to the non-root user, correctly follows OpenShift security conventions.
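
A quick, hedged way to confirm this at runtime (the image name below is a placeholder for the locally built test image):

    # Run as UID 1001 in group 0, as OpenShift would, and verify the state dir is writable
    docker run --rm --user 1001:0 localhost/test-llama-stack:latest \
      sh -c 'id && ls -ld /opt/app-root/.llama && touch /opt/app-root/.llama/.write_test && echo writable'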



@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
docker-compose.yaml (5)

3-16: GPU reservation missing for vLLM.

If you intend GPU inference, add an NVIDIA device reservation (or document CPU mode explicitly). The official docs show running with --gpus all; compose equivalent shown below. (docs.vllm.ai)

   vllm:
@@
     networks:
       - lightspeednet
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]

Please confirm your target environment (Docker Compose v2 on a non-Swarm host) supports device reservations; otherwise we can switch to an alternative (e.g., runtime: nvidia or explicit /dev mappings).
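
For hosts where Compose device reservations are unavailable, a sketch of the runtime: nvidia alternative mentioned above (requires the NVIDIA container runtime to be configured on the host; not validated in this PR):

    vllm:
      # Alternative to deploy.resources.reservations.devices
      runtime: nvidia
      environment:
        - NVIDIA_VISIBLE_DEVICES=all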


5-5: Avoid forcing linux/amd64 platform.

Hard-coding amd64 will trigger emulation on ARM hosts (e.g., Apple Silicon), hurting performance. Let Docker choose, or gate via env.

-    platform: linux/amd64
+    # platform: linux/amd64  # uncomment only if you must force x86_64

Also applies to: 27-27


47-51: Healthcheck OK; consider curl fallback and slightly longer start period.

Minor robustness tweak mirroring vLLM change; optional.

     healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-      interval: 10s
-      timeout: 5s
-      retries: 3
-      start_period: 15s
+      test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || wget -qO- http://localhost:8321/v1/health >/dev/null"]
+      interval: 10s
+      timeout: 5s
+      retries: 6
+      start_period: 30s

36-36: Trailing whitespace.

Lint error at Line 36; drop trailing spaces.

-      
+

6-6: container_name is optional and can cause cross-project collisions.

Recommend omitting unless you rely on fixed names.

-    container_name: vllm
+    # container_name: vllm

-    container_name: llama-stack
+    # container_name: llama-stack

Also applies to: 28-28

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 528e0b9 and bcf3056.

📒 Files selected for processing (1)
  • docker-compose.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
docker-compose.yaml

[error] 36-36: trailing spaces

(trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (2)
docker-compose.yaml (2)

32-41: Env wiring for Llama Stack ↔ vLLM looks correct.

VLLM_URL and INFERENCE_MODEL match the remote vLLM distribution’s documented envs. Keep defaults consistent with the vLLM service. Health endpoint is under /v1/health. (llama-stack.readthedocs.io, infohub.delltechnologies.com)

If you want, I can add LLAMA_STACK_PORT with a default to align with docs:

     environment:
       # Standalone mode configuration
       - STANDALONE=true
       - VLLM_URL=http://vllm:8000/v1
       - INFERENCE_MODEL=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
+      - LLAMA_STACK_PORT=${LLAMA_STACK_PORT:-8321}

13-16: Good call on caching HF models with a named volume.

This will avoid re-downloading weights on container restarts.

Also applies to: 76-78
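
For readers without the file open, the pattern being praised looks roughly like this (the volume name is illustrative, and /root/.cache/huggingface is assumed to be the vLLM image's default Hugging Face cache location):

    services:
      vllm:
        volumes:
          - hf_cache:/root/.cache/huggingface
    volumes:
      hf_cache: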

    services:
      # vLLM service for Red Hat distribution
      vllm:
        image: vllm/vllm-openai:latest

🛠️ Refactor suggestion

Pin images to versions/digests instead of floating tags.

Using latest/odh introduces non-reproducible builds and potential breakages. Pin to a known tag or, preferably, a digest.

-    image: vllm/vllm-openai:latest
+    image: vllm/vllm-openai:v0.9.2 # or pin by digest

-    image: quay.io/opendatahub/llama-stack:odh
+    image: quay.io/opendatahub/llama-stack:odh-<version> # or pin by digest

Also applies to: 26-26

🤖 Prompt for AI Agents
In docker-compose.yaml around lines 4 and 26, the image: vllm/vllm-openai:latest
uses a floating tag; replace it with a pinned version or, preferably, an
immutable digest (e.g., image: vllm/vllm-openai:<semver> or image:
vllm/vllm-openai@sha256:<digest>) to ensure reproducible builds and update both
occurrences at lines 4 and 26 accordingly.

Comment on lines 9 to 12
    environment:
      - MODEL_NAME=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
      - HOST=0.0.0.0
      - PORT=8000

⚠️ Potential issue

vLLM does not read MODEL_NAME/HOST/PORT envs; pass CLI args instead.

The vllm/vllm-openai image expects --model/--host/--port as arguments (envs like HOST/PORT are not used by the API server). Replace these envs with command; optionally add shm_size/ipc for PyTorch shared memory. (docs.vllm.ai)

     environment:
-      - MODEL_NAME=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
-      - HOST=0.0.0.0
-      - PORT=8000
+      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN:-}
+    command:
+      - --host
+      - 0.0.0.0
+      - --port
+      - "8000"
+      - --model
+      - ${INFERENCE_MODEL:-microsoft/DialoGPT-small}
+    shm_size: "2g"
+    # or: ipc: host
🤖 Prompt for AI Agents
In docker-compose.yaml around lines 9 to 12, the compose currently sets
MODEL_NAME/HOST/PORT as environment variables but vLLM ignores those and
requires CLI args; replace the environment entries with a command array that
passes --model ${INFERENCE_MODEL:-microsoft/DialoGPT-small} --host 0.0.0.0
--port 8000 to the container, remove the HOST/PORT envs, and optionally add
docker-compose-level settings like shm_size: "1gb" and ipc: host to support
PyTorch shared memory.

Comment on lines 17 to 22
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

🛠️ Refactor suggestion

Hardening healthcheck and allowing longer model load time.

  • curl may be absent; use CMD-SHELL with -fsS.
  • Big models often need more than 30s; bump start_period.
    vLLM’s health endpoint is /health. (docs.vllm.ai)
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
-      interval: 10s
-      timeout: 5s
-      retries: 3
-      start_period: 30s
+    healthcheck:
+      test: ["CMD-SHELL", "curl -fsS http://localhost:8000/health || wget -qO- http://localhost:8000/health >/dev/null"]
+      interval: 10s
+      timeout: 5s
+      retries: 6
+      start_period: 120s
🤖 Prompt for AI Agents
In docker-compose.yaml around lines 17 to 22, the healthcheck uses the exec form
with plain curl and a 30s start_period which can fail if curl is missing or
model loading takes longer; change to the shell form using CMD-SHELL and curl
-fsS (so failure is detected and curl absence is handled by the shell) and
increase start_period (for example to 120s or an appropriate longer value) to
allow big models more time to load; keep the endpoint /health and optionally
adjust interval/timeout/retries if needed for your environment.

@gallettilance changed the title from "(spike) using rhoai lls" to "LCORE-651: using rhoai lls image (blocked by faiss dep)" on Sep 30, 2025
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
docker-compose.yaml (1)

4-5: Pin image to immutable digest and verify platform constraint necessity.

Apply this diff:

-    image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
+    image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest@sha256:f8acb6064989d298d5ac83e639e69532378e4a3320191b85ca42f8f421d758cd

Verify whether the platform: linux/amd64 line is required—if the image is multi-arch, remove it to allow Docker to auto-select the host architecture.

🧹 Nitpick comments (3)
docker-compose.yaml (3)

17-22: Consider hardening healthcheck for robustness.

The 30s start_period is a reasonable improvement. For additional robustness, consider using CMD-SHELL to handle cases where curl might be unavailable:

     healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
+      test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || exit 1"]
       interval: 10s
       timeout: 5s
       retries: 3
       start_period: 30s

The -s flag silences progress, -S shows errors, and -f fails on HTTP errors. This provides clearer failure signals in logs.


49-49: Optional: Remove redundant network driver specification.

The driver: bridge is the default for Docker Compose networks, so this line is redundant unless there's a specific reason to make it explicit for documentation purposes.

 networks:
   lightspeednet:
-    driver: bridge

If the explicit specification is intentional for clarity, consider adding a comment explaining why.


13-14: Document optional nature and usage of new API keys

Add inline comments in docker-compose.yaml to clarify that these keys are only required when their provider is enabled:

     environment:
       - OPENAI_API_KEY=${OPENAI_API_KEY}
+      # Optional: Brave Search API key (required if using brave-search provider)
       - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
+      # Optional: Tavily Search API key (required if using tavily-search provider)
       - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}

And update your documentation to note that BRAVE_SEARCH_API_KEY and TAVILY_SEARCH_API_KEY are plain-string API tokens used only when their respective providers are selected; otherwise they may be omitted.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcf3056 and 658ab1e.

📒 Files selected for processing (6)
  • README.md (1 hunks)
  • docker-compose.yaml (2 hunks)
  • docs/deployment_guide.md (1 hunks)
  • docs/getting_started.md (1 hunks)
  • examples/pyproject.llamastack.toml (1 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: ALWAYS check pyproject.toml for existing dependencies before adding new ones
ALWAYS verify current library versions in pyproject.toml rather than assuming versions
Prefer reading supported Python versions and tool configs from pyproject.toml

Files:

  • pyproject.toml
🔇 Additional comments (6)
docs/deployment_guide.md (1)

679-679: LGTM - Documentation version updated consistently.

The llama-stack version bump to 0.2.23 aligns with the changes in pyproject.toml and other documentation files.

docs/getting_started.md (1)

27-27: LGTM - Installation documentation updated consistently.

The version bump to llama-stack 0.2.23 in the installation instructions matches the dependency changes in pyproject.toml.

examples/pyproject.llamastack.toml (1)

7-7: LGTM - Example configuration updated consistently.

The llama-stack version in the example configuration correctly matches the version bump to 0.2.23 across the project.

README.md (1)

248-248: LGTM - README example updated consistently.

The llama-stack version in the README example configuration correctly reflects the version bump to 0.2.23.

pyproject.toml (1)

31-32: Versions verified; bump approved
Both llama-stack==0.2.23 and llama-stack-client==0.2.23 exist on PyPI and are not affected by any known security advisories (the sole advisory targeted <0.2.20, patched in 0.2.20).

docker-compose.yaml (1)

10-10: Confirm image config mount path and run.yaml compatibility
Volume mount updated to /opt/app-root/run.yaml. Please verify that the llama-stack image (quay.io/opendatahub/llama-stack:rhoai-v2.25-latest) actually loads its configuration from /opt/app-root/run.yaml (e.g. via ENV or ENTRYPOINT), and confirm that the run.yaml in the repo root adheres to the schema expected by llama-stack v0.2.23 to prevent runtime failures.
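
A hedged way to check where the image expects its config without running the full stack (assumes the image tag used in this PR):

    # Show the baked-in entrypoint/cmd and environment of the base image
    docker inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' \
      quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
    docker inspect --format '{{json .Config.Env}}' \
      quay.io/opendatahub/llama-stack:rhoai-v2.25-latest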

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
run.yaml (2)

15-15: New files API is fine; ensure client/Docs reflect it.

The addition looks correct. Please add a short note to the docs (and to any relevant client code paths) stating that this API is now available, and include minimal usage instructions.


109-116: LocalFS provider: pathing, concurrency, and portability nits.

  • Use an env-driven path so containers and RHOAI mounts align; e.g., storage_dir: ${env.LLAMA_FILES_DIR:-.llama/distributions/ollama/files}.
  • SQLite under heavy concurrent writes can lock on NFS/overlay volumes; if you expect write load, consider WAL/journal config or a server DB later.
  • Verify docker-compose/k8s volumes create these dirs with correct UID/GID.

Proposed tweak:

-        storage_dir: .llama/distributions/ollama/files
+        storage_dir: ${env.LLAMA_FILES_DIR:-.llama/distributions/ollama/files}
-          db_path: .llama/distributions/ollama/files_metadata.db
+          db_path: ${env.LLAMA_FILES_META_DB:-.llama/distributions/ollama/files_metadata.db}
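
If the env-driven paths above are adopted, the runtime also has to supply them; a sketch of the matching compose wiring (the variable names are the ones proposed in this nitpick, and the absolute defaults assume /opt/app-root as the working directory; neither exists in the repo today):

    llama-stack:
      environment:
        - LLAMA_FILES_DIR=${LLAMA_FILES_DIR:-/opt/app-root/.llama/distributions/ollama/files}
        - LLAMA_FILES_META_DB=${LLAMA_FILES_META_DB:-/opt/app-root/.llama/distributions/ollama/files_metadata.db}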
docs/getting_started.md (2)

27-27: Version bump LGTM; add FAISS note and align with new Files API.

  • Good to pin llama-stack==0.2.23. Given that the PR title mentions the FAISS blocker, add a short "If FAISS is not available" note with a command to skip the vector_io deps or feature-gate that path.
  • Consider adding the new files API/provider example to match run.yaml so readers don’t get mismatched configs.

64-64: Stale example output.

The sample shows lightspeed-stack==0.2.0. Replace with a placeholder or the current version to avoid confusion.

Suggested change:

-     ~ lightspeed-stack==0.2.0 (from file:///tmp/ramdisk/lightspeed-stack)
+     + lightspeed-stack==<current-version> (from file:///…/lightspeed-stack)
examples/pyproject.llamastack.toml (1)

7-7: Dependency bump LGTM; align Python version with docs.

Keep 0.2.23. Docs say Python 3.12 or 3.13, but this example pins requires-python = "==3.12.*". Either relax to ">=3.12,<3.14" or update docs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 658ab1e and b5cad94.

📒 Files selected for processing (7)
  • README.md (1 hunks)
  • docker-compose.yaml (2 hunks)
  • docs/deployment_guide.md (1 hunks)
  • docs/getting_started.md (1 hunks)
  • examples/pyproject.llamastack.toml (1 hunks)
  • pyproject.toml (1 hunks)
  • run.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • pyproject.toml
  • docs/deployment_guide.md
  • README.md
  • docker-compose.yaml

@coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5cad94 and 48b902f.

📒 Files selected for processing (7)
  • README.md (1 hunks)
  • docker-compose.yaml (2 hunks)
  • docs/deployment_guide.md (1 hunks)
  • docs/getting_started.md (1 hunks)
  • examples/pyproject.llamastack.toml (1 hunks)
  • pyproject.toml (1 hunks)
  • run.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • docker-compose.yaml
  • docs/getting_started.md
  • docs/deployment_guide.md
  • pyproject.toml
🔇 Additional comments (1)
run.yaml (1)

15-15: Files API wiring looks solid.

Adding the files API and backing it with the inline localfs provider (storage directory plus SQLite metadata store) mirrors the upstream starter distribution template, so this should just work. Nice alignment with the reference config. (git.kvant.cloud)

Also applies to: 109-116

README.md (outdated)
    authors = []
    dependencies = [
-       "llama-stack==0.2.21",
+       "llama-stack==0.2.23",

⚠️ Potential issue | 🔴 Critical

Documentation now points to an unavailable release.

The README tells users to install llama-stack==0.2.23, but PyPI’s most recent release is still 0.2.22 (published September 16, 2025). Following the doc will fail. Please keep the doc aligned with the latest released version until 0.2.23 is actually published. (pypi.org)

🤖 Prompt for AI Agents
README.md around line 248: the README pins "llama-stack==0.2.23" which is not
yet available on PyPI; update the documented requirement to the latest released
version "llama-stack==0.2.22" (or change to a permissive range like
"llama-stack>=0.2.22,<0.3.0") so installs succeed until 0.2.23 is published, and
keep the comment or changelog note to revert to 0.2.23 when it is officially
released.

@gallettilance force-pushed the rhoai-lls branch 2 times, most recently from b79613c to b4db84f on October 1, 2025 at 14:41
@gallettilance changed the title from "LCORE-651: using rhoai lls image (blocked by faiss dep)" to "LCORE-651: using rhoai lls image" on Oct 1, 2025
@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
README.md (1)

248-265: Doc points to unreleased llama-stack 0.2.23; pin to 0.2.22 or a range.

PyPI latest is 0.2.22 (uploaded Sep 16, 2025). Update the example so installs don’t fail until 0.2.23 is published.

-    "llama-stack==0.2.23",
+    "llama-stack==0.2.22",
+# or, to allow forward updates when available:
+#   "llama-stack>=0.2.22,<0.3.0",

Source: PyPI release listing shows 0.2.22 as the most recent. (pypi.org)

🧹 Nitpick comments (3)
Dockerfile.rh-llama-stack (1)

2-2: Pin the base image (avoid floating tag).

rhoai-v2.25-latest is mutable; pin to a version or digest for reproducible builds and safer rollbacks.

-FROM quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
+# Example (replace with the actual digest you validated):
+FROM quay.io/opendatahub/llama-stack@sha256:<digest-for-rhoai-v2.25>
docker-compose.yaml (2)

7-7: Platform forced to amd64; verify need on Apple Silicon.

Hard-coding linux/amd64 forces QEMU emulation on arm64 hosts (slow). If not strictly required (e.g., faiss wheel/BLAS constraints), consider removing or gating via build args.


20-24: Harden healthcheck and extend warmup window.

Use CMD-SHELL with curl|wget fallback and lengthen start_period for model/provider init. If /v1/health is not stable, consider /openapi.json or /v1/models.

-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-      interval: 10s
-      timeout: 5s
-      retries: 3
-      start_period: 30s
+    healthcheck:
+      test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || curl -fsS http://localhost:8321/openapi.json >/dev/null || wget -qO- http://localhost:8321/v1/health >/dev/null || wget -qO- http://localhost:8321/openapi.json >/dev/null"]
+      interval: 10s
+      timeout: 5s
+      retries: 6
+      start_period: 120s

General references: Llama Stack exposes OpenAI-compatible APIs under /v1 (e.g., /v1/models), and readiness endpoints vary by stack; allowing a longer warmup avoids false negatives. (llamastack.github.io)
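
Before settling on a probe, the candidate endpoints can be checked by hand against a running container (port per this compose file; which endpoints respond depends on the image):

    curl -fsS http://localhost:8321/v1/health || echo "no /v1/health"
    curl -fsS http://localhost:8321/openapi.json >/dev/null && echo "/openapi.json reachable"
    curl -fsS http://localhost:8321/v1/models | head -c 200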

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b902f and b4db84f.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • Dockerfile.rh-llama-stack (1 hunks)
  • README.md (1 hunks)
  • docker-compose.yaml (2 hunks)
  • docs/deployment_guide.md (1 hunks)
  • docs/getting_started.md (1 hunks)
  • examples/pyproject.llamastack.toml (1 hunks)
  • pyproject.toml (1 hunks)
  • run.yaml (2 hunks)
  • test.containerfile (0 hunks)
💤 Files with no reviewable changes (1)
  • test.containerfile
🚧 Files skipped from review as they are similar to previous changes (5)
  • examples/pyproject.llamastack.toml
  • pyproject.toml
  • run.yaml
  • docs/getting_started.md
  • docs/deployment_guide.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (1)
docker-compose.yaml (1)

12-12: Confirm llama-stack reads /opt/app-root/run.yaml.

Previous examples often referenced /app-root/run.yaml; ensure the base image’s entrypoint uses /opt/app-root/run.yaml or pass an explicit command/arg to point at the mounted path.

If unsure, you can override command to be explicit:

command: ["llama", "stack", "run", "/opt/app-root/run.yaml"]

Relevant docs discuss run.yaml as the server config but don’t mandate a fixed path. (llama-stack.readthedocs.io)

Comment on lines 5 to 9
USER root
RUN pip install faiss-cpu==1.11.0

# Switch back to the original user
USER 1001

⚠️ Potential issue | 🟠 Major

faiss-cpu version: prefer 1.12.0 (cp312 wheels available) and add no-cache.

1.11.0 will work, but 1.12.0 is the current release with manylinux cp312 wheels; consider upgrading or at least using a compatible range. Also add --no-cache-dir to shrink layers.

-USER root
-RUN pip install faiss-cpu==1.11.0
+USER root
+# Smaller, up-to-date; adjust if you must stay on a fixed version
+RUN pip install --no-cache-dir 'faiss-cpu==1.12.0'

PyPI shows llama_stack at 0.2.22 (latest as of Sep 16, 2025) and faiss-cpu 1.12.0 wheels for CPython 3.12 uploaded Aug 13, 2025. (pypi.org)
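
Whichever pin is chosen, it can be sanity-checked in the built image (the image name is a placeholder for the locally built test image, and this assumes faiss exposes __version__ as recent releases do):

    docker run --rm localhost/test-llama-stack:latest \
      python3 -c "import faiss; print('faiss', faiss.__version__)"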

🤖 Prompt for AI Agents
In Dockerfile.rh-llama-stack around lines 5 to 9, the Dockerfile pins faiss-cpu
to 1.11.0 and does not use pip's --no-cache-dir; update the pip install to use a
modern compatible version (prefer faiss-cpu==1.12.0 or a range like
faiss-cpu>=1.12.0,<1.13.0) and add --no-cache-dir to the pip install command to
avoid caching wheel files in the image, then keep the USER 1001 switch as-is.

@gallettilance force-pushed the rhoai-lls branch 7 times, most recently from 2516762 to d237ce4 on October 7, 2025 at 14:08
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
docker-compose.yaml (1)

20-24: Harden healthcheck and allow longer startup for model load.

Use CMD-SHELL with curl/wget fallback and increase start_period/retries to avoid flapping on cold starts.

-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-      interval: 10s   # how often to run the check
-      timeout: 5s    # how long to wait before considering it failed
-      retries: 3      # how many times to retry before marking as unhealthy
-      start_period: 15s # time to wait before starting checks
+    healthcheck:
+      test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || wget -qO- http://localhost:8321/v1/health >/dev/null"]
+      interval: 10s
+      timeout: 5s
+      retries: 6
+      start_period: 120s
🧹 Nitpick comments (4)
docker-compose.yaml (4)

7-7: Make platform overrideable to support ARM hosts and CI.

Hardcoding linux/amd64 forces emulation on Apple Silicon and some CI runners. Allow override via env.

-    platform: linux/amd64
+    platform: ${PLATFORM:-linux/amd64}

12-12: Confirm run.yaml path; mount read-only if not mutated.

Verify llama-stack reads /opt/app-root/run.yaml. If the file isn’t written by the container, mount it ro for safety.

-      - ./run.yaml:/opt/app-root/run.yaml:Z
+      - ./run.yaml:/opt/app-root/run.yaml:ro,Z

15-16: Handle API keys via .env/secrets; fail fast if missing.

Empty defaults can hide misconfig. Prefer .env with required expansion or Docker secrets.

Option A (fail fast):

-      - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
-      - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}
+      - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:?set in .env}
+      - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:?set in .env}

Option B: keep defaults but document .env.sample.
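
If Option B is preferred, a minimal .env.sample committed next to the compose file could look like this (keys left blank on purpose; names match the compose environment entries, and the INFERENCE_MODEL default mirrors the one used elsewhere in this file):

    # .env.sample: copy to .env and fill in as needed
    OPENAI_API_KEY=
    # Optional: only needed when the brave-search provider is enabled
    BRAVE_SEARCH_API_KEY=
    # Optional: only needed when the tavily-search provider is enabled
    TAVILY_SEARCH_API_KEY=
    # Model served by the vllm service
    INFERENCE_MODEL=microsoft/DialoGPT-small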


43-47: Align healthcheck style for consistency and robustness.

Optional: switch to CMD-SHELL with -fsS and a slightly higher start_period to reduce false negatives at boot.

-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8080/liveness"]
-      interval: 10s   # how often to run the check
-      timeout: 5s    # how long to wait before considering it failed
-      retries: 3      # how many times to retry before marking as unhealthy
-      start_period: 5s # time to wait before starting checks
+    healthcheck:
+      test: ["CMD-SHELL", "curl -fsS http://localhost:8080/liveness || wget -qO- http://localhost:8080/liveness >/dev/null"]
+      interval: 10s
+      timeout: 5s
+      retries: 3
+      start_period: 15s
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b60257 and d237ce4.

📒 Files selected for processing (2)
  • docker-compose.yaml (2 hunks)
  • test.containerfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test.containerfile

- Add test.containerfile to extend Red Hat image with FAISS
- Update docker-compose.yaml to use Red Hat llama-stack image
- Configure platform and volume paths for Red Hat image compatibility
- Fix Containerfile build issue by temporarily disabling uv installation in final stage
@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
test.containerfile (1)

5-9: Address the previous review comment: upgrade faiss-cpu and add --no-cache-dir.

The previous review comment regarding faiss-cpu version and --no-cache-dir flag remains unaddressed. Consider upgrading to faiss-cpu 1.12.0 (which has cp312 wheels available) and adding --no-cache-dir to reduce image size.

Apply this diff:

 USER root
-RUN pip install faiss-cpu==1.11.0 && \
+RUN pip install --no-cache-dir faiss-cpu==1.12.0 && \
     mkdir -p /opt/app-root/.llama/distributions/ollama && \
     chown -R 1001:0 /opt/app-root/.llama && \
     chmod -R g+rwx /opt/app-root/.llama
🧹 Nitpick comments (1)
test.containerfile (1)

1-2: Pin base image to specific digest
Pin the floating rhoai-v2.25-latest tag in the FROM quay.io/opendatahub/llama-stack:rhoai-v2.25-latest line by appending @sha256:f8acb6064989d298d5ac83e639e69532378e4a3320191b85ca42f8f421d758cd to ensure reproducible builds.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d237ce4 and 168921a.

📒 Files selected for processing (3)
  • Containerfile (1 hunks)
  • docker-compose.yaml (2 hunks)
  • test.containerfile (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • Containerfile
🚧 Files skipped from review as they are similar to previous changes (1)
  • docker-compose.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (2)
test.containerfile (2)

11-12: LGTM!

The working directory is correctly set to /opt/app-root, which is consistent with OpenShift conventions.


14-15: LGTM!

Correctly switches back to non-root user (UID 1001) after completing privileged operations, following container security best practices.

@radofuchs left a comment

LGTM

@tisnik left a comment

LGTM

@tisnik merged commit fbc571a into lightspeed-core:main on Oct 9, 2025
18 of 19 checks passed
