LCORE-651: using rhoai lls image #517
Conversation
Walkthrough

Adds platform linux/amd64 and two empty env vars to the llama-stack service in docker-compose, moves the mounted run.yaml path, replaces the test.containerfile base image and installs faiss-cpu as root before switching to UID 1001, and disables uv installation in the Containerfile final stage.

Changes
Sequence Diagram(s)

    sequenceDiagram
        autonumber
        participant Dev as Developer
        participant DC as docker-compose
        participant BuilderImg as test.containerfile build
        participant FinalImg as final Containerfile
        participant LS as llama-stack container
        Dev->>DC: docker-compose up llama-stack
        DC->>LS: start service (platform: linux/amd64)
        DC->>LS: mount /opt/app-root/run.yaml:Z
        DC->>LS: set env BRAVE_SEARCH_API_KEY, TAVILY_SEARCH_API_KEY
        Note over BuilderImg: Base image -> quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
        BuilderImg->>BuilderImg: pip install faiss-cpu (as root)
        BuilderImg->>LS: image runs as USER 1001
        Note over FinalImg: final-stage uv install commented out (no active pip install in final image)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
depends on opendatahub-io/llama-stack-distribution#22
Actionable comments posted: 3
🧹 Nitpick comments (5)
docker-compose.yaml (5)
3-16: GPU reservation missing for vLLM.

If you intend GPU inference, add an NVIDIA device reservation (or document CPU mode explicitly). The official docs show running with --gpus all; the compose equivalent is shown below. (docs.vllm.ai)
  vllm:
@@
    networks:
      - lightspeednet
+   deploy:
+     resources:
+       reservations:
+         devices:
+           - driver: nvidia
+             count: all
+             capabilities: [gpu]

Please confirm your target environment (Docker Compose v2 on a non-Swarm host) supports device reservations; otherwise we can switch to an alternative (e.g., runtime: nvidia or explicit /dev mappings).
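If device reservations turn out not to be supported there, a minimal sketch of the runtime-based alternative mentioned above (assumes the NVIDIA Container Toolkit is installed and the nvidia runtime is registered with the Docker daemon; service name follows this compose file):

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    # Legacy-style GPU access via the registered nvidia runtime.
    runtime: nvidia
    environment:
      # Expose all host GPUs to the container (nvidia-container-runtime convention).
      - NVIDIA_VISIBLE_DEVICES=all
```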
5-5: Avoid forcing linux/amd64 platform.

Hard-coding amd64 will trigger emulation on ARM hosts (e.g., Apple Silicon), hurting performance. Let Docker choose, or gate via env.
- platform: linux/amd64
+ # platform: linux/amd64 # uncomment only if you must force x86_64

Also applies to: 27-27
47-51: Healthcheck OK; consider curl fallback and slightly longer start period.

Minor robustness tweak mirroring the vLLM change; optional.
  healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-   interval: 10s
-   timeout: 5s
-   retries: 3
-   start_period: 15s
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || wget -qO- http://localhost:8321/v1/health >/dev/null"]
+   interval: 10s
+   timeout: 5s
+   retries: 6
+   start_period: 30s
36-36: Trailing whitespace.

Lint error at line 36; drop trailing spaces.
- +
6-6: container_name is optional and can cause cross-project collisions.

Recommend omitting unless you rely on fixed names.
- container_name: vllm
+ # container_name: vllm

- container_name: llama-stack
+ # container_name: llama-stack

Also applies to: 28-28
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docker-compose.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
docker-compose.yaml
[error] 36-36: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (2)
docker-compose.yaml (2)
32-41: Env wiring for Llama Stack ↔ vLLM looks correct.

VLLM_URL and INFERENCE_MODEL match the remote vLLM distribution’s documented envs. Keep defaults consistent with the vLLM service. Health endpoint is under /v1/health. (llama-stack.readthedocs.io, infohub.delltechnologies.com)
If you want, I can add LLAMA_STACK_PORT with a default to align with docs:
    environment:
      # Standalone mode configuration
      - STANDALONE=true
      - VLLM_URL=http://vllm:8000/v1
      - INFERENCE_MODEL=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
+     - LLAMA_STACK_PORT=${LLAMA_STACK_PORT:-8321}
13-16: Good call on caching HF models with a named volume.

This will avoid re-downloading weights on container restarts.
Also applies to: 76-78
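For readers skimming the thread, a minimal sketch of the pattern being praised; the cache path is an assumption based on vLLM's conventional /root/.cache/huggingface location, so adjust it if the image sets HF_HOME elsewhere:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    volumes:
      # Named volume keeps downloaded model weights across container restarts.
      - hf-cache:/root/.cache/huggingface

volumes:
  hf-cache:
```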
docker-compose.yaml (outdated)

    services:
      # vLLM service for Red Hat distribution
      vllm:
        image: vllm/vllm-openai:latest
🛠️ Refactor suggestion
Pin images to versions/digests instead of floating tags.
Using latest/odh introduces non-reproducible builds and potential breakages. Pin to a known tag or, preferably, a digest.
- image: vllm/vllm-openai:latest
+ image: vllm/vllm-openai:v0.9.2 # or pin by digest

- image: quay.io/opendatahub/llama-stack:odh
+ image: quay.io/opendatahub/llama-stack:odh-<version> # or pin by digest

Also applies to: 26-26
🤖 Prompt for AI Agents
In docker-compose.yaml around lines 4 and 26, the image: vllm/vllm-openai:latest
uses a floating tag; replace it with a pinned version or, preferably, an
immutable digest (e.g., image: vllm/vllm-openai:<semver> or image:
vllm/vllm-openai@sha256:<digest>) to ensure reproducible builds and update both
occurrences at lines 4 and 26 accordingly.
docker-compose.yaml (outdated)

    environment:
      - MODEL_NAME=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
      - HOST=0.0.0.0
      - PORT=8000
vLLM does not read MODEL_NAME/HOST/PORT envs; pass CLI args instead.
The vllm/vllm-openai image expects --model/--host/--port as arguments (envs like HOST/PORT are not used by the API server). Replace these envs with command; optionally add shm_size/ipc for PyTorch shared memory. (docs.vllm.ai)
environment:
- - MODEL_NAME=${INFERENCE_MODEL:-microsoft/DialoGPT-small}
- - HOST=0.0.0.0
- - PORT=8000
+ - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN:-}
+ command:
+ - --host
+ - 0.0.0.0
+ - --port
+ - "8000"
+ - --model
+ - ${INFERENCE_MODEL:-microsoft/DialoGPT-small}
+ shm_size: "2g"
+ # or: ipc: host

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN:-}
    command:
      - --host
      - 0.0.0.0
      - --port
      - "8000"
      - --model
      - ${INFERENCE_MODEL:-microsoft/DialoGPT-small}
    shm_size: "2g"
    # or: ipc: host
🤖 Prompt for AI Agents
In docker-compose.yaml around lines 9 to 12, the compose currently sets
MODEL_NAME/HOST/PORT as environment variables but vLLM ignores those and
requires CLI args; replace the environment entries with a command array that
passes --model ${INFERENCE_MODEL:-microsoft/DialoGPT-small} --host 0.0.0.0
--port 8000 to the container, remove the HOST/PORT envs, and optionally add
docker-compose-level settings like shm_size: "1gb" and ipc: host to support
PyTorch shared memory.
docker-compose.yaml (outdated)

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s
🛠️ Refactor suggestion
Hardening healthcheck and allowing longer model load time.
- curl may be absent; use CMD-SHELL with -fsS.
- Big models often need more than 30s; bump start_period.
vLLM’s health endpoint is /health. (docs.vllm.ai)
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
- interval: 10s
- timeout: 5s
- retries: 3
- start_period: 30s
+ healthcheck:
+ test: ["CMD-SHELL", "curl -fsS http://localhost:8000/health || wget -qO- http://localhost:8000/health >/dev/null"]
+ interval: 10s
+ timeout: 5s
+ retries: 6
+ start_period: 120s

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://localhost:8000/health || wget -qO- http://localhost:8000/health >/dev/null"]
      interval: 10s
      timeout: 5s
      retries: 6
      start_period: 120s
🤖 Prompt for AI Agents
In docker-compose.yaml around lines 17 to 22, the healthcheck uses the exec form
with plain curl and a 30s start_period which can fail if curl is missing or
model loading takes longer; change to the shell form using CMD-SHELL and curl
-fsS (so failure is detected and curl absence is handled by the shell) and
increase start_period (for example to 120s or an appropriate longer value) to
allow big models more time to load; keep the endpoint /health and optionally
adjust interval/timeout/retries if needed for your environment.
Force-pushed from bcf3056 to 658ab1e (compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
docker-compose.yaml (1)
4-5: Pin image to immutable digest and verify platform constraint necessity.

Apply this diff:
- image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
+ image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest@sha256:f8acb6064989d298d5ac83e639e69532378e4a3320191b85ca42f8f421d758cd

Verify whether the platform: linux/amd64 line is required; if the image is multi-arch, remove it to allow Docker to auto-select the host architecture.
🧹 Nitpick comments (3)
docker-compose.yaml (3)
17-22: Consider hardening healthcheck for robustness.

The 30s start_period is a reasonable improvement. For additional robustness, consider using CMD-SHELL to handle cases where curl might be unavailable:

  healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || exit 1"]
    interval: 10s
    timeout: 5s
    retries: 3
    start_period: 30s

The -s flag silences progress, -S shows errors, and -f fails on HTTP errors. This provides clearer failure signals in logs.
49-49: Optional: Remove redundant network driver specification.

The driver: bridge setting is the default for Docker Compose networks, so this line is redundant unless there's a specific reason to make it explicit for documentation purposes.

  networks:
    lightspeednet:
-     driver: bridge

If the explicit specification is intentional for clarity, consider adding a comment explaining why.
13-14: Document optional nature and usage of new API keys.

Add inline comments in docker-compose.yaml to clarify that these keys are only required when their provider is enabled:

    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
+     # Optional: Brave Search API key (required if using brave-search provider)
      - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
+     # Optional: Tavily Search API key (required if using tavily-search provider)
      - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}

And update your documentation to note that BRAVE_SEARCH_API_KEY and TAVILY_SEARCH_API_KEY are plain-string API tokens used only when their respective providers are selected; otherwise they may be omitted.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- README.md (1 hunks)
- docker-compose.yaml (2 hunks)
- docs/deployment_guide.md (1 hunks)
- docs/getting_started.md (1 hunks)
- examples/pyproject.llamastack.toml (1 hunks)
- pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: ALWAYS check pyproject.toml for existing dependencies before adding new ones
ALWAYS verify current library versions in pyproject.toml rather than assuming versions
Prefer reading supported Python versions and tool configs from pyproject.toml
Files:
pyproject.toml
🔇 Additional comments (6)
docs/deployment_guide.md (1)
679-679: LGTM - Documentation version updated consistently.

The llama-stack version bump to 0.2.23 aligns with the changes in pyproject.toml and other documentation files.
docs/getting_started.md (1)
27-27: LGTM - Installation documentation updated consistently.

The version bump to llama-stack 0.2.23 in the installation instructions matches the dependency changes in pyproject.toml.
examples/pyproject.llamastack.toml (1)
7-7: LGTM - Example configuration updated consistently.

The llama-stack version in the example configuration correctly matches the version bump to 0.2.23 across the project.
README.md (1)
248-248: LGTM - README example updated consistently.

The llama-stack version in the README example configuration correctly reflects the version bump to 0.2.23.
pyproject.toml (1)
31-32: Versions verified; bump approved.

Both llama-stack==0.2.23 and llama-stack-client==0.2.23 exist on PyPI and are not affected by any known security advisories (the sole advisory targeted <0.2.20, patched in 0.2.20).

docker-compose.yaml (1)
10-10: Confirm image config mount path and run.yaml compatibility
Volume mount updated to /opt/app-root/run.yaml. Please verify that the llama-stack image (quay.io/opendatahub/llama-stack:rhoai-v2.25-latest) actually loads its configuration from /opt/app-root/run.yaml (e.g. via ENV or ENTRYPOINT), and confirm that the run.yaml in the repo root adheres to the schema expected by llama-stack v0.2.23 to prevent runtime failures.
Force-pushed from 658ab1e to b5cad94 (compare)
Actionable comments posted: 0
🧹 Nitpick comments (5)
run.yaml (2)
15-15: New files API is fine; ensure client/docs reflect it.

Addition looks correct. Please add a short note in docs and any client code paths that this API is now available, and include minimal usage instructions.
109-116: LocalFS provider: pathing, concurrency, and portability nits.
- Use an env-driven path so containers and RHOAI mounts align; e.g., storage_dir: ${env.LLAMA_FILES_DIR:-.llama/distributions/ollama/files}.
- SQLite under heavy concurrent writes can lock on NFS/overlay volumes; if you expect write load, consider WAL/journal config or a server DB later.
- Verify docker-compose/k8s volumes create these dirs with correct UID/GID.
Proposed tweak:
- storage_dir: .llama/distributions/ollama/files
+ storage_dir: ${env.LLAMA_FILES_DIR:-.llama/distributions/ollama/files}

- db_path: .llama/distributions/ollama/files_metadata.db
+ db_path: ${env.LLAMA_FILES_META_DB:-.llama/distributions/ollama/files_metadata.db}

docs/getting_started.md (2)
27-27: Version bump LGTM; add FAISS note and align with new Files API.
- Good to pin llama-stack==0.2.23. Given PR title mentions FAISS blocking, add a short “If FAISS not available” note with a command to skip vector_io deps or feature-gate that path.
- Consider adding the new files API/provider example to match run.yaml so readers don’t get mismatched configs.
64-64: Stale example output.

The sample shows lightspeed-stack==0.2.0. Replace with a placeholder or the current version to avoid confusion.
Suggested change:
- ~ lightspeed-stack==0.2.0 (from file:///tmp/ramdisk/lightspeed-stack)
+ + lightspeed-stack==<current-version> (from file:///…/lightspeed-stack)

examples/pyproject.llamastack.toml (1)
7-7: Dependency bump LGTM; align Python version with docs.

Keep 0.2.23. Docs say Python 3.12 or 3.13, but this example pins requires-python = "==3.12.*". Either relax to ">=3.12,<3.14" or update docs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- README.md (1 hunks)
- docker-compose.yaml (2 hunks)
- docs/deployment_guide.md (1 hunks)
- docs/getting_started.md (1 hunks)
- examples/pyproject.llamastack.toml (1 hunks)
- pyproject.toml (1 hunks)
- run.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- pyproject.toml
- docs/deployment_guide.md
- README.md
- docker-compose.yaml
Force-pushed from b5cad94 to 48b902f (compare)
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- README.md (1 hunks)
- docker-compose.yaml (2 hunks)
- docs/deployment_guide.md (1 hunks)
- docs/getting_started.md (1 hunks)
- examples/pyproject.llamastack.toml (1 hunks)
- pyproject.toml (1 hunks)
- run.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- docker-compose.yaml
- docs/getting_started.md
- docs/deployment_guide.md
- pyproject.toml
🔇 Additional comments (1)
run.yaml (1)
15-15: Files API wiring looks solid.

Adding the files API and backing it with the inline localfs provider (storage directory plus SQLite metadata store) mirrors the upstream starter distribution template, so this should just work. Nice alignment with the reference config. (git.kvant.cloud)

Also applies to: 109-116
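For context, a rough sketch of the kind of run.yaml wiring being reviewed here; only storage_dir and db_path are taken from the diff quoted earlier, while the surrounding key names (provider_id, provider_type, metadata_store) are assumptions modeled on the starter-template shape rather than a copy of this repo's file:

```yaml
# Hypothetical run.yaml fragment; verify key names against the llama-stack 0.2.x schema.
apis:
  - files
providers:
  files:
    - provider_id: localfs
      provider_type: inline::localfs
      config:
        storage_dir: ${env.LLAMA_FILES_DIR:-.llama/distributions/ollama/files}
        metadata_store:
          type: sqlite
          db_path: ${env.LLAMA_FILES_META_DB:-.llama/distributions/ollama/files_metadata.db}
```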
README.md (outdated)

    authors = []
    dependencies = [
-       "llama-stack==0.2.21",
+       "llama-stack==0.2.23",
Documentation now points to an unavailable release.
The README tells users to install llama-stack==0.2.23, but PyPI’s most recent release is still 0.2.22 (published September 16, 2025). Following the doc will fail. Please keep the doc aligned with the latest released version until 0.2.23 is actually published. (pypi.org)
🤖 Prompt for AI Agents
README.md around line 248: the README pins "llama-stack==0.2.23" which is not
yet available on PyPI; update the documented requirement to the latest released
version "llama-stack==0.2.22" (or change to a permissive range like
"llama-stack>=0.2.22,<0.3.0") so installs succeed until 0.2.23 is published, and
keep the comment or changelog note to revert to 0.2.23 when it is officially
released.
Force-pushed from b79613c to b4db84f (compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
README.md (1)
248-265: Doc points to unreleased llama-stack 0.2.23; pin to 0.2.22 or a range.

PyPI latest is 0.2.22 (uploaded Sep 16, 2025). Update the example so installs don’t fail until 0.2.23 is published.
- "llama-stack==0.2.23", + "llama-stack==0.2.22", +# or, to allow forward updates when available: +# "llama-stack>=0.2.22,<0.3.0",Source: PyPI release listing shows 0.2.22 as the most recent. (pypi.org)
🧹 Nitpick comments (3)
Dockerfile.rh-llama-stack (1)
2-2: Pin the base image (avoid floating tag).

rhoai-v2.25-latest is mutable; pin to a version or digest for reproducible builds and safer rollbacks.
- FROM quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
+ # Example (replace with the actual digest you validated):
+ FROM quay.io/opendatahub/llama-stack@sha256:<digest-for-rhoai-v2.25>

docker-compose.yaml (2)
7-7: Platform forced to amd64; verify need on Apple Silicon.

Hard-coding linux/amd64 forces QEMU emulation on arm64 hosts (slow). If not strictly required (e.g., faiss wheel/BLAS constraints), consider removing or gating via build args.
20-24: Harden healthcheck and extend warmup window.

Use CMD-SHELL with a curl|wget fallback and lengthen start_period for model/provider init. If /v1/health is not stable, consider /openapi.json or /v1/models.
- healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-   interval: 10s
-   timeout: 5s
-   retries: 3
-   start_period: 30s
+ healthcheck:
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || curl -fsS http://localhost:8321/openapi.json >/dev/null || wget -qO- http://localhost:8321/v1/health >/dev/null || wget -qO- http://localhost:8321/openapi.json >/dev/null"]
+   interval: 10s
+   timeout: 5s
+   retries: 6
+   start_period: 120s

General references: Llama Stack exposes OpenAI-compatible APIs under /v1 (e.g., /v1/models), and readiness endpoints vary by stack; allowing a longer warmup avoids false negatives. (llamastack.github.io)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- Dockerfile.rh-llama-stack (1 hunks)
- README.md (1 hunks)
- docker-compose.yaml (2 hunks)
- docs/deployment_guide.md (1 hunks)
- docs/getting_started.md (1 hunks)
- examples/pyproject.llamastack.toml (1 hunks)
- pyproject.toml (1 hunks)
- run.yaml (2 hunks)
- test.containerfile (0 hunks)
💤 Files with no reviewable changes (1)
- test.containerfile
🚧 Files skipped from review as they are similar to previous changes (5)
- examples/pyproject.llamastack.toml
- pyproject.toml
- run.yaml
- docs/getting_started.md
- docs/deployment_guide.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (1)
docker-compose.yaml (1)
12-12: Confirm llama-stack reads /opt/app-root/run.yaml.

Previous examples often referenced /app-root/run.yaml; ensure the base image’s entrypoint uses /opt/app-root/run.yaml or pass an explicit command/arg to point at the mounted path.
If unsure, you can override command to be explicit:
command: ["llama", "stack", "run", "/opt/app-root/run.yaml"]Relevant docs discuss run.yaml as the server config but don’t mandate a fixed path. (llama-stack.readthedocs.io)
    USER root
    RUN pip install faiss-cpu==1.11.0

    # Switch back to the original user
    USER 1001
faiss-cpu version: prefer 1.12.0 (cp312 wheels available) and add no-cache.
1.11.0 will work, but 1.12.0 is the current release with manylinux cp312 wheels; consider upgrading or at least using a compatible range. Also add --no-cache-dir to shrink layers.
-USER root
-RUN pip install faiss-cpu==1.11.0
+USER root
+# Smaller, up-to-date; adjust if you must stay on a fixed version
+RUN pip install --no-cache-dir 'faiss-cpu==1.12.0'PyPI shows llama_stack at 0.2.22 (latest as of Sep 16, 2025) and faiss-cpu 1.12.0 wheels for CPython 3.12 uploaded Aug 13, 2025. (pypi.org)
🤖 Prompt for AI Agents
In Dockerfile.rh-llama-stack around lines 5 to 9, the Dockerfile pins faiss-cpu
to 1.11.0 and does not use pip's --no-cache-dir; update the pip install to use a
modern compatible version (prefer faiss-cpu==1.12.0 or a range like
faiss-cpu>=1.12.0,<1.13.0) and add --no-cache-dir to the pip install command to
avoid caching wheel files in the image, then keep the USER 1001 switch as-is.
Force-pushed from 2516762 to d237ce4 (compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
docker-compose.yaml (1)
20-24: Harden healthcheck and allow longer startup for model load.

Use CMD-SHELL with a curl/wget fallback and increase start_period/retries to avoid flapping on cold starts.
- healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8321/v1/health"]
-   interval: 10s     # how often to run the check
-   timeout: 5s       # how long to wait before considering it failed
-   retries: 3        # how many times to retry before marking as unhealthy
-   start_period: 15s # time to wait before starting checks
+ healthcheck:
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8321/v1/health || wget -qO- http://localhost:8321/v1/health >/dev/null"]
+   interval: 10s
+   timeout: 5s
+   retries: 6
+   start_period: 120s
🧹 Nitpick comments (4)
docker-compose.yaml (4)
7-7: Make platform overrideable to support ARM hosts and CI.

Hardcoding linux/amd64 forces emulation on Apple Silicon and some CI runners. Allow override via env.
- platform: linux/amd64
+ platform: ${PLATFORM:-linux/amd64}
12-12: Confirm run.yaml path; mount read-only if not mutated.

Verify llama-stack reads /opt/app-root/run.yaml. If the file isn’t written by the container, mount it ro for safety.
- - ./run.yaml:/opt/app-root/run.yaml:Z
+ - ./run.yaml:/opt/app-root/run.yaml:ro,Z
15-16: Handle API keys via .env/secrets; fail fast if missing.

Empty defaults can hide misconfig. Prefer .env with required expansion or Docker secrets.
Option A (fail fast):
- - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
- - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}
+ - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:?set in .env}
+ - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:?set in .env}

Option B: keep defaults but document .env.sample.
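A minimal sketch of Option B, assuming a .env.sample is committed and copied to .env locally (Compose also reads .env automatically; env_file is shown here only to make the dependency explicit):

```yaml
services:
  llama-stack:
    # Load keys from a local .env copied from a committed .env.sample.
    env_file:
      - .env
    environment:
      # Empty defaults keep the stack bootable when the search providers are unused.
      - BRAVE_SEARCH_API_KEY=${BRAVE_SEARCH_API_KEY:-}
      - TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY:-}
```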
43-47: Align healthcheck style for consistency and robustness.

Optional: switch to CMD-SHELL with -fsS and a slightly higher start_period to reduce false negatives at boot.
- healthcheck:
-   test: ["CMD", "curl", "-f", "http://localhost:8080/liveness"]
-   interval: 10s    # how often to run the check
-   timeout: 5s      # how long to wait before considering it failed
-   retries: 3       # how many times to retry before marking as unhealthy
-   start_period: 5s # time to wait before starting checks
+ healthcheck:
+   test: ["CMD-SHELL", "curl -fsS http://localhost:8080/liveness || wget -qO- http://localhost:8080/liveness >/dev/null"]
+   interval: 10s
+   timeout: 5s
+   retries: 3
+   start_period: 15s
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docker-compose.yaml (2 hunks)
- test.containerfile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test.containerfile
- Add test.containerfile to extend Red Hat image with FAISS
- Update docker-compose.yaml to use Red Hat llama-stack image
- Configure platform and volume paths for Red Hat image compatibility
- Fix Containerfile build issue by temporarily disabling uv installation in final stage
Force-pushed from d237ce4 to a523095 (compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
test.containerfile (1)
5-9: Address the previous review comment: upgrade faiss-cpu and add --no-cache-dir.

The previous review comment regarding faiss-cpu version and --no-cache-dir flag remains unaddressed. Consider upgrading to faiss-cpu 1.12.0 (which has cp312 wheels available) and adding --no-cache-dir to reduce image size.
Apply this diff:
  USER root
- RUN pip install faiss-cpu==1.11.0 && \
+ RUN pip install --no-cache-dir faiss-cpu==1.12.0 && \
      mkdir -p /opt/app-root/.llama/distributions/ollama && \
      chown -R 1001:0 /opt/app-root/.llama && \
      chmod -R g+rwx /opt/app-root/.llama
🧹 Nitpick comments (1)
test.containerfile (1)
1-2: Pin base image to specific digest
Replace the :latest tag in the FROM quay.io/opendatahub/llama-stack:rhoai-v2.25-latest line with @sha256:f8acb6064989d298d5ac83e639e69532378e4a3320191b85ca42f8f421d758cd to ensure reproducible builds.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- Containerfile (1 hunks)
- docker-compose.yaml (2 hunks)
- test.containerfile (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- Containerfile
🚧 Files skipped from review as they are similar to previous changes (1)
- docker-compose.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-pr
🔇 Additional comments (2)
test.containerfile (2)
11-12: LGTM! The working directory is correctly set to /opt/app-root, which is consistent with OpenShift conventions.
14-15: LGTM! Correctly switches back to the non-root user (UID 1001) after completing privileged operations, following container security best practices.
radofuchs left a comment
LGTM
tisnik left a comment
LGTM
Summary by CodeRabbit