
Conversation

@are-ces
Contributor

@are-ces are-ces commented Oct 31, 2025

vLLM, llama-stack (2.25 RHOAI image), and lightspeed-stack (dev-latest) are served in OpenShift as independent containers. Tests run in a fourth pod.

vLLM runs in CPU mode, since GPUs are not yet available in our prow environment.

Description

The main script pipeline.sh runs in prow and does the following:

  1. Mirror the vllm-cpu image from the official AWS repo to quay.io; this is needed because the prow proxy blocks the public AWS URL (public.ecr.aws)
  2. Deploy the vLLM pod
  3. Deploy llama-stack and lightspeed-stack in two separate pods
  4. Deploy the test-pod, which runs the e2e tests
  5. If the tests fail, the prow job fails.
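
The step sequence above can be sketched as a thin orchestration wrapper (illustrative only; the helper-script names are assumptions, and the actual pipeline.sh in this PR is the source of truth):

```shell
#!/bin/bash
# Sketch of the pipeline flow described above; not the real pipeline.sh.
set -euo pipefail

# Run one pipeline step and fail loudly, so prow marks the job as failed.
run_step() {
  local name="$1"; shift
  echo "▶ ${name}"
  if ! "$@"; then
    echo "❌ step '${name}' failed" >&2
    return 1
  fi
}

# Chained roughly as in the description (paths are hypothetical):
# run_step "mirror vllm-cpu image" ./scripts/mirror-image.sh
# run_step "deploy vLLM"           ./pipeline-vllm.sh
# run_step "deploy stacks"         ./pipeline-services.sh
# run_step "deploy test pod"       ./pipeline-test-pod.sh
```

With `set -euo pipefail` at the top, any failing step aborts the whole run, which is exactly the "test fails → prow job fails" contract in step 5.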

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue # LCORE-334
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Please refer to the prow logs of the latest run.

Summary by CodeRabbit

  • Documentation

    • Added RHOAI (vLLM) to provider docs and README compatibility table with meta-llama/Llama-3.2-1B-Instruct.
  • Tests

    • Added comprehensive end-to-end test scaffolding and orchestration for RHOAI (vLLM), including bootstrap, deploy, readiness checks, and test-run flows.
  • Chores

    • Added Kubernetes manifests, deployment/operator manifests, and pipeline scripts to automate deployment, exposure, and e2e execution for the new provider.

@coderabbitai
Contributor

coderabbitai bot commented Oct 31, 2025

Walkthrough

Adds RHOAI (vLLM) provider entries and a complete e2e test harness: documentation, Kubernetes manifests (operators, ServingRuntime, InferenceService, pods), LCS/llama-stack configs, and orchestration/bootstrap/deploy/test scripts to install vLLM, expose services, and run end-to-end tests.

Changes

Cohort / File(s) Summary
Documentation
README.md, docs/providers.md
Added RHOAI (vLLM) provider rows and a model entry meta-llama/Llama-3.2-1B-Instruct with provider_type remote::vllm.
Configs
tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml, tests/e2e-prow/rhoai/configs/run.yaml
New Lightspeed/Core Service and comprehensive run.yaml defining providers, models (vllm), stores, auth, and user-data collection settings.
Operator Manifests
tests/e2e-prow/rhoai/manifests/operators/operators.yaml, tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml, tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml
Added Operator Subscriptions, an OperatorGroup, and a DataScienceCluster CR for platform/operator setup.
Service Pod Manifests
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml, tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml
Pod manifests for lightspeed-stack and llama-stack with secret-sourced env vars and ConfigMap mounts.
vLLM KServe Manifests
tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml, tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml
Added KServe ServingRuntime and InferenceService for vLLM runtime (pytorch modelFormat, runtime vllm, ports, secrets, volumes).
Test Pod Manifest
tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml
Pod manifest to run e2e tests; mounts script ConfigMap and reads LCS/LLAMA IPs from secrets.
Pipeline / Orchestration Scripts
tests/e2e-prow/rhoai/pipeline.sh, tests/e2e-prow/rhoai/pipeline-vllm.sh, tests/e2e-prow/rhoai/pipeline-services.sh, tests/e2e-prow/rhoai/pipeline-test-pod.sh
Orchestration scripts to provision namespace/secrets/configs, deploy vLLM/KServe, expose llama-stack, create LLAMA_IP secret, deploy Lightspeed, and run the test pod with readiness checks.
Bootstrap / Deploy Utilities
tests/e2e-prow/rhoai/scripts/bootstrap.sh, tests/e2e-prow/rhoai/scripts/deploy-vllm.sh, tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh
Scripts to install/wait for operators, apply OperatorGroup/DS cluster, wait for KServe CRDs/controller, apply vLLM manifests, and capture pod/KSVC metadata.
Test Runner
tests/e2e-prow/rhoai/run-tests.sh
Script to validate LCS connectivity, clone lightspeed-stack, install test deps, and run the e2e test suite.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Prow
    participant Pipeline as pipeline.sh
    participant Bootstrap as bootstrap.sh
    participant DeployVLLM as deploy-vllm.sh
    participant Services as pipeline-services.sh
    participant K8s
    participant Llama as llama-stack
    participant LCS as lightspeed-stack
    participant TestPod as test-pod

    Prow->>Pipeline: start pipeline
    Pipeline->>K8s: create namespace, secrets, configmaps
    Pipeline->>Bootstrap: apply operators & wait for CSVs
    Bootstrap->>K8s: ensure operator CSVs ready
    Pipeline->>DeployVLLM: wait for KServe CRDs/controller, apply vLLM manifests
    DeployVLLM->>K8s: apply ServingRuntime & InferenceService
    Pipeline->>Services: deploy llama-stack pod
    Services->>K8s: wait Ready → expose service → create LLAMA_IP secret
    Services->>K8s: deploy lightspeed-stack with LLAMA_IP
    Pipeline->>K8s: deploy test-pod (spin-up)
    TestPod->>LCS: hit /v1/models and run make test-e2e
    TestPod-->>Prow: return exit status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Files needing extra attention:
    • KServe-related manifests and API versions (ServingRuntime, InferenceService).
    • Cross-file secret/ConfigMap/env var references (E2E_LLAMA_HOSTNAME, VLLM_API_KEY, HF token).
    • Script wait loops, timeouts, error handling and idempotency (bootstrap.sh, deploy-vllm.sh, pipeline.sh).
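
The wait-loop/timeout pattern flagged here can be centralized in one helper instead of being re-implemented per script (a sketch, not code from this PR; an `oc`-based readiness probe would be passed in as the command):

```shell
#!/bin/bash
# Generic poll-with-timeout helper. The command to retry is passed as
# arguments, e.g.:
#   wait_until 300 5 oc get crd inferenceservices.serving.kserve.io
wait_until() {
  local timeout_s="$1" interval_s="$2"; shift 2
  local elapsed=0
  until "$@"; do
    if (( elapsed >= timeout_s )); then
      echo "❌ timed out after ${timeout_s}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    elapsed=$(( elapsed + interval_s ))
  done
}
```

One shared helper makes the timeout and interval values easy to audit across bootstrap.sh, deploy-vllm.sh, and pipeline.sh.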

Possibly related PRs

Suggested reviewers

  • tisnik
  • radofuchs

Poem

🐇 I hopped through YAML and scripts so bright,
Secrets and services set just right,
vLLM wakes and Llama greets the call,
Lightspeed queues tests — logs sing through the hall,
Rabbit claps, the pipeline passed them all.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding RHOAI LLM provider support with a prow pipeline for end-to-end testing in OpenShift.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95c7a6b and 677ba71.

📒 Files selected for processing (20)
  • README.md (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/configs/run.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-services.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/run-tests.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • docs/providers.md
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml
  • README.md
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh
  • tests/e2e-prow/rhoai/pipeline-services.sh
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt

Applied to files:

  • tests/e2e-prow/rhoai/pipeline.sh
  • tests/e2e-prow/rhoai/run-tests.sh
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e-prow/rhoai/run-tests.sh
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml
🪛 Checkov (3.2.334)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml

[medium] 1-25: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-25: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml

[medium] 1-30: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-30: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (3)
tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1)

1-25: Configuration looks good.

The LCS configuration is well-structured and properly references the llama-stack hostname via environment variable. No concerns.

tests/e2e-prow/rhoai/pipeline-vllm.sh (1)

1-7: Script structure is solid.

The orchestration is clean and properly chains the three helper scripts with correct argument passing. The PIPELINE_DIR resolution is robust.

tests/e2e-prow/rhoai/configs/run.yaml (1)

62-69: Configuration structure looks appropriate for minimal viable llama-stack setup.

The vllm inference provider references environment variables for URL and API key (lines 66-67), which will be injected at runtime by the test bootstrap/deploy scripts. The tls_verify: false setting is acceptable for a test environment.

Verify that the bootstrap and deployment scripts properly set KSVC_URL and VLLM_API_KEY before launching the llama-stack pod, otherwise the provider connection will fail.
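
A cheap guard for this (a sketch, not code from this PR) is to fail fast in the deploy script before applying the llama-stack pod manifest:

```shell
#!/bin/bash
# Fail fast if the variables that run.yaml interpolates are missing.
require_env() {
  local missing=0 var
  for var in "$@"; do
    if [[ -z "${!var:-}" ]]; then
      echo "❌ required env var ${var} is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical usage before launching the llama-stack pod:
# require_env KSVC_URL VLLM_API_KEY || exit 1
```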



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (12)
tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1)

1-30: Add SecurityContext to restrict privileges in test pod.

The container lacks a SecurityContext and will run as root. Checkov flags this as a security concern. Even in test environments, add a minimal SecurityContext to follow Kubernetes security best practices.

Apply this diff to add basic security constraints:

   containers:
   - name: test-container
+    securityContext:
+      allowPrivilegeEscalation: false
+      runAsNonRoot: true
+      runAsUser: 1000
+      readOnlyRootFilesystem: false
     env:
       - name: LCS_IP

Alternatively, if the test script requires root or elevated privileges, document this in a comment explaining the rationale.

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1)

1-34: Add SecurityContext and consider limiting emptyDir size.

Two improvements for production readiness (though acceptable for test environment):

  1. No SecurityContext is defined; container runs as root by default.
  2. The app-root emptyDir has no size limit and could consume unbounded disk space.

Apply this diff:

     - name: llama-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+        runAsNonRoot: true
+        runAsUser: 1000
+        readOnlyRootFilesystem: false
       env:

And add a size limit to the emptyDir:

   volumes:
     - name: app-root
-      emptyDir: {}
+      emptyDir:
+        sizeLimit: 1Gi

If the test workflow requires root or large storage, document the rationale in a comment.

tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1)

1-35: Normalize YAML quoting style for consistency.

Lines 8 and 20 quote the channel value ("stable"), while line 32 does not (stable). Both are functionally equivalent, but normalizing to consistent style improves maintainability.

Apply this diff to unify the style:

 spec:
   channel: "stable"
   name: "servicemeshoperator"

or:

 spec:
-  channel: "stable"
+  channel: stable

Either approach is fine; pick the one matching your project conventions (typically unquoted for simple strings).

tests/e2e-prow/rhoai/pipeline-test-pod.sh (1)

1-5: Pipeline script is straightforward and correct.

The script correctly computes BASE_DIR and applies the test pod manifest. For robustness, consider adding set -e at line 2 to exit on oc apply failure, but the current implementation is acceptable for this simple orchestration step.

Optionally add error handling:

 #!/bin/bash
+set -e
 
 BASE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1)

16-17: Consider using environment variable for api_key placeholder.

The hardcoded api_key value "xyzzy" is a placeholder. For better security posture and test flexibility, use an environment variable pattern like other values:

-  api_key: xyzzy
+  api_key: ${env.LLAMA_API_KEY:-xyzzy}

This allows overriding the placeholder via environment variable in production while maintaining a fallback for testing.

tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1)

1-3: Add strict error handling and argument validation.

The script lacks set -euo pipefail and doesn't validate the BASE_DIR argument, which could lead to applying manifests from incorrect paths.

 #!/bin/bash
+
+set -euo pipefail
+
+if [[ -z "${1:-}" ]]; then
+    echo "❌ BASE_DIR argument required"
+    exit 1
+fi
 
-BASE_DIR="$1"
+readonly BASE_DIR="$1"
tests/e2e-prow/rhoai/scripts/bootstrap.sh (1)

1-10: Add BASE_DIR validation for robustness.

The script uses BASE_DIR without validating it's provided, which could lead to manifests being applied from incorrect paths.

 #!/bin/bash
 
 set -euo pipefail
 
+if [[ -z "${1:-}" ]]; then
+    echo "❌ BASE_DIR argument required"
+    exit 1
+fi
-BASE_DIR="$1"
+readonly BASE_DIR="$1"
tests/e2e-prow/rhoai/pipeline.sh (4)

5-5: Consider parameterizing the hardcoded model name.

Line 5 hardcodes MODEL_NAME="meta-llama/Llama-3.2-1B-Instruct". For flexibility in future testing with different models, consider accepting this as an environment variable with a default fallback:

MODEL_NAME="${LLAMA_MODEL_ID:-meta-llama/Llama-3.2-1B-Instruct}"

This allows prow jobs to override the model without modifying the script.


100-100: Strengthen response validation to tolerate formatting variations.

Line 100 checks for exact string '"object":"text_completion"' in the response body. This is fragile if the JSON response format or field order changes. Consider a more robust check using jq:

if [[ "$http_code" == "200" ]] && echo "$body" | jq -e '.object == "text_completion"' >/dev/null 2>&1; then
  echo "✅ API test passed."
  break
else
  echo "❌ API test failed (HTTP $http_code)"
  echo "$body" | jq . 2>/dev/null || echo "$body"
fi

This extracts and validates the field semantically rather than relying on string position.

Also applies to: 105-105


200-210: Add error handling for missing pod termination status.

Line 200 extracts the exit code using a jsonpath that assumes .status.containerStatuses[0].state.terminated.exitCode exists. If the pod terminates abnormally (e.g., OOMKilled, evicted) or the path doesn't exist, $TEST_EXIT_CODE will be empty or null, and the exit condition at line 204 will default to 2 (due to ${TEST_EXIT_CODE:-2} fallback). While this is safe (fails closed), add explicit logging to surface the actual termination reason:

TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}' 2>/dev/null)
TERMINATION_REASON=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}' 2>/dev/null)

if [[ -z "$TEST_EXIT_CODE" ]]; then
  echo "❌ Unable to extract exit code. Termination reason: $TERMINATION_REASON"
  exit 1
fi

This provides visibility into unexpected terminations beyond exit code 0/non-zero.


12-15: Document or verify hardcoded credential paths and image mirror URLs.

Lines 12-15 read credentials from hardcoded prow mount paths (/var/run/...), and lines 26-29 mirror a specific AWS image to a specific quay.io organization path (quay.io/rh-ee-cpompeia/vllm-cpu:latest). These are tightly coupled to the prow environment and the Red Hat organization structure. Add inline comments documenting:

  1. Why these specific mount paths are used (prow convention)
  2. Why the quay.io target is organization-specific
  3. Whether these paths/URLs are managed by a central prow configuration or should be externalized

Example:

# Prow mounts secrets at these standardized paths (managed by prow infra)
export HUGGING_FACE_HUB_TOKEN=$(cat /var/run/huggingface/hf-token-ces-lcore-test || true)
...
# Mirror to quay.io repo managed by RHOAI release engineering
oc image mirror \
  public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest \
  quay.io/rh-ee-cpompeia/vllm-cpu:latest

This improves maintainability for future modifications.

Also applies to: 26-29

tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1)

14-26: Consider parameterizing vLLM container arguments for flexibility.

Lines 14-26 hardcode critical vLLM launch parameters (model, tool parser, max model length). For future test variations, consider using environment variables or an init container to populate these dynamically. However, if this is intentionally a static CPU benchmark configuration for prow, add a comment clarifying the rationale:

# Static configuration optimized for CPU-only prow environment
# To test alternative models or parameters, create a new ServingRuntime manifest
containers:
  - args:
      - --model
      - ${VLLM_MODEL_ID:-meta-llama/Llama-3.2-1B-Instruct}  # Override via env var
      ...

Alternatively, if environment variables are not supported in vLLM args, document this limitation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d280c9 and 0f27d5a.

📒 Files selected for processing (20)
  • README.md (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/configs/run.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-services.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/run-tests.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt

Applied to files:

  • tests/e2e-prow/rhoai/run-tests.sh
  • tests/e2e-prow/rhoai/pipeline.sh
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml
🪛 Checkov (3.2.334)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml

[medium] 1-25: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-25: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml

[medium] 1-30: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-30: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

🔇 Additional comments (14)
tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml (1)

1-6: OperatorGroup manifest looks correct.

The global-operators OperatorGroup with empty spec correctly scopes the subscriptions in openshift-operators namespace to cluster-wide availability, which is appropriate for the operator setup.

tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1)

1-17: DataScienceCluster configuration is appropriate for e2e test.

Enabling kserve and service mesh while removing workbenches, dashboard, and pipelines is a focused setup for testing vLLM inference services. The configuration aligns with the operator subscriptions defined in operators.yaml.

README.md (1)

127-127: LLM compatibility table entry looks good.

The RHOAI (vLLM) entry is correctly formatted, follows the table conventions, and includes a valid link to the test configuration example.

docs/providers.md (1)

60-65: LGTM!

The documentation entry for the RHOAI (vllm) provider is consistent with the existing table format and accurately reflects the provider details.

tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml (1)

1-13: LGTM!

The KServe InferenceService manifest is correctly structured and appropriately references the vllm runtime and model storage configuration.

tests/e2e-prow/rhoai/scripts/bootstrap.sh (1)

1-47: Overall script quality is solid.

The bootstrap script demonstrates good practices with set -euo pipefail, clear progress logging, a well-structured wait_for_operator helper, and appropriate timeout values. The operator sequencing and readiness checks are well thought out for a complex orchestration scenario. Minor: add BASE_DIR validation (see comment above).

tests/e2e-prow/rhoai/pipeline-vllm.sh (1)

1-7: Add error handling to prevent cascading failures.

The script chains sub-scripts without checking exit codes. If any script fails, the pipeline continues silently instead of stopping. This can lead to partial/corrupted state being passed to the test phase.

Apply this diff to add strict error handling:

 #!/bin/bash
+
+set -euo pipefail

 PIPELINE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
 "$PIPELINE_DIR/scripts/bootstrap.sh" "$PIPELINE_DIR"
 "$PIPELINE_DIR/scripts/deploy-vllm.sh" "$PIPELINE_DIR"
 "$PIPELINE_DIR/scripts/get-vllm-pod-info.sh"

Note: get-vllm-pod-info.sh expects an optional ISVC_NAME argument (defaults to "vllm-model"), not PIPELINE_DIR, so the current call without arguments is correct.

Likely an incorrect or invalid review comment.

tests/e2e-prow/rhoai/configs/run.yaml (1)

62-69: Verify llama-stack supports ${env.*} environment variable interpolation syntax.

The inference provider configuration uses ${env.KSVC_URL} and ${env.VLLM_API_KEY} for runtime variable substitution. Ensure that the llama-stack runtime (v2.25 per PR objectives) natively supports this interpolation syntax before container startup, or if pre-substitution is required, verify it's handled by the deployment manifests.

tests/e2e-prow/rhoai/pipeline-services.sh (2)

5-5: Verify manifest paths and pod naming conventions are consistent.

Line 5 applies manifests/lightspeed/llama-stack.yaml while line 25 applies manifests/lightspeed/lightspeed-stack.yaml. Confirm both files exist at the expected paths and that the pod names (llama-stack-service, lightspeed-stack-service) match their respective manifest deployments, especially if the subdirectory structure differs from the manifest locations.

Also applies to: 25-25


7-8: Verify pod/service naming and port alignment.

The script waits for pod llama-stack-service and then exposes it as service llama-stack-service-svc on port 8321. Confirm that the llama-stack manifest defines a pod (not a Deployment/StatefulSet) named llama-stack-service, and that port 8321 is the correct target port for the llama-stack service.

Also applies to: 13-17

tests/e2e-prow/rhoai/pipeline.sh (2)

128-128: Verify run-tests.sh exists before creating configmap.

Line 128 creates a configmap from configs/run-tests.sh. Ensure this file exists in the repository at the path relative to BASE_DIR (likely $BASE_DIR/configs/run-tests.sh or tests/e2e-prow/rhoai/configs/run-tests.sh). If the file path is incorrect or the file is not committed, the configmap creation will fail silently or create an empty configmap.


168-168: Verify pipeline-test-pod.sh and ensure it deploys pod named 'test-pod'.

Line 168 calls ./pipeline-test-pod.sh, which is not provided in the review. Confirm that this script:

  1. Exists and is executable
  2. Deploys a pod named test-pod in the $NAMESPACE (e2e-rhoai-dsc)
  3. Does not fail silently on errors

The pipeline depends on test-pod existing at line 174 (wait condition) and line 181 (phase check).

tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (2)

29-29: Verify vLLM image is accessible and correctly mirrored.

Line 29 references quay.io/rh-ee-cpompeia/vllm-cpu:latest. This image should be mirrored by the pipeline.sh script (lines 26-29 in that file) from public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest. Confirm:

  1. The mirror source URL is correct and publicly accessible
  2. The target image path matches exactly (including registry, namespace, and tag)
  3. The image is available before this manifest is applied (verify dependency order in pipeline.sh)

If the mirror fails or the image is not pushed, pod startup will fail with ImagePullBackOff.


11-12: Verify model loading timeout is adequate for the model size and environment.

The modelLoadingTimeoutMillis: 90000 (90 seconds) may be insufficient for downloading and initializing a 1B parameter model on CPU in prow, especially if HuggingFace API rate limits or network latency are involved. Consider:

  1. Running a test deployment to measure actual load time
  2. Increasing timeout to 180-300 seconds as a conservative margin
  3. Adding observability (logs, metrics) to track model loading duration for future troubleshooting

If the timeout is consistently exceeded, the pod will fail to become Ready, blocking the entire test pipeline.

Comment on lines 1 to 25
apiVersion: v1
kind: Pod
metadata:
  name: lightspeed-stack-service
  namespace: e2e-rhoai-dsc
spec:
  containers:
    - name: lightspeed-stack-container
      env:
        - name: LLAMA_IP
          valueFrom:
            secretKeyRef:
              name: llama-stack-ip-secret
              key: key
      image: quay.io/lightspeed-core/lightspeed-stack:dev-latest
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: config
          mountPath: /app-root/lightspeed-stack.yaml
          subPath: lightspeed-stack.yaml
  volumes:
    - name: config
      configMap:
        name: lightspeed-stack-config

⚠️ Potential issue | 🟠 Major

Add security context and resource limits to meet Kubernetes security best practices.

The Pod lacks security hardening and resource management:

  1. No securityContext (allows privilege escalation and root execution)
  2. No resource requests/limits (affects scheduling and prevents runaway consumption)

Apply this diff to address security and resource management:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
   containers:
     - name: lightspeed-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+      resources:
+        requests:
+          memory: "512Mi"
+          cpu: "250m"
+        limits:
+          memory: "1Gi"
+          cpu: "500m"
       env:
         - name: LLAMA_IP

Adjust resource values based on actual performance requirements observed during testing.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

apiVersion: v1
kind: Pod
metadata:
  name: lightspeed-stack-service
  namespace: e2e-rhoai-dsc
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
    - name: lightspeed-stack-container
      securityContext:
        allowPrivilegeEscalation: false
      resources:
        requests:
          memory: "512Mi"
          cpu: "250m"
        limits:
          memory: "1Gi"
          cpu: "500m"
      env:
        - name: LLAMA_IP
          valueFrom:
            secretKeyRef:
              name: llama-stack-ip-secret
              key: key
      image: quay.io/lightspeed-core/lightspeed-stack:dev-latest
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: config
          mountPath: /app-root/lightspeed-stack.yaml
          subPath: lightspeed-stack.yaml
  volumes:
    - name: config
      configMap:
        name: lightspeed-stack-config
🧰 Tools
🪛 Checkov (3.2.334)

[medium] 1-25: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-25: Minimize the admission of root containers

(CKV_K8S_23)

- --max-model-len
- "2048"
image: quay.io/rh-ee-cpompeia/vllm-cpu:latest
name: kserve-container

⚠️ Potential issue | 🟠 Major

Add resource requests and limits to prevent scheduling issues and runaway resource consumption.

The manifest does not define resources.requests or resources.limits. On a CPU-only prow node, vLLM can consume unbounded CPU and memory, potentially evicting other pods or causing the entire node to become unschedulable. Add conservative limits:

containers:
  - name: kserve-container
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"
      limits:
        cpu: "4"
        memory: "8Gi"

Adjust these values based on the 1B model's actual memory footprint (likely 2-4 GB) and CPU availability in prow.

🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml around line 30, the
container "kserve-container" lacks resources.requests and resources.limits which
can cause unbounded CPU/memory usage; add a resources block under that container
with conservative requests and limits (e.g., requests: cpu "2", memory "4Gi";
limits: cpu "4", memory "8Gi") and adjust values as needed for the 1B model and
prow node capacity to prevent scheduling and eviction issues.

#========================================
# 6. WAIT FOR POD & TEST API
#========================================
source pod.env

⚠️ Potential issue | 🟠 Major

Add error handling for pod.env sourcing and validation of $KSVC_URL.

Line 74 sources pod.env without error checking. If the file is missing or malformed, $POD_NAME and $KSVC_URL will be empty or unset, causing silent failures in the wait command (line 75) and curl calls (line 90). Add validation after sourcing:

source pod.env || { echo "❌ Failed to source pod.env"; exit 1; }
[[ -n "$POD_NAME" ]] && [[ -n "$KSVC_URL" ]] || { echo "❌ Missing POD_NAME or KSVC_URL"; exit 1; }

Alternatively, verify that pipeline-vllm.sh unconditionally exports these variables to a file in a way that guarantees success.

Also applies to: 90-90

🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/pipeline.sh around lines 74 and 90, the script currently
sources pod.env without checking for errors and then uses $POD_NAME and
$KSVC_URL later, which can lead to silent failures; after sourcing pod.env,
ensure the source succeeded and validate that both POD_NAME and KSVC_URL are
non-empty, printing an error and exiting non-zero if sourcing fails or either
variable is missing; alternatively, update pipeline-vllm.sh to reliably write
and export these variables to pod.env (or fail loudly) so sourcing cannot
silently produce unset values.

Comment on lines 1 to 17
git clone https://github.com/lightspeed-core/lightspeed-stack.git
cd lightspeed-stack

echo "pod started"
echo $LCS_IP

curl -f http://$LCS_IP:8080/v1/models || {
    echo "❌ Basic connectivity failed - showing logs before running full tests"
    exit 1
}

echo "Installing test dependencies..."
pip install uv
uv sync

export E2E_LSC_HOSTNAME=$LCS_IP
export E2E_LLAMA_HOSTNAME=$LLAMA_IP

echo "Running comprehensive e2e test suite..."
make test-e2e

⚠️ Potential issue | 🟠 Major

Add environment variable validation and explicit error handling.

The script assumes LCS_IP and LLAMA_IP are set but doesn't validate them. If missing, errors are cryptic. Also, make test-e2e failures may not propagate correctly.

Apply this diff to validate inputs and ensure proper error propagation:

 git clone https://github.com/lightspeed-core/lightspeed-stack.git
 cd lightspeed-stack
+
+# Validate required environment variables
+if [[ -z "${LCS_IP:-}" ]] || [[ -z "${LLAMA_IP:-}" ]]; then
+    echo "❌ Required environment variables not set: LCS_IP=$LCS_IP, LLAMA_IP=$LLAMA_IP"
+    exit 1
+fi
 
 echo "pod started"
 echo $LCS_IP
@@ -12,7 +20,7 @@ echo $LCS_IP
 echo "Installing test dependencies..."
 pip install uv
-uv sync
+uv sync || { echo "❌ Failed to sync dependencies"; exit 1; }
 
 export E2E_LSC_HOSTNAME=$LCS_IP
 export E2E_LLAMA_HOSTNAME=$LLAMA_IP
 
 echo "Running comprehensive e2e test suite..."
-make test-e2e
+make test-e2e || { echo "❌ E2E tests failed"; exit 1; }
📝 Committable suggestion


Suggested change

Before:

git clone https://github.com/lightspeed-core/lightspeed-stack.git
cd lightspeed-stack

echo "pod started"
echo $LCS_IP

curl -f http://$LCS_IP:8080/v1/models || {
    echo "❌ Basic connectivity failed - showing logs before running full tests"
    exit 1
}

echo "Installing test dependencies..."
pip install uv
uv sync

export E2E_LSC_HOSTNAME=$LCS_IP
export E2E_LLAMA_HOSTNAME=$LLAMA_IP

echo "Running comprehensive e2e test suite..."
make test-e2e

After:

git clone https://github.com/lightspeed-core/lightspeed-stack.git
cd lightspeed-stack

# Validate required environment variables
if [[ -z "${LCS_IP:-}" ]] || [[ -z "${LLAMA_IP:-}" ]]; then
    echo "❌ Required environment variables not set: LCS_IP=$LCS_IP, LLAMA_IP=$LLAMA_IP"
    exit 1
fi

echo "pod started"
echo $LCS_IP

curl -f http://$LCS_IP:8080/v1/models || {
    echo "❌ Basic connectivity failed - showing logs before running full tests"
    exit 1
}

echo "Installing test dependencies..."
pip install uv
uv sync || { echo "❌ Failed to sync dependencies"; exit 1; }

export E2E_LSC_HOSTNAME=$LCS_IP
export E2E_LLAMA_HOSTNAME=$LLAMA_IP

echo "Running comprehensive e2e test suite..."
make test-e2e || { echo "❌ E2E tests failed"; exit 1; }
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/run-tests.sh around lines 1-20, add explicit environment
validation and stricter error handling: enable safe shell options (set -euo
pipefail) and a trap to print a helpful message on failure; check that LCS_IP
and LLAMA_IP are defined and non-empty and exit with descriptive error codes if
they are missing; after the curl health check, capture and print relevant logs
on failure before exiting; run make test-e2e and explicitly exit with the make
command's status so failures propagate to the caller.

Comment on lines +32 to +34
oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml"

oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml"

⚠️ Potential issue | 🟡 Minor

Add error handling to manifest application commands.

The oc apply commands at the end don't check for failures. If manifests are malformed or paths are wrong, the script succeeds silently.

 oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml"
+oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml" || {
+    echo "❌ Failed to apply vLLM manifests"
+    exit 1
+}
-oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml"
📝 Committable suggestion


Suggested change

Before:

oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml"
oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml"

After:

oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml"
oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml" || {
    echo "❌ Failed to apply vLLM manifests"
    exit 1
}
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/scripts/deploy-vllm.sh around lines 32 to 34, the two oc
apply commands run without checking for failures; update the script to handle
errors by verifying each command's exit status and exiting the script on
failure. After each oc apply -f "$BASE_DIR/..." command, add a check like: if
the command fails (non‑zero exit code) print a descriptive error to stderr
including the manifest path and exit with a non‑zero status (or use set -e at
the top and/or append || { echo "failed to apply <file>" >&2; exit 1; } to each
command) so malformed manifests or wrong paths cause the script to fail fast.

Comment on lines +18 to +40
until [ -n "$POD_NAME" ] || [ $ELAPSED -ge $TIMEOUT ]; do
  POD_NAME=$(oc get pods -n "$NAMESPACE" \
    -l "serving.kserve.io/inferenceservice=$ISVC_NAME" \
    -o jsonpath="{.items[?(@.status.phase=='Running')].metadata.name}" 2>/dev/null)
  echo "Waiting for pod $POD_NAME in namespace $NAMESPACE"

  if [ -z "$POD_NAME" ]; then
    echo "  -> Pod not running yet, waiting $INTERVAL seconds..."
    sleep $INTERVAL
    ELAPSED=$((ELAPSED + INTERVAL))
  fi
done

oc describe pod $POD_NAME -n $NAMESPACE || true
oc logs $POD_NAME -n $NAMESPACE || true

POD_NAME=$(oc get pods -n $NAMESPACE -o jsonpath='{.items[0].metadata.name}')

if [ -z "$POD_NAME" ]; then
  echo "  -> Timeout reached after $TIMEOUT seconds. Pod is not running."
else
  echo "  -> Pod is running: $POD_NAME"
fi

⚠️ Potential issue | 🔴 Critical

Critical bug: POD_NAME is overwritten after targeted lookup.

The until loop (lines 18–29) successfully locates the pod matching the InferenceService label. However, line 34 immediately overwrites POD_NAME with the first pod in the namespace, defeating the targeted lookup and potentially operating on the wrong pod.

Apply this diff to remove the erroneous line 34:

 oc describe pod $POD_NAME -n $NAMESPACE || true
 oc logs $POD_NAME -n $NAMESPACE || true
 
-POD_NAME=$(oc get pods -n $NAMESPACE -o jsonpath='{.items[0].metadata.name}')
-
 if [ -z "$POD_NAME" ]; then
   echo "  -> Timeout reached after $TIMEOUT seconds. Pod is not running."
 else

This preserves the POD_NAME found by the targeted loop lookup.

📝 Committable suggestion


Suggested change

Before:

until [ -n "$POD_NAME" ] || [ $ELAPSED -ge $TIMEOUT ]; do
  POD_NAME=$(oc get pods -n "$NAMESPACE" \
    -l "serving.kserve.io/inferenceservice=$ISVC_NAME" \
    -o jsonpath="{.items[?(@.status.phase=='Running')].metadata.name}" 2>/dev/null)
  echo "Waiting for pod $POD_NAME in namespace $NAMESPACE"

  if [ -z "$POD_NAME" ]; then
    echo "  -> Pod not running yet, waiting $INTERVAL seconds..."
    sleep $INTERVAL
    ELAPSED=$((ELAPSED + INTERVAL))
  fi
done

oc describe pod $POD_NAME -n $NAMESPACE || true
oc logs $POD_NAME -n $NAMESPACE || true

POD_NAME=$(oc get pods -n $NAMESPACE -o jsonpath='{.items[0].metadata.name}')

if [ -z "$POD_NAME" ]; then
  echo "  -> Timeout reached after $TIMEOUT seconds. Pod is not running."
else
  echo "  -> Pod is running: $POD_NAME"
fi

After:

until [ -n "$POD_NAME" ] || [ $ELAPSED -ge $TIMEOUT ]; do
  POD_NAME=$(oc get pods -n "$NAMESPACE" \
    -l "serving.kserve.io/inferenceservice=$ISVC_NAME" \
    -o jsonpath="{.items[?(@.status.phase=='Running')].metadata.name}" 2>/dev/null)
  echo "Waiting for pod $POD_NAME in namespace $NAMESPACE"

  if [ -z "$POD_NAME" ]; then
    echo "  -> Pod not running yet, waiting $INTERVAL seconds..."
    sleep $INTERVAL
    ELAPSED=$((ELAPSED + INTERVAL))
  fi
done

oc describe pod $POD_NAME -n $NAMESPACE || true
oc logs $POD_NAME -n $NAMESPACE || true

if [ -z "$POD_NAME" ]; then
  echo "  -> Timeout reached after $TIMEOUT seconds. Pod is not running."
else
  echo "  -> Pod is running: $POD_NAME"
fi
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh around lines 18 to 40,
POD_NAME found by the targeted oc get pods lookup inside the until loop is being
overwritten by an unconditional oc get pods call (line 34) that selects the
first pod in the namespace; remove that extra POD_NAME assignment so the script
keeps and uses the pod discovered by the label-based lookup, ensuring subsequent
describe/logs and timeout checks operate on the intended pod.

Contributor

@tisnik tisnik left a comment

LGTM

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (5)
tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1)

32-34: Add error handling to manifest application commands.

The oc apply commands lack failure checking. If manifests are malformed or paths incorrect, the script succeeds silently, causing downstream failures to be hard to diagnose.

-oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml"
+oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml" || {
+    echo "❌ Failed to apply vLLM runtime manifest"
+    exit 1
+}
-oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml"
+oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml" || {
+    echo "❌ Failed to apply inference service manifest"
+    exit 1
+}

Alternatively, add set -e at the top of the script to fail fast on any command error.

tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1)

1-25: Add security context and resource limits (known issue).

This Pod lacks securityContext and resource management, allowing root execution and privilege escalation. Apply the same hardening as recommended for test-pod:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
     - name: lightspeed-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+        readOnlyRootFilesystem: false
+        capabilities:
+          drop:
+            - ALL
+      resources:
+        requests:
+          memory: "512Mi"
+          cpu: "250m"
+        limits:
+          memory: "1Gi"
+          cpu: "500m"
       env:

Note: readOnlyRootFilesystem is set to false since lightspeed-stack likely needs write access at runtime.

tests/e2e-prow/rhoai/run-tests.sh (1)

1-20: Add environment validation and error handling (known issue).

The script assumes LCS_IP and LLAMA_IP are set but doesn't validate them. Additionally, command failures don't propagate. Apply this diff to address previous concerns:

+#!/bin/bash
+set -euo pipefail
+trap 'echo "❌ Test script failed at line $LINENO"; exit 1' ERR
+
+# Validate required environment variables
+if [[ -z "${LCS_IP:-}" ]] || [[ -z "${LLAMA_IP:-}" ]]; then
+    echo "❌ Required environment variables not set: LCS_IP=${LCS_IP:-empty}, LLAMA_IP=${LLAMA_IP:-empty}"
+    exit 1
+fi
+
 git clone https://github.com/lightspeed-core/lightspeed-stack.git
 cd lightspeed-stack
 
 echo "pod started"
 echo $LCS_IP
 
 curl -f http://$LCS_IP:8080/v1/models || {
     echo "❌ Basic connectivity failed - showing logs before running full tests"
     exit 1
 }
 
 echo "Installing test dependencies..."
 pip install uv
-uv sync
+uv sync || { echo "❌ Failed to sync dependencies"; exit 1; }
 
 export E2E_LSC_HOSTNAME=$LCS_IP
 export E2E_LLAMA_HOSTNAME=$LLAMA_IP
 
 echo "Running comprehensive e2e test suite..."
-make test-e2e
+make test-e2e || { rc=$?; echo "❌ E2E tests failed with exit code $rc"; exit $rc; }
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1)

1-34: Add security context and resource limits (pattern across manifests).

This Pod lacks securityContext and resource management. Apply consistent hardening across all Kubernetes manifests:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
     - name: llama-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+        readOnlyRootFilesystem: false
+        capabilities:
+          drop:
+            - ALL
+      resources:
+        requests:
+          memory: "512Mi"
+          cpu: "250m"
+        limits:
+          memory: "2Gi"
+          cpu: "1000m"
       env:

Note: Adjust memory/CPU limits based on llama-stack workload requirements; may need more resources than lightspeed-stack.

tests/e2e-prow/rhoai/pipeline.sh (1)

61-62: Add error handling for pod.env sourcing and variable validation.

Line 61 sources pod.env without error checking. If pipeline-vllm.sh fails or the file is missing, $POD_NAME and $KSVC_URL remain unset, causing silent failures downstream. Add validation:

-source pod.env
-oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s
+source pod.env || { echo "❌ Failed to source pod.env from pipeline-vllm.sh"; exit 1; }
+
+# Validate critical variables from sourced file
+[[ -n "${POD_NAME:-}" ]] || { echo "❌ POD_NAME not set in pod.env"; exit 1; }
+[[ -n "${KSVC_URL:-}" ]] || { echo "❌ KSVC_URL not set in pod.env"; exit 1; }
+
+oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s

Alternatively, verify that pipeline-vllm.sh unconditionally exports these variables and document the contract.

🧹 Nitpick comments (2)
tests/e2e-prow/rhoai/configs/run.yaml (1)

66-69: Document test-only security settings.

Line 68 disables TLS verification (tls_verify: false), which is acceptable for test environments but should be documented to prevent misuse in production. Consider adding an inline comment:

   inference:
   - provider_id: vllm
     provider_type: remote::vllm
     config:
       url: ${env.KSVC_URL}/v1/
       api_token: ${env.VLLM_API_KEY}
+      # Note: TLS verification disabled for e2e testing only
       tls_verify: false
       max_tokens: 1024
tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1)

9-17: Document test-only authentication settings.

The configuration uses noop authentication and hardcoded test API key ("xyzzy"). These are appropriate for e2e testing but should be clearly marked to prevent accidental production use:

 llama_stack:
   # Uses a remote llama-stack service
   # The instance would have already been started with a llama-stack-run.yaml file
   use_as_library_client: false
   # Alternative for "as library use"
   # use_as_library_client: true
   # library_client_config_path: <path-to-llama-stack-run.yaml-file>
   url: http://${env.LLAMA_IP}:8321
-  api_key: xyzzy
+  api_key: xyzzy  # TEST VALUE ONLY - replace in production
 user_data_collection:

And at authentication section:

 authentication:
-  module: "noop"
+  module: "noop"  # No-op auth: TEST/DEV ONLY
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f27d5a and 7fae26f.

📒 Files selected for processing (20)
  • README.md (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/configs/run.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-services.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/run-tests.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh
  • README.md
  • tests/e2e-prow/rhoai/pipeline-vllm.sh
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml
  • docs/providers.md
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml
  • tests/e2e-prow/rhoai/pipeline-services.sh
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt

Applied to files:

  • tests/e2e-prow/rhoai/pipeline.sh
  • tests/e2e-prow/rhoai/run-tests.sh
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml
🪛 Checkov (3.2.334)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml

[medium] 1-25: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-25: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml

[medium] 1-30: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-30: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: build-pr

Comment on lines 1 to 30
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: e2e-rhoai-dsc
spec:
  containers:
  - name: test-container
    env:
    - name: LCS_IP
      valueFrom:
        secretKeyRef:
          name: lcs-ip-secret
          key: key
    - name: LLAMA_IP
      valueFrom:
        secretKeyRef:
          name: llama-stack-ip-secret
          key: key
    image: registry.access.redhat.com/ubi9/python-312
    command: ["/bin/sh", "/scripts/run-tests.sh"]
    volumeMounts:
    - name: script-volume
      mountPath: /scripts
  volumes:
  - name: script-volume
    configMap:
      name: test-script-cm
      defaultMode: 0755 # Make the script executable
  restartPolicy: Never

⚠️ Potential issue | 🟠 Major

Add security context to harden the test Pod.

The Pod lacks securityContext and resource management, allowing root execution and privilege escalation. For test pods in shared environments, apply basic hardening:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
   - name: test-container
+    securityContext:
+      allowPrivilegeEscalation: false
+      readOnlyRootFilesystem: true
+      capabilities:
+        drop:
+          - ALL
+    resources:
+      requests:
+        memory: "256Mi"
+        cpu: "100m"
+      limits:
+        memory: "512Mi"
+        cpu: "500m"
     env:

Adjust resource limits based on test runtime requirements.

📝 Committable suggestion


Suggested change

Before:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: e2e-rhoai-dsc
spec:
  containers:
  - name: test-container
    env:
    - name: LCS_IP
      valueFrom:
        secretKeyRef:
          name: lcs-ip-secret
          key: key
    - name: LLAMA_IP
      valueFrom:
        secretKeyRef:
          name: llama-stack-ip-secret
          key: key
    image: registry.access.redhat.com/ubi9/python-312
    command: ["/bin/sh", "/scripts/run-tests.sh"]
    volumeMounts:
    - name: script-volume
      mountPath: /scripts
  volumes:
  - name: script-volume
    configMap:
      name: test-script-cm
      defaultMode: 0755 # Make the script executable
  restartPolicy: Never

After:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: e2e-rhoai-dsc
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  containers:
  - name: test-container
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
    resources:
      requests:
        memory: "256Mi"
        cpu: "100m"
      limits:
        memory: "512Mi"
        cpu: "500m"
    env:
    - name: LCS_IP
      valueFrom:
        secretKeyRef:
          name: lcs-ip-secret
          key: key
    - name: LLAMA_IP
      valueFrom:
        secretKeyRef:
          name: llama-stack-ip-secret
          key: key
    image: registry.access.redhat.com/ubi9/python-312
    command: ["/bin/sh", "/scripts/run-tests.sh"]
    volumeMounts:
    - name: script-volume
      mountPath: /scripts
  volumes:
  - name: script-volume
    configMap:
      name: test-script-cm
      defaultMode: 0755 # Make the script executable
  restartPolicy: Never
🧰 Tools
🪛 Checkov (3.2.334)

[medium] 1-30: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-30: Minimize the admission of root containers

(CKV_K8S_23)

🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml around lines 1 to 30,
the Pod lacks securityContext and resource requests/limits which permits root
execution and privilege escalation; add a Pod-level securityContext (e.g.,
runAsNonRoot: true, runAsUser: 1000) and a container-level securityContext
(allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot:
true, runAsUser: 1000, capabilities: drop: ["ALL"]) and include sensible
resource requests/limits (cpu/memory) for the test workload so the container
cannot run as root or escalate privileges and has bounded resource usage.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (3)
tests/e2e-prow/rhoai/pipeline.sh (3)

74-82: Add error handling for pod.env sourcing and validation of $POD_NAME and $KSVC_URL.

Line 81 sources pod.env without error checking. If the file is missing or malformed, $POD_NAME and $KSVC_URL will be empty or unset, causing silent failures in the wait command (line 82) and curl calls (line 97).

-source pod.env
+source pod.env || { echo "❌ Failed to source pod.env"; exit 1; }
+[[ -n "$POD_NAME" ]] && [[ -n "$KSVC_URL" ]] || { echo "❌ Missing POD_NAME or KSVC_URL"; exit 1; }

Alternatively, verify that pipeline-vllm.sh unconditionally exports these variables to a file in a way that guarantees success, then document this dependency in a comment.


181-182: Verify wait command exit code to catch timeout or readiness failures.

Line 181 waits for the test pod to be Ready with a 900s timeout, but the exit code is not checked. If the pod fails to reach Ready state or the wait times out, the script continues to the polling loop, masking the failure.

-oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s
+oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s || {
+    echo "⏰ Timeout or error waiting for test-pod readiness"
+    oc describe pod test-pod -n $NAMESPACE || true
+    exit 1
+}

1-41: ⚠️ Missing strict error handling and shebang.

The script lacks a shebang and shell safety options. Add these at the very top to prevent silent failures and invalid state propagation:

+#!/bin/bash
+set -euo pipefail
+trap 'echo "❌ Pipeline failed at line $LINENO"; exit 1' ERR
+
 #========================================
 # 1. GLOBAL CONFIG
 #========================================

Additionally, line 22–23's podman login --get-login and buildah login --get-login are status checks, not login operations. If the intent is to log in, replace with actual login commands. If only checking status, clarify the intent with a comment. Line 28's sudo podman || true is also unclear—clarify or remove. Line 39's ls -A || true appears to be debug code.

🧹 Nitpick comments (3)
tests/e2e-prow/rhoai/pipeline.sh (3)

133-135: Verify configmap source files exist before creation.

Lines 133–135 create configmaps from files (configs/run.yaml, configs/lightspeed-stack.yaml, run-tests.sh) without checking if these files exist. If missing, oc create configmap will fail with a confusing error.

+[[ -f configs/run.yaml ]] || { echo "❌ configs/run.yaml not found"; exit 1; }
+[[ -f configs/lightspeed-stack.yaml ]] || { echo "❌ configs/lightspeed-stack.yaml not found"; exit 1; }
+[[ -f run-tests.sh ]] || { echo "❌ run-tests.sh not found"; exit 1; }
+
 oc create configmap llama-stack-config -n "$NAMESPACE" --from-file=configs/run.yaml
 oc create configmap lightspeed-stack-config -n "$NAMESPACE" --from-file=configs/lightspeed-stack.yaml
 oc create configmap test-script-cm -n "$NAMESPACE" --from-file=run-tests.sh

Once set -euo pipefail is in place, oc commands will fail automatically if files are missing, but explicit checks provide clearer error messages.


157-163: Add error handling for pod labeling and service exposure.

Lines 157 and 159–163 label a pod and expose it as a service without checking for success. If the pod doesn't exist or the commands fail, the script continues with an invalid $LCS_IP value.

-oc label pod lightspeed-stack-service pod=lightspeed-stack-service -n $NAMESPACE
+oc label pod lightspeed-stack-service pod=lightspeed-stack-service -n $NAMESPACE \
+    || { echo "❌ Failed to label pod"; exit 1; }
 
-oc expose pod lightspeed-stack-service \
+oc expose pod lightspeed-stack-service \
   --name=lightspeed-stack-service-svc \
   --port=8080 \
   --type=ClusterIP \
-  -n $NAMESPACE
+  -n $NAMESPACE \
+  || { echo "❌ Failed to expose service"; exit 1; }

207-217: Add defensive checks for exit code extraction and pod state verification.

Line 207 extracts the exit code from a JSON path that assumes the pod has terminated and has a containerStatus. If the pod is in an unexpected state (e.g., not fully terminated, missing containerStatuses), the path returns empty and ${TEST_EXIT_CODE:-2} defaults to 2, masking the actual pod state.

+# Verify pod is actually in a terminal state before extracting exit code
+POD_PHASE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.phase}')
+if [[ "$POD_PHASE" != "Succeeded" && "$POD_PHASE" != "Failed" ]]; then
+    echo "❌ Pod is in phase $POD_PHASE, not a terminal state"
+    oc describe pod test-pod -n $NAMESPACE
+    exit 1
+fi
+
 TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}')
+if [[ -z "$TEST_EXIT_CODE" ]]; then
+    echo "❌ Failed to extract exit code from pod. Pod may not have fully terminated."
+    oc describe pod test-pod -n $NAMESPACE
+    exit 1
+fi
 
 echo "===== E2E COMPLETE ====="
 
-if [ "${TEST_EXIT_CODE:-2}" -ne 0 ]; then
+if [ "$TEST_EXIT_CODE" -ne 0 ]; then
     echo "❌ E2E tests failed with exit code $TEST_EXIT_CODE (pod/test-pod failed)"
 else
     echo "✅ E2E tests succeeded"
 fi
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fae26f and d88df37.

📒 Files selected for processing (2)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/build_vllm.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt

Applied to files:

  • tests/e2e-prow/rhoai/pipeline.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)

Comment on lines 35 to 36
chmod +x ./scripts/build_vllm.sh
./scripts/build_vllm.sh

⚠️ Potential issue | 🔴 Critical

Validate script existence and error handling before execution.

The script attempts to chmod and execute ./scripts/build_vllm.sh without checking if it exists or verifying the operation succeeded. If the file is missing or execution fails, the pipeline silently continues.

+[[ -f ./scripts/build_vllm.sh ]] || { echo "❌ build_vllm.sh not found"; exit 1; }
 chmod +x ./scripts/build_vllm.sh
-./scripts/build_vllm.sh
+./scripts/build_vllm.sh || { echo "❌ build_vllm.sh failed"; exit 1; }

Once set -euo pipefail is added to the script header (see prior comment), the explicit checks can be simplified.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
chmod +x ./scripts/build_vllm.sh
./scripts/build_vllm.sh
[[ -f ./scripts/build_vllm.sh ]] || { echo "❌ build_vllm.sh not found"; exit 1; }
chmod +x ./scripts/build_vllm.sh
./scripts/build_vllm.sh || { echo "❌ build_vllm.sh failed"; exit 1; }
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/pipeline.sh around lines 35 to 36, the script
unconditionally chmods and executes ./scripts/build_vllm.sh which can silently
fail if the file is missing or execution errors; add an existence check (e.g.,
test -f or [ -x ]) before chmod/execute and fail fast with a clear error message
if the script is absent or not executable, then run chmod +x and execute only
after the checks; since set -euo pipefail is enabled earlier, keep the checks
minimal and ensure any non-zero exit from the script propagates by not
suppressing errors.

Comment on lines 1 to 2
git clone https://github.com/vllm-project/vllm.git
cd vllm

⚠️ Potential issue | 🔴 Critical

Add error handling for git operations.

The script will continue even if git clone fails or if the vllm directory does not exist after cloning.

Apply this diff to add error handling:

+set -eo pipefail
+
 git clone https://github.com/vllm-project/vllm.git
-cd vllm
+cd vllm || { echo "Failed to enter vllm directory"; exit 1; }

Additionally, consider pinning a specific version rather than cloning the unstable main branch:

-git clone https://github.com/vllm-project/vllm.git
+git clone --branch v0.6.0 https://github.com/vllm-project/vllm.git
📝 Committable suggestion


Suggested change
git clone https://github.com/vllm-project/vllm.git
cd vllm
set -eo pipefail
git clone --branch v0.6.0 https://github.com/vllm-project/vllm.git
cd vllm || { echo "Failed to enter vllm directory"; exit 1; }
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/scripts/build_vllm.sh around lines 1-2, the script
currently runs `git clone` and `cd vllm` without checking for errors; update it
to abort if `git clone` fails and to verify the repository directory exists
before changing into it. Specifically, after running git clone, check its exit
status and exit non‑zero with an explanatory message on failure; verify the vllm
directory exists (and is a git repo) before cd and exit with an error if not;
and optionally replace cloning main with a checkout of a pinned tag/commit
(clone then git checkout <tag-or-commit>) to avoid tracking unstable main.

Comment on lines 1 to 11
git clone https://github.com/vllm-project/vllm.git
cd vllm

file="docker/Dockerfile.cpu"
sed 's|\(target=[^, ]*\)|\1,z|g' "$file" > "${file}.patched"

DOCKER_BUILDKIT=1 podman build . \
# --target cpu \
--tag vllm-openai-lcore/cpu \
--file docker/Dockerfile.cpu.patched \
--build-arg max_jobs=1

🛠️ Refactor suggestion | 🟠 Major

Add overall script robustness: shebang, logging, and documentation.

The script lacks a shebang, structured logging, and inline documentation explaining the build strategy (particularly the patching and ,z suffix).

Add a shebang at the top and improve logging and documentation:

+#!/bin/bash
+set -eo pipefail
+
+# Build vLLM image with CPU target and SELinux context-sharing label
+# - Clones vLLM repository
+# - Patches Dockerfile.cpu to add ,z suffix to target for SELinux labeling
+# - Builds image with max_jobs=1 to reduce memory footprint in prow environment
+
 git clone --branch v0.6.0 https://github.com/vllm-project/vllm.git
 cd vllm || { echo "Failed to enter vllm directory"; exit 1; }
 
+echo "Patching docker/Dockerfile.cpu..."
 file="docker/Dockerfile.cpu"
 # ... rest of script with added error checks

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/scripts/build_vllm.sh around lines 1 to 11, the script
is missing a shebang, basic error handling, explanatory comments for the sed
patch that appends ",z" to the Docker target, and simple/logged steps to aid
debugging; add a bash shebang (#!/usr/bin/env bash) and set strict options (set
-euo pipefail), add a brief top-of-file comment explaining the patch purpose
(why sed appends ",z" and what the patched Dockerfile is for), wrap key
operations with simple timestamped echo/log messages (cloning, patching, build
start/end, and any failures), check that docker/Dockerfile.cpu exists before
running sed and verify sed produced docker/Dockerfile.cpu.patched, and echo the
final podman/docker build command and its build-args so failures are easier to
trace.

Comment on lines 4 to 5
file="docker/Dockerfile.cpu"
sed 's|\(target=[^, ]*\)|\1,z|g' "$file" > "${file}.patched"

⚠️ Potential issue | 🔴 Critical

Verify sed patch correctness and add error handling.

The sed pattern assumes a specific Dockerfile format (target=[^, ]*). If the format differs, the patch silently fails. Additionally, no error handling exists if the file is missing or if the patch cannot be applied.

Apply this diff to add validation and error handling:

+if [[ ! -f "$file" ]]; then
+  echo "Error: $file not found"; exit 1
+fi
+
 sed 's|\(target=[^, ]*\)|\1,z|g' "$file" > "${file}.patched" || { echo "Patch failed"; exit 1; }
+
+if ! grep -q "target=.*,z" "${file}.patched"; then
+  echo "Warning: Patch may not have applied correctly (no target with ,z found)"
+fi

Also, clarify the purpose of the ,z suffix in a comment (e.g., if it's for SELinux labeling):

+# Append ,z to target for SELinux labeling (context-sharing)
 sed 's|\(target=[^, ]*\)|\1,z|g' "$file" > "${file}.patched"

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/scripts/build_vllm.sh around lines 4 to 5, the sed
command blindly patches docker/Dockerfile.cpu and appends ",z" without checking
the file exists, verifying the patch applied, or explaining the suffix; add a
guard to fail early if docker/Dockerfile.cpu is missing, run sed and check its
exit status, validate the patched file contains the expected "target=... ,z"
change (or revert/remove the patched file and exit non-zero if not), and add an
inline comment explaining the purpose of the ",z" suffix (for example SELinux
label) so the intent is clear.
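To illustrate the substitution under review, here is the same sed expression applied to a hypothetical BuildKit cache-mount line (the `RUN --mount` line is an assumed example, not copied from vLLM's Dockerfile.cpu):

```shell
# Hypothetical line resembling a BuildKit cache mount in a Dockerfile
line='RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt'

# Append ,z (SELinux shared-relabel option) to every target= mount option
echo "$line" | sed 's|\(target=[^, ]*\)|\1,z|g'
# → RUN --mount=type=cache,target=/root/.cache/pip,z pip install -r requirements.txt
```

The `[^, ]*` class stops the match at the next comma or space, so only the mount path itself gains the `,z` suffix.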

Comment on lines 7 to 11
DOCKER_BUILDKIT=1 podman build . \
# --target cpu \
--tag vllm-openai-lcore/cpu \
--file docker/Dockerfile.cpu.patched \
--build-arg max_jobs=1

⚠️ Potential issue | 🔴 Critical

Add error handling for podman build and clarify image naming.

The podman build can fail silently without feedback, and the image tag lacks a registry prefix which may cause issues when pushing to a registry (e.g., quay.io).

Apply this diff to add error handling and update the image name:

 DOCKER_BUILDKIT=1 podman build . \
 #    --target cpu \
     --tag vllm-openai-lcore/cpu \
     --file docker/Dockerfile.cpu.patched \
-    --build-arg max_jobs=1
+    --build-arg max_jobs=1 || { echo "Build failed"; exit 1; }

Update the image tag to include a registry prefix (adjust as needed for your infrastructure):

-    --tag vllm-openai-lcore/cpu \
+    --tag quay.io/vllm-openai-lcore/cpu \

Also, document why --target cpu is commented out and why max_jobs=1 is used. Note that a # comment line inside a backslash-continued command terminates the continuation, so the note belongs above the build invocation:

+# NOTE: the cpu target is implied by Dockerfile.cpu; max_jobs=1 reduces memory usage in prow
 DOCKER_BUILDKIT=1 podman build . \
-#    --target cpu \
📝 Committable suggestion


Suggested change
DOCKER_BUILDKIT=1 podman build . \
# --target cpu \
--tag vllm-openai-lcore/cpu \
--file docker/Dockerfile.cpu.patched \
--build-arg max_jobs=1
# NOTE: the cpu target is implied by Dockerfile.cpu; max_jobs=1 reduces memory usage in prow
DOCKER_BUILDKIT=1 podman build . \
    --tag quay.io/vllm-openai-lcore/cpu \
    --file docker/Dockerfile.cpu.patched \
    --build-arg max_jobs=1 || { echo "Build failed"; exit 1; }
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/scripts/build_vllm.sh around lines 7 to 11, the podman
build may fail silently and the image tag lacks a registry prefix; update the
script to run the build command so failures are detected and cause the script to
exit non‑zero (check the podman build exit status and exit with an error message
if it fails), change the --tag value to include your registry/org prefix (e.g.,
quay.io/<your-org>/vllm-openai-lcore:cpu or similar) to avoid push/name
confusion, and add brief inline comments explaining why --target cpu is
commented out and why --build-arg max_jobs=1 is enforced so future maintainers
understand the choices.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (6)
tests/e2e-prow/rhoai/scripts/build_vllm.sh (1)

1-11: Unresolved critical issues: add shebang, error handling, and validation.

Multiple critical issues from previous reviews remain unaddressed. The script lacks proper error handling for git operations, sed patching, and podman build, which could cause silent failures in the prow pipeline. Additionally, the image tag lacks a registry prefix.

Apply this consolidated diff to address all critical issues:

+#!/bin/bash
+set -eo pipefail
+
+# Build vLLM image with CPU target and SELinux context-sharing label
+# - Clones vLLM repository at a pinned version
+# - Patches docker/Dockerfile.cpu to append ,z suffix for SELinux labeling
+# - Builds image with max_jobs=1 to reduce memory footprint in prow environment
+
-git clone https://github.com/vllm-project/vllm.git
-cd vllm
+git clone --branch v0.6.0 https://github.com/vllm-project/vllm.git || { echo "Failed to clone vllm repository"; exit 1; }
+cd vllm || { echo "Failed to enter vllm directory"; exit 1; }
 
+echo "Patching docker/Dockerfile.cpu..."
 file="docker/Dockerfile.cpu"
-sed 's|\(target=[^, ]*\)|\1,z|g' "$file" > "${file}.patched"
+
+if [[ ! -f "$file" ]]; then
+  echo "Error: $file not found"; exit 1
+fi
+
+sed 's|\(target=[^, ]*\)|\1,z|g' "$file" > "${file}.patched" || { echo "Patch failed"; exit 1; }
+
+if ! grep -q "target=.*,z" "${file}.patched"; then
+  echo "Warning: Patch may not have applied correctly (no target with ,z found)"
+fi
 
+echo "Building vLLM CPU image..."
+# NOTE: the cpu target is implied by Dockerfile.cpu; max_jobs=1 reduces memory usage in prow
 DOCKER_BUILDKIT=1 podman build . \
-#    --target cpu \
-    --tag vllm-openai-lcore/cpu \
+    --tag quay.io/vllm-openai-lcore/cpu \
     --file docker/Dockerfile.cpu.patched \
-    --build-arg max_jobs=1
+    --build-arg max_jobs=1 || { echo "Build failed"; exit 1; }
+
+echo "vLLM image built successfully"
tests/e2e-prow/rhoai/pipeline.sh (5)

1-10: Enable strict error handling at script start.

For robustness in complex orchestration, add safety options before any logic:

+#!/bin/bash
+set -euo pipefail
+trap 'echo "❌ Pipeline failed at line $LINENO"; exit 1' ERR
+
 #========================================
 # 1. GLOBAL CONFIG
 #========================================

Without this, unhandled command failures allow the script to proceed with invalid state, and subsequent checks (e.g., variable validation) become brittle. This is a prerequisite for the fixes below.


37-38: Validate script existence and error handling for build_vllm.sh.

Check the file exists before chmod/execute and fail immediately on error:

+[[ -f ./scripts/build_vllm.sh ]] || { echo "❌ build_vllm.sh not found"; exit 1; }
 chmod +x ./scripts/build_vllm.sh
-./scripts/build_vllm.sh
+./scripts/build_vllm.sh || { echo "❌ build_vllm.sh failed"; exit 1; }

Alternatively, once set -euo pipefail is in place (see prior comment), the explicit exit code check can be simplified.


65-66: Add error handling for curl download.

Validate the download succeeds and the file exists:

 curl -sL -o tool_chat_template_llama3.2_json.jinja \
-    https://raw.githubusercontent.com/vllm-project/vllm/main/examples/tool_chat_template_llama3.2_json.jinja
+    https://raw.githubusercontent.com/vllm-project/vllm/main/examples/tool_chat_template_llama3.2_json.jinja \
+    || { echo "❌ Failed to download jinja template"; exit 1; }
+
+[[ -f tool_chat_template_llama3.2_json.jinja ]] \
+    || { echo "❌ Jinja template file not found after download"; exit 1; }

This prevents the script from proceeding to configmap creation with a missing file.


83-84: Add error handling for pod.env sourcing and validate required variables.

After sourcing pod.env, verify it succeeded and required variables are set:

-source pod.env
+source pod.env || { echo "❌ Failed to source pod.env"; exit 1; }
+[[ -n "$POD_NAME" ]] || { echo "❌ POD_NAME is unset after sourcing pod.env"; exit 1; }
+[[ -n "$KSVC_URL" ]] || { echo "❌ KSVC_URL is unset after sourcing pod.env"; exit 1; }
+
 oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s

This ensures that unset or empty variables fail fast rather than causing cryptic failures downstream.


183-183: Verify wait command exit code to catch timeout or pod failures.

Add explicit exit code checking for the wait command:

-oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s
+oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s || {
+    echo "⏰ Timeout or error waiting for test-pod readiness"
+    oc describe pod test-pod -n $NAMESPACE || true
+    exit 1
+}

This prevents the script from proceeding with the polling loop if the pod fails to reach Ready state within 900s.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d88df37 and 27f61cc.

📒 Files selected for processing (2)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/build_vllm.sh (1 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (5)
tests/e2e-prow/rhoai/pipeline.sh (5)

1-26: Add shell safety options and validate build_vllm.sh execution.

The script lacks foundational error-handling options and doesn't validate script execution, allowing failures to propagate silently throughout the pipeline orchestration.

Apply this diff to add shell safety and fix build_vllm.sh validation:

+#!/bin/bash
+set -euo pipefail
+trap 'echo "❌ Pipeline failed at line $LINENO"; exit 1' ERR
+
 #========================================
 # 1. GLOBAL CONFIG
 #========================================
 NAMESPACE="e2e-rhoai-dsc"
 MODEL_NAME="meta-llama/Llama-3.2-1B-Instruct"
 
 
 #========================================
 # 2. ENVIRONMENT SETUP
 #========================================
 echo "===== Setting up environment variables ====="
 export HUGGING_FACE_HUB_TOKEN=$(cat /var/run/huggingface/hf-token-ces-lcore-test || true)
 export VLLM_API_KEY=$(cat /var/run/vllm/vllm-api-key-lcore-test || true)
 
 [[ -n "$HUGGING_FACE_HUB_TOKEN" ]] && echo "✅ HUGGING_FACE_HUB_TOKEN is set" || { echo "❌ Missing HUGGING_FACE_HUB_TOKEN"; exit 1; }
 [[ -n "$VLLM_API_KEY" ]] && echo "✅ VLLM_API_KEY is set" || { echo "❌ Missing VLLM_API_KEY"; exit 1; }
 
 
-chmod +x ./scripts/build_vllm.sh
-./scripts/build_vllm.sh
+[[ -f ./scripts/build_vllm.sh ]] || { echo "❌ build_vllm.sh not found"; exit 1; }
+chmod +x ./scripts/build_vllm.sh
+./scripts/build_vllm.sh || { echo "❌ build_vllm.sh failed with exit code $?"; exit 1; }

With set -euo pipefail in place, subsequent command failures will automatically exit the script unless explicitly handled.


47-51: Add error handling for curl download and file validation.

The curl download has no error checking; network failures or URL issues cause silent file absence, leading to cryptic configmap creation errors downstream.

Apply this diff to add error checking:

 curl -sL -o tool_chat_template_llama3.2_json.jinja \
-    https://raw.githubusercontent.com/vllm-project/vllm/main/examples/tool_chat_template_llama3.2_json.jinja
+    https://raw.githubusercontent.com/vllm-project/vllm/main/examples/tool_chat_template_llama3.2_json.jinja \
+    || { echo "❌ Failed to download jinja template"; exit 1; }
+
+[[ -f tool_chat_template_llama3.2_json.jinja ]] \
+    || { echo "❌ Jinja template file not found after download"; exit 1; }
 
 oc create configmap vllm-chat-template -n "$NAMESPACE" \
     --from-file=tool_chat_template_llama3.2_json.jinja --dry-run=client -o yaml | oc apply -f -

Alternatively, once set -euo pipefail is in place, rely on the error trap for the curl command itself, but keep the file existence check.


65-66: Add error handling for pod.env sourcing and variable validation.

Line 65 sources pod.env without checking if the source succeeded, and subsequent code (lines 66, 81) uses $POD_NAME and $KSVC_URL without validation. If the file is missing or malformed, these variables will be empty or unset, causing silent failures in the wait command and curl calls.

Apply this diff to add validation:

-source pod.env
+source pod.env || { echo "❌ Failed to source pod.env"; exit 1; }
+[[ -n "$POD_NAME" ]] && [[ -n "$KSVC_URL" ]] || { echo "❌ Missing POD_NAME or KSVC_URL in pod.env"; exit 1; }

Ensure that pipeline-vllm.sh unconditionally exports POD_NAME and KSVC_URL to pod.env and exits non-zero on failure, so sourcing here cannot silently produce unset values.
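A minimal sketch of the writer side this comment asks for (the pod name and URL below are hypothetical placeholders; the real values would be discovered via oc in pipeline-vllm.sh):

```shell
# Hypothetical values; pipeline-vllm.sh would obtain these from oc queries
POD_NAME="vllm-example-pod"
KSVC_URL="http://vllm-example.svc.cluster.local:8080"

# Refuse to write an incomplete env file so the consumer can trust both keys
[ -n "$POD_NAME" ] && [ -n "$KSVC_URL" ] \
    || { echo "❌ refusing to write incomplete pod.env" >&2; exit 1; }
# %q shell-quotes each value so sourcing pod.env is safe
printf 'POD_NAME=%q\nKSVC_URL=%q\n' "$POD_NAME" "$KSVC_URL" > pod.env
```

With the writer guaranteed to either produce both keys or exit non-zero, the `source pod.env` validation above becomes a defense-in-depth check rather than the only safeguard.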


66-66: Check exit code of oc wait for pod readiness.

The oc wait command doesn't check for success. If the pod fails to reach Ready within the 300s timeout, the script proceeds with an unready pod, causing cryptic failures in the API test loop.

Apply this diff:

-oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s
+oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s || {
+    echo "❌ Timeout or error waiting for $POD_NAME readiness"
+    oc describe pod $POD_NAME -n $NAMESPACE || true
+    exit 1
+}

165-165: Check exit code of oc wait for test-pod readiness and add timeout context.

The oc wait command doesn't validate success. If the test pod fails to reach Ready, the subsequent polling loop will still execute but may produce confusing results.

Apply this diff:

-oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s
+oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s || {
+    echo "⏰ Timeout or error waiting for test-pod readiness"
+    oc describe pod test-pod -n $NAMESPACE || true
+    exit 1
+}

This prevents the polling loop from running on a pod that failed to become Ready.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 095e688 and 0a2332e.

📒 Files selected for processing (1)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
🔇 Additional comments (3)
tests/e2e-prow/rhoai/pipeline.sh (3)

58-58: Validate pipeline-vllm.sh execution.

The pipeline-vllm.sh script is executed without error checking. Once set -euo pipefail is enabled at the script header, this will automatically fail on non-zero exit; however, ensure the trap handler provides sufficient context.


121-121: Validate pipeline-services.sh execution.

The pipeline-services.sh script is executed without error checking. Once set -euo pipefail is enabled, this will automatically fail on non-zero exit; ensure the error trap provides context about which service deployment failed.


159-159: Validate pipeline-test-pod.sh execution.

The pipeline-test-pod.sh script is executed without error checking. Once set -euo pipefail is enabled, this will automatically fail on non-zero exit; ensure the error trap and any pod creation failures are logged clearly.

Comment on lines +123 to +125
oc wait pod/lightspeed-stack-service pod/llama-stack-service \
-n "$NAMESPACE" --for=condition=Ready --timeout=300s
sleep 30

⚠️ Potential issue | 🟠 Major

Check exit code of oc wait for services and add timeout protection.

The oc wait command for lightspeed-stack-service and llama-stack-service doesn't validate success. If either pod fails to become Ready, the script continues with incomplete deployments, causing failures downstream.

Apply this diff:

 ./pipeline-services.sh
 
-oc wait pod/lightspeed-stack-service pod/llama-stack-service \
-    -n "$NAMESPACE" --for=condition=Ready --timeout=300s
+oc wait pod/lightspeed-stack-service pod/llama-stack-service \
+    -n "$NAMESPACE" --for=condition=Ready --timeout=300s || {
+    echo "❌ Timeout or error waiting for service pods"
+    oc get pods -n "$NAMESPACE" -o wide || true
+    oc describe pod lightspeed-stack-service -n "$NAMESPACE" || true
+    oc describe pod llama-stack-service -n "$NAMESPACE" || true
+    exit 1
+}
 sleep 30
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/pipeline.sh around lines 123 to 125, the oc wait
invocation and subsequent sleep do not verify that the pods actually became
Ready and lack protection if oc wait hangs or fails; modify the script to run oc
wait under a timeout (or use the existing --timeout but also wrap with a
process-level timeout command), capture its exit code, and if non-zero emit a
clear error message and exit the script with a non-zero status; only proceed to
sleep and subsequent steps when oc wait succeeds.

Comment on lines +191 to +201
TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}')

echo "===== E2E COMPLETE ====="

if [ "${TEST_EXIT_CODE:-2}" -ne 0 ]; then
echo "❌ E2E tests failed with exit code $TEST_EXIT_CODE (pod/test-pod failed)"
else
echo "✅ E2E tests succeeded"
fi

exit $TEST_EXIT_CODE

⚠️ Potential issue | 🟠 Major

Validate test pod exit code extraction and handle missing/incomplete state.

Line 191 extracts the exit code from the test pod, but if the pod is still running, in an error state, or containerStatuses is empty, the jsonpath will return an empty string. Line 195 then compares this against 0, which silently treats missing exit codes as failures (via the fallback to 2).

Apply this diff to add explicit validation:

-TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}')
+TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}' 2>/dev/null || echo "")
+
+if [[ -z "$TEST_EXIT_CODE" ]]; then
+    echo "❌ Failed to extract test pod exit code (pod may not have terminated)"
+    oc describe pod test-pod -n $NAMESPACE || true
+    exit 1
+fi
 
 echo "===== E2E COMPLETE ====="
 
-if [ "${TEST_EXIT_CODE:-2}" -ne 0 ]; then
+if [[ "$TEST_EXIT_CODE" -ne 0 ]]; then
     echo "❌ E2E tests failed with exit code $TEST_EXIT_CODE (pod/test-pod failed)"
 else
     echo "✅ E2E tests succeeded"
 fi
 
 exit $TEST_EXIT_CODE

This ensures that incomplete or missing exit codes are caught explicitly rather than silently treated as failures.

🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/pipeline.sh around lines 191 to 201, the script blindly
extracts the pod container terminated exitCode which can be empty if the pod is
still running, in an error state, or has no containerStatuses; update the logic
to explicitly validate that the containerStatuses array exists and that the
first container has a .state.terminated.exitCode before using it by: query the
pod JSON to check .status.phase and .status.containerStatuses length and the
presence of .state.terminated, emit a clear error and set a distinct non-zero
exit code (and fail fast) when those fields are missing or the pod hasn’t
terminated, otherwise assign TEST_EXIT_CODE to the extracted integer and proceed
with the existing success/failure message and exit with that code.
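The validation described in the prompt can be factored into a small helper, sketched here with the phase and exit code passed in as arguments (in the real script they would come from the two oc jsonpath queries shown above):

```shell
# Returns the test pod's exit code on stdout, or fails if the pod has not
# reached a terminal phase or has no terminated exitCode recorded.
get_test_exit_code() {
    local phase="$1" exit_code="$2"
    case "$phase" in
        Succeeded|Failed) ;;
        *) echo "❌ pod in non-terminal phase: $phase" >&2; return 1 ;;
    esac
    [ -n "$exit_code" ] || { echo "❌ missing terminated exitCode" >&2; return 1; }
    echo "$exit_code"
}

# In pipeline.sh the arguments would come from oc, e.g.:
#   phase=$(oc get pod test-pod -n "$NAMESPACE" -o jsonpath='{.status.phase}')
#   code=$(oc get pod test-pod -n "$NAMESPACE" \
#       -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}')
```

Keeping the checks in one function gives a single, distinct failure path for "pod never terminated" versus "tests ran and failed".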

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (12)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1)

1-25: Add security hardening: securityContext and resource limits.

The Pod lacks fundamental security controls and resource management:

  1. No securityContext — container runs as root with privilege escalation allowed
  2. No resource requests/limits — can consume unbounded resources, affecting cluster stability

Apply this diff to add hardening (adjust resource values based on observed memory footprint):

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
     - name: lightspeed-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+        readOnlyRootFilesystem: false
+      resources:
+        requests:
+          memory: "512Mi"
+          cpu: "250m"
+        limits:
+          memory: "1Gi"
+          cpu: "500m"
       env:
tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1)

14-62: Add resource requests and limits to prevent cluster resource exhaustion.

The vLLM container lacks resources.requests and resources.limits. On a CPU-only prow node, vLLM can consume unbounded resources, evicting other pods or making the node unschedulable. Add conservative resource bounds:

Apply this diff to add resource management (adjust based on 1B model's actual footprint):

     - args:
         - --model
         # - /mnt/models/
         - meta-llama/Llama-3.2-1B-Instruct
         - --enable-auto-tool-choice
         - --tool-call-parser 
         - llama3_json
         - --chat-template 
         - /mnt/chat-template/tool_chat_template_llama3.2_json.jinja
         - --download-dir
         - /tmp/models-cache
         - --port
         - "8080"
         - --max-model-len
         - "2048"
       image: quay.io/rh-ee-cpompeia/vllm-cpu:latest
       name: kserve-container
+      resources:
+        requests:
+          cpu: "2"
+          memory: "3Gi"
+        limits:
+          cpu: "4"
+          memory: "6Gi"
       env:
tests/e2e-prow/rhoai/run-tests.sh (1)

1-20: Add environment validation and explicit error handling throughout script.

This script is missing critical safeguards noted in prior review:

  1. No validation that LCS_IP and LLAMA_IP are set (leads to cryptic failures)
  2. make test-e2e failure doesn't explicitly exit with error code
  3. No shell safety options (set -euo pipefail) to catch unhandled errors

Errors in e2e tests must propagate to the prow job failure path.

Apply this diff to add validation, error handling, and safety:

+#!/bin/bash
+set -euo pipefail
+
+# Validate required environment variables
+if [[ -z "${LCS_IP:-}" ]] || [[ -z "${LLAMA_IP:-}" ]]; then
+    echo "❌ Error: Required environment variables not set."
+    echo "   LCS_IP=${LCS_IP:-<not set>}"
+    echo "   LLAMA_IP=${LLAMA_IP:-<not set>}"
+    exit 1
+fi
+
 git clone https://github.com/lightspeed-core/lightspeed-stack.git
 cd lightspeed-stack
 
 echo "pod started"
 echo $LCS_IP
 
 curl -f http://$LCS_IP:8080/v1/models || {
     echo "❌ Basic connectivity failed - showing logs before running full tests"
     exit 1
 }
 
 echo "Installing test dependencies..."
 pip install uv
-uv sync
+uv sync || { echo "❌ Failed to sync dependencies"; exit 1; }
 
 export E2E_LSC_HOSTNAME=$LCS_IP
 export E2E_LLAMA_HOSTNAME=$LLAMA_IP
 
 echo "Running comprehensive e2e test suite..."
-make test-e2e
+make test-e2e || { echo "❌ E2E tests failed"; exit 1; }
tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1)

1-30: Add security hardening: securityContext and resource limits.

This test Pod lacks security controls allowing root execution with privilege escalation:

  1. No securityContext — container runs as root
  2. No resource requests/limits — unbounded resource usage in test environment
  3. Root filesystem not read-only — unnecessary write access

Apply this diff to harden the Pod:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
   - name: test-container
+    securityContext:
+      allowPrivilegeEscalation: false
+      readOnlyRootFilesystem: true
+      capabilities:
+        drop:
+          - ALL
+    resources:
+      requests:
+        memory: "256Mi"
+        cpu: "100m"
+      limits:
+        memory: "512Mi"
+        cpu: "500m"
     env:
tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1)

32-34: Add error handling to manifest application commands.

The oc apply commands lack failure checks. If manifests are malformed or paths incorrect, the script succeeds silently.

-oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml"
+oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml" || {
+    echo "❌ Failed to apply vLLM runtime manifest"
+    exit 1
+}
 
-oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml"
+oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml" || {
+    echo "❌ Failed to apply InferenceService manifest"
+    exit 1
+}
tests/e2e-prow/rhoai/pipeline.sh (7)

1-10: Enable strict error handling at script start.

The script lacks shell safety options (set -euo pipefail and an error trap). This allows unhandled failures to cascade through the orchestration pipeline.

+#!/bin/bash
+set -euo pipefail
+trap 'echo "❌ Pipeline failed at line $LINENO"; exit 1' ERR
+
 #========================================
 # 1. GLOBAL CONFIG
 #========================================
 NAMESPACE="e2e-rhoai-dsc"

Once this is in place, subsequent explicit error checks can be simplified by relying on automatic exit on non-zero status.


43-47: Add error handling for curl download and file existence check.

The jinja template download (lines 43–44) lacks error handling. A failed network call would leave no file, causing the configmap creation on line 46 to fail with a cryptic error.

 curl -sL -o tool_chat_template_llama3.2_json.jinja \
-    https://raw.githubusercontent.com/vllm-project/vllm/main/examples/tool_chat_template_llama3.2_json.jinja
+    https://raw.githubusercontent.com/vllm-project/vllm/main/examples/tool_chat_template_llama3.2_json.jinja \
+    || { echo "❌ Failed to download jinja template"; exit 1; }
+
+[[ -f tool_chat_template_llama3.2_json.jinja ]] \
+    || { echo "❌ Jinja template file not found after download"; exit 1; }
 
 oc create configmap vllm-chat-template -n "$NAMESPACE" \

54-54: Validate script existence and check execution status of pipeline-vllm.sh.

The script attempts to execute ./pipeline-vllm.sh without verifying it exists or checking if it succeeded. Silent failures would allow the pipeline to proceed with incomplete setup.

+[[ -f ./pipeline-vllm.sh ]] || { echo "❌ pipeline-vllm.sh not found"; exit 1; }
 ./pipeline-vllm.sh
+[[ $? -eq 0 ]] || { echo "❌ pipeline-vllm.sh failed"; exit 1; }

With set -euo pipefail added at the script start, this simplifies to relying on automatic exit on error.
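As a concrete illustration, here is a minimal sketch of that simplification (the `run_step` helper and step names are hypothetical stand-ins for the real pipeline scripts): once `set -e` is active, a failing command aborts the script on its own, so per-command `|| { ...; exit 1; }` guards can be dropped.

```shell
#!/bin/bash
# With `set -e` active, any failing command aborts the script automatically,
# so explicit exit-code checks after each step become redundant.
set -euo pipefail
trap 'echo "❌ Pipeline failed at line $LINENO" >&2' ERR

run_step() {
    # Hypothetical stand-in for a pipeline script such as ./pipeline-vllm.sh
    echo "running: $1"
}

run_step "deploy vLLM runtime"
run_step "deploy services"
echo "all steps completed"
```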


61-62: Validate source pod.env and oc wait exit codes.

Line 61 sources pod.env without error checking; if missing or malformed, $POD_NAME and $KSVC_URL remain unset. Line 62's oc wait doesn't validate success, so the script proceeds even if the pod fails to become Ready.

-source pod.env
-oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s
+source pod.env || { echo "❌ Failed to source pod.env"; exit 1; }
+[[ -n "$POD_NAME" ]] && [[ -n "$KSVC_URL" ]] || { echo "❌ Missing POD_NAME or KSVC_URL"; exit 1; }
+
+oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s || {
+    echo "❌ Timeout or error waiting for pod readiness"
+    oc describe pod "$POD_NAME" -n "$NAMESPACE" || true
+    exit 1
+}

119-121: Check exit code of oc wait for service pods and add diagnostic output on failure.

The wait command for lightspeed-stack-service and llama-stack-service (lines 119–120) doesn't validate success. If either pod fails to reach Ready, the script continues with incomplete deployments.

 ./pipeline-services.sh
 
 oc wait pod/lightspeed-stack-service pod/llama-stack-service \
-    -n "$NAMESPACE" --for=condition=Ready --timeout=300s
+    -n "$NAMESPACE" --for=condition=Ready --timeout=300s || {
+    echo "❌ Timeout or error waiting for service pods"
+    oc get pods -n "$NAMESPACE" -o wide || true
+    oc describe pod lightspeed-stack-service -n "$NAMESPACE" || true
+    oc describe pod llama-stack-service -n "$NAMESPACE" || true
+    exit 1
+}
 sleep 30

161-161: Verify oc wait exit code for test-pod readiness.

Line 161's oc wait command doesn't check for failure. If the pod times out or fails to become Ready, the script continues with an incomplete test pod, causing cryptic errors downstream.

 # Wait until tests are complete
-oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s
+oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s || {
+    echo "⏰ Timeout or error waiting for test-pod readiness"
+    oc describe pod test-pod -n $NAMESPACE || true
+    exit 1
+}
 
 start_time=$(date +%s)

187-197: Validate test pod exit code extraction and handle incomplete/missing state.

Line 187 extracts the exit code via jsonpath, but if the pod hasn't terminated or has no containerStatuses, the result is empty. Line 191's ${TEST_EXIT_CODE:-2} fallback silently masks these states, treating missing exit codes as failures without clear diagnostics.

-TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}')
+TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}' 2>/dev/null || echo "")
+
+if [[ -z "$TEST_EXIT_CODE" ]]; then
+    echo "❌ Failed to extract test pod exit code (pod may not have terminated)"
+    oc describe pod test-pod -n $NAMESPACE || true
+    exit 1
+fi
 
 echo "===== E2E COMPLETE ====="
 
-if [ "${TEST_EXIT_CODE:-2}" -ne 0 ]; then
+if [[ "$TEST_EXIT_CODE" -ne 0 ]]; then
     echo "❌ E2E tests failed with exit code $TEST_EXIT_CODE (pod/test-pod failed)"
 else
     echo "✅ E2E tests succeeded"
 fi
🧹 Nitpick comments (1)
tests/e2e-prow/rhoai/pipeline-vllm.sh (1)

1-7: Add shell safety options for fail-fast behavior.

The orchestration script should stop execution if any called script fails. Add set -euo pipefail for robustness:

 #!/bin/bash
+set -euo pipefail
 
 PIPELINE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a2332e and 9a15619.

📒 Files selected for processing (20)
  • README.md (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/configs/run.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-services.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/run-tests.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh
  • tests/e2e-prow/rhoai/configs/run.yaml
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml
  • docs/providers.md
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml
  • tests/e2e-prow/rhoai/pipeline-services.sh
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt

Applied to files:

  • tests/e2e-prow/rhoai/run-tests.sh
🪛 Checkov (3.2.334)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml

[medium] 1-25: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-25: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml

[medium] 1-30: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-30: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (2)
README.md (1)

127-127: Clear and well-documented addition.

The RHOAI (vLLM) provider row is correctly formatted and includes a helpful example link to the new e2e configuration.

tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1)

1-35: Operator subscriptions are correctly configured.

The three operator subscriptions (Service Mesh, Serverless, RHODS) are properly formatted, reference stable channels, and target the correct namespace and source.

Comment on lines +1 to +34
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-service
  namespace: e2e-rhoai-dsc
spec:
  containers:
    - name: llama-stack-container
      env:
        - name: KSVC_URL
          valueFrom:
            secretKeyRef:
              name: api-url-secret
              key: key
        - name: VLLM_API_KEY
          valueFrom:
            secretKeyRef:
              name: vllm-api-key-secret
              key: key
      image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
      ports:
        - containerPort: 8321
      volumeMounts:
        - name: app-root
          mountPath: /opt/app-root/src/.llama
        - name: config
          mountPath: /opt/app-root/run.yaml
          subPath: run.yaml
  volumes:
    - name: app-root
      emptyDir: {}
    - name: config
      configMap:
        name: llama-stack-config

⚠️ Potential issue | 🟠 Major

Add security hardening: securityContext and resource limits.

This Pod lacks security controls and resource management like other manifests in the PR:

  1. No securityContext — container runs as root
  2. No resource requests/limits — unbounded resource usage

Apply this diff to add hardening:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
     - name: llama-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+      resources:
+        requests:
+          memory: "2Gi"
+          cpu: "500m"
+        limits:
+          memory: "4Gi"
+          cpu: "2"
       env:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-service
  namespace: e2e-rhoai-dsc
spec:
  containers:
    - name: llama-stack-container
      env:
        - name: KSVC_URL
          valueFrom:
            secretKeyRef:
              name: api-url-secret
              key: key
        - name: VLLM_API_KEY
          valueFrom:
            secretKeyRef:
              name: vllm-api-key-secret
              key: key
      image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
      ports:
        - containerPort: 8321
      volumeMounts:
        - name: app-root
          mountPath: /opt/app-root/src/.llama
        - name: config
          mountPath: /opt/app-root/run.yaml
          subPath: run.yaml
  volumes:
    - name: app-root
      emptyDir: {}
    - name: config
      configMap:
        name: llama-stack-config

apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-service
  namespace: e2e-rhoai-dsc
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  containers:
    - name: llama-stack-container
      securityContext:
        allowPrivilegeEscalation: false
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "4Gi"
          cpu: "2"
      env:
        - name: KSVC_URL
          valueFrom:
            secretKeyRef:
              name: api-url-secret
              key: key
        - name: VLLM_API_KEY
          valueFrom:
            secretKeyRef:
              name: vllm-api-key-secret
              key: key
      image: quay.io/opendatahub/llama-stack:rhoai-v2.25-latest
      ports:
        - containerPort: 8321
      volumeMounts:
        - name: app-root
          mountPath: /opt/app-root/src/.llama
        - name: config
          mountPath: /opt/app-root/run.yaml
          subPath: run.yaml
  volumes:
    - name: app-root
      emptyDir: {}
    - name: config
      configMap:
        name: llama-stack-config
🧰 Tools
🪛 Checkov (3.2.334)

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml lines 1-34, the
Pod is missing security hardening and resource management; add a pod-level
securityContext (e.g., runAsNonRoot: true, runAsUser: 1000, fsGroup: 1000) and a
container-level securityContext (allowPrivilegeEscalation: false, privileged:
false, capabilities: drop: ["ALL"], and keep readOnlyRootFilesystem appropriate
for the mounted volume), and add resources.requests and resources.limits for CPU
and memory (example requests cpu: "100m", memory: "128Mi"; limits cpu: "500m",
memory: "1Gi") to the container spec so the Pod does not run as root and has
bounded resources.

Comment on lines 1 to 65
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
labels:
  opendatahub.io/dashboard: "true"
metadata:
  annotations:
    openshift.io/display-name: vLLM
  name: vllm
  namespace: e2e-rhoai-dsc
spec:
  builtInAdapter:
    modelLoadingTimeoutMillis: 90000
  containers:
    - args:
        - --model
        # - /mnt/models/
        - meta-llama/Llama-3.2-1B-Instruct
        - --enable-auto-tool-choice
        - --tool-call-parser
        - llama3_json
        - --chat-template
        - /mnt/chat-template/tool_chat_template_llama3.2_json.jinja
        - --download-dir
        - /tmp/models-cache
        - --port
        - "8080"
        - --max-model-len
        - "2048"
      image: quay.io/rh-ee-cpompeia/vllm-cpu:latest
      name: kserve-container
      env:
        - name: HF_HOME
          value: /mnt/models-cache/hf_home
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
        - name: VLLM_API_KEY
          valueFrom:
            secretKeyRef:
              name: vllm-api-key-secret
              key: key
      ports:
        - containerPort: 8080
          name: http1
          protocol: TCP
      volumeMounts:
        - name: chat-template
          mountPath: /mnt/chat-template
        - name: models-cache
          mountPath: /mnt/models-cache
  volumes:
    - name: chat-template
      configMap:
        name: vllm-chat-template
    - name: models-cache
      emptyDir: {}
  multiModel: false
  supportedModelFormats:
    - autoSelect: true
      name: pytorch

⚠️ Potential issue | 🟠 Major

Add security hardening: securityContext for container.

Like other Pod manifests in this PR, the ServingRuntime container lacks a security context:

Apply this diff to add security hardening:

       image: quay.io/rh-ee-cpompeia/vllm-cpu:latest
       name: kserve-container
+      securityContext:
+        allowPrivilegeEscalation: false
       resources:

Committable suggestion skipped: line range outside the PR's diff.


# Wait for rollout to complete
echo "Waiting for kserve-controller-manager rollout..."
oc rollout status deployment/kserve-controller-manager -n redhat-ods-applications --timeout=300s

⚠️ Potential issue | 🟠 Major

Check exit code of oc rollout status command.

The rollout status on line 23 may timeout or fail, but the script continues regardless. Add error handling:

 echo "Waiting for kserve-controller-manager rollout..."
-oc rollout status deployment/kserve-controller-manager -n redhat-ods-applications --timeout=300s
+oc rollout status deployment/kserve-controller-manager -n redhat-ods-applications --timeout=300s || {
+    echo "❌ Rollout failed or timed out"
+    exit 1
+}
📝 Committable suggestion


Suggested change
oc rollout status deployment/kserve-controller-manager -n redhat-ods-applications --timeout=300s
echo "Waiting for kserve-controller-manager rollout..."
oc rollout status deployment/kserve-controller-manager -n redhat-ods-applications --timeout=300s || {
    echo "❌ Rollout failed or timed out"
    exit 1
}
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/scripts/deploy-vllm.sh around line 23, the oc rollout
status command's exit code is ignored so the script will continue even if
rollout times out or fails; modify the script to check the command's exit status
immediately after running it and, on non-zero exit, print a clear error message
and exit the script with a non-zero status (or implement a small retry
loop/backoff before failing) so failures stop the CI flow.
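A sketch of the retry-loop/backoff option mentioned above; the `retry_with_backoff` helper, attempt count, and delay values are illustrative assumptions, while the commented usage shows how it could wrap the `oc rollout status` call from this script.

```shell
#!/bin/bash
set -euo pipefail

# retry_with_backoff N CMD...: run CMD up to N times, doubling the sleep
# between attempts; returns non-zero only if every attempt fails.
retry_with_backoff() {
    local max_attempts=$1; shift
    local attempt=1
    local delay=5
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "❌ failed after ${max_attempts} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed; retrying in ${delay}s..." >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Hypothetical usage around the rollout check from this script:
# retry_with_backoff 3 oc rollout status deployment/kserve-controller-manager \
#     -n redhat-ods-applications --timeout=300s
```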


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1)

1-30: Add security context and resource limits to harden the test Pod.

This comment duplicates a previous review observation: the Pod lacks securityContext and resource constraints, allowing root execution and privilege escalation. For test pods in shared prow environments, apply basic hardening as suggested in the prior review.

Apply this diff to address the security gaps:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
   - name: test-container
+    securityContext:
+      allowPrivilegeEscalation: false
+      readOnlyRootFilesystem: true
+      capabilities:
+        drop:
+          - ALL
+    resources:
+      requests:
+        memory: "256Mi"
+        cpu: "100m"
+      limits:
+        memory: "512Mi"
+        cpu: "500m"
     env:

Adjust memory/CPU limits based on your test runtime requirements.

🧹 Nitpick comments (4)
tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1)

15-17: Consider sourcing the API key from environment instead of hardcoding.

Line 17 uses a hardcoded placeholder API key xyzzy. For test environments, sourcing it from an environment variable would be more flexible and align with the pattern used for LLAMA_IP.

   url: http://${env.LLAMA_IP}:8321
-  api_key: xyzzy
+  api_key: ${env.LLAMA_STACK_API_KEY}

This also reduces the risk of accidentally exposing the pattern in documentation.

tests/e2e-prow/rhoai/scripts/bootstrap.sh (1)

5-5: Add validation for required BASE_DIR parameter.

The script accepts BASE_DIR without checking if it was provided or if it exists. This could cause confusing failures later. Consider adding validation:

BASE_DIR="${1:?ERROR: BASE_DIR is required as first argument}"
if [[ ! -d "$BASE_DIR" ]]; then
  echo "ERROR: BASE_DIR does not exist: $BASE_DIR" >&2
  exit 1
fi
tests/e2e-prow/rhoai/configs/run.yaml (1)

63-69: Add clarifying comments for test-specific settings.

The vLLM inference provider configuration includes test-specific settings like tls_verify: false. A comment would clarify this is for testing and prevent accidental use in production configurations:

  inference:
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: ${env.KSVC_URL}/v1/
      api_token: ${env.VLLM_API_KEY}
      tls_verify: false  # Test environment only - do not use in production
      max_tokens: 1024
docs/providers.md (1)

64-64: Use consistent capitalization for vLLM across documentation.

The entry uses lowercase vllm here, but README.md (line 127) uses uppercase vLLM. For consistency, both should use the same capitalization. The standard capitalization is vLLM.

- | RHOAI (vllm) | latest operator | remote | `openai` | ✅ |
+ | RHOAI (vLLM) | latest operator | remote | `openai` | ✅ |
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a15619 and 8998a0c.

📒 Files selected for processing (20)
  • README.md (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/configs/run.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-services.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/run-tests.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh
  • tests/e2e-prow/rhoai/run-tests.sh
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh
  • tests/e2e-prow/rhoai/pipeline-vllm.sh
  • tests/e2e-prow/rhoai/pipeline.sh
  • tests/e2e-prow/rhoai/pipeline-services.sh
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml
🪛 Checkov (3.2.334)
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml

[medium] 1-30: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-30: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml

[medium] 1-25: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-25: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (4)
README.md (1)

127-127: Documentation addition is clear and consistent.

The new RHOAI (vLLM) provider entry follows the existing table format, includes appropriate metadata, and links to the new example configuration. This aligns well with the supporting documentation updates.

tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1)

6-17: Add security context and resource limits to meet Kubernetes best practices.

The Pod specification lacks security hardening and resource management controls. For test environments in shared CI/CD clusters, these are important:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
   containers:
     - name: lightspeed-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+      resources:
+        requests:
+          memory: "512Mi"
+          cpu: "250m"
+        limits:
+          memory: "1Gi"
+          cpu: "500m"
       env:

Was the security hardening intentionally deferred? If not, please apply these changes to meet K8s security standards.

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1)

6-22: Add security hardening: securityContext and resource limits.

This Pod lacks security controls and resource management. Even for ephemeral test environments, these controls prevent privilege escalation and resource exhaustion:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
     - name: llama-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+      resources:
+        requests:
+          memory: "2Gi"
+          cpu: "500m"
+        limits:
+          memory: "4Gi"
+          cpu: "2"
       env:

Adjust resource values based on actual performance requirements observed during testing.

Please apply security hardening and define resource requests/limits appropriate for your workload.

tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1)

1-17: Well-focused DataScienceCluster configuration for e2e testing.

The manifest appropriately enables only the required components (kserve for vLLM serving) and service mesh for routing, while removing unnecessary components (workbenches, dashboard, pipelines) for an inference-focused e2e test.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (8)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1)

1-25: **Add security context and resource limits for Pod hardening.**

This Pod manifest lacks security hardening and resource management (identical to prior review on this file). While this is test infrastructure, applying basic security controls aligns with Kubernetes best practices and mitigates privilege escalation risks.

Apply this diff to add hardening:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
   containers:
     - name: lightspeed-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+      resources:
+        requests:
+          memory: "512Mi"
+          cpu: "250m"
+        limits:
+          memory: "1Gi"
+          cpu: "500m"
       env:

Adjust resource values based on actual performance during testing.

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1)

1-34: **Add security hardening: securityContext and resource limits.**

This Pod manifest lacks the same security controls flagged in prior reviews. For a Pod serving model inference, security and resource bounds are important to prevent resource exhaustion.

Apply this diff to add hardening:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
     - name: llama-stack-container
+      securityContext:
+        allowPrivilegeEscalation: false
+      resources:
+        requests:
+          memory: "2Gi"
+          cpu: "500m"
+        limits:
+          memory: "4Gi"
+          cpu: "2"
       env:

Adjust resource values based on observed memory and CPU footprint of llama-stack during test runs.

tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1)

1-65: **Add resource limits and security context to prevent node exhaustion and privilege escalation.**

Prior reviews flagged two unresolved issues in this ServingRuntime manifest:

  1. No resources.requests/limits — vLLM can consume unbounded CPU/memory, potentially evicting other prow workloads
  2. No container securityContext — allows privilege escalation

Apply this diff to add both:

     - args:
         ...
       image: quay.io/rh-ee-cpompeia/vllm-cpu:latest
       name: kserve-container
+      securityContext:
+        allowPrivilegeEscalation: false
+      resources:
+        requests:
+          cpu: "2"
+          memory: "4Gi"
+        limits:
+          cpu: "4"
+          memory: "8Gi"
       env:

Adjust CPU/memory limits based on the 1B model's actual footprint (likely 2–4 GB) and available prow node capacity.

tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1)

1-30: **Add security context and resource limits to harden test Pod.**

This test Pod lacks security controls and resource management, same as flagged in prior reviews. Test workloads in shared prow environments should be hardened against privilege escalation and resource exhaustion.

Apply this diff:

 spec:
+  securityContext:
+    runAsNonRoot: true
+    runAsUser: 1000
+    fsGroup: 1000
   containers:
   - name: test-container
+    securityContext:
+      allowPrivilegeEscalation: false
+      capabilities:
+        drop:
+          - ALL
+    resources:
+      requests:
+        memory: "256Mi"
+        cpu: "100m"
+      limits:
+        memory: "512Mi"
+        cpu: "500m"
     env:

Adjust resource limits based on observed test runtime requirements.

tests/e2e-prow/rhoai/pipeline.sh (3)

194-204: Validate test pod exit code extraction and handle incomplete termination state.

Line 194 extracts the exit code from the test pod, but if the pod is still running or containerStatuses is empty, the jsonpath returns an empty string. Line 198 then uses ${TEST_EXIT_CODE:-2} which treats empty as 2 (failure), potentially masking the real issue.

Apply this diff to add explicit validation:

-TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}')
+TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}' 2>/dev/null || echo "")
+
+if [[ -z "$TEST_EXIT_CODE" ]]; then
+    echo "❌ Failed to extract test pod exit code (pod may not have terminated)"
+    echo "Pod status:"
+    oc describe pod test-pod -n $NAMESPACE || true
+    oc logs test-pod -n $NAMESPACE || true
+    exit 1
+fi
 
 echo "===== E2E COMPLETE ====="
 
-if [ "${TEST_EXIT_CODE:-2}" -ne 0 ]; then
+if [[ "$TEST_EXIT_CODE" -ne 0 ]]; then
     echo "❌ E2E tests failed with exit code $TEST_EXIT_CODE (pod/test-pod failed)"
 else
     echo "✅ E2E tests succeeded"
 fi
 
 exit $TEST_EXIT_CODE

This ensures incomplete or missing exit codes are caught explicitly rather than silently treated as failures.
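
To sanity-check the guard without a cluster, the jsonpath extraction can be exercised with a stubbed `oc`; the stub below simulating a terminated pod is purely an assumption for illustration:

```shell
# Stub 'oc' to simulate a pod whose first container terminated with exit code 0
# (the real pipeline calls the actual oc binary against the cluster).
oc() { echo "0"; }

TEST_EXIT_CODE=$(oc get pod test-pod -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}' 2>/dev/null || echo "")

if [[ -z "$TEST_EXIT_CODE" ]]; then
    # Empty means the pod has not terminated (or containerStatuses is missing).
    echo "❌ no exit code available"
elif [[ "$TEST_EXIT_CODE" -ne 0 ]]; then
    echo "❌ E2E tests failed with exit code $TEST_EXIT_CODE"
else
    echo "✅ E2E tests succeeded"
fi
```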


126-128: Validate oc wait exit code and add diagnostic output on timeout.

Line 126–128 waits for service Pods but doesn't check the exit code. If the wait times out or fails, the script continues with incomplete deployments, causing confusing downstream failures.

Apply this diff to add proper error handling:

 ./pipeline-services.sh
 
-oc wait pod/lightspeed-stack-service pod/llama-stack-service \
-    -n "$NAMESPACE" --for=condition=Ready --timeout=300s
+oc wait pod/lightspeed-stack-service pod/llama-stack-service \
+    -n "$NAMESPACE" --for=condition=Ready --timeout=300s || {
+    echo "❌ Timeout or error waiting for service pods to become Ready"
+    echo "Current pod status:"
+    oc get pods -n "$NAMESPACE" -o wide || true
+    echo "Lightspeed stack pod details:"
+    oc describe pod lightspeed-stack-service -n "$NAMESPACE" || true
+    echo "Llama stack pod details:"
+    oc describe pod llama-stack-service -n "$NAMESPACE" || true
+    exit 1
+}
+
 sleep 30

This prevents proceeding with an incomplete deployment.


68-69: Add error handling and validation for pod.env sourcing.

Line 68 sources pod.env without checking if the file exists or if sourcing succeeded. If missing, $POD_NAME and $KSVC_URL will be empty, causing cryptic failures on line 69 and later at line 84.

Apply this diff to validate:

+if [[ ! -f pod.env ]]; then
+    echo "❌ pod.env not found"
+    exit 1
+fi
+
-source pod.env
+source pod.env || { echo "❌ Failed to source pod.env"; exit 1; }
+
+if [[ -z "$POD_NAME" ]] || [[ -z "$KSVC_URL" ]]; then
+    echo "❌ Missing required variables: POD_NAME=$POD_NAME KSVC_URL=$KSVC_URL"
+    exit 1
+fi
+
 oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s
tests/e2e-prow/rhoai/run-tests.sh (1)

1-20: Add strict error handling and environment variable validation.

The script lacks safe shell options and doesn't validate critical environment variables. If LCS_IP or LLAMA_IP is missing, the resulting errors will be cryptic.

Apply this diff to harden the script:

+#!/bin/bash
+set -euo pipefail
+trap 'echo "❌ Script failed at line $LINENO"; exit 1' ERR
+
+# Validate required environment variables
+if [[ -z "${LCS_IP:-}" ]]; then
+    echo "❌ Required environment variable not set: LCS_IP"
+    exit 1
+fi
+if [[ -z "${LLAMA_IP:-}" ]]; then
+    echo "❌ Required environment variable not set: LLAMA_IP"
+    exit 1
+fi
+
 git clone https://github.com/lightspeed-core/lightspeed-stack.git
 cd lightspeed-stack
 
 echo "pod started"
 echo $LCS_IP
 
 curl -f http://$LCS_IP:8080/v1/models || {
     echo "❌ Basic connectivity failed - showing logs before running full tests"
     exit 1
 }
 
 echo "Installing test dependencies..."
 pip install uv
-uv sync
+uv sync || { echo "❌ Failed to sync dependencies"; exit 1; }
 
 export E2E_LSC_HOSTNAME=$LCS_IP
 export E2E_LLAMA_HOSTNAME=$LLAMA_IP
 
 echo "Running comprehensive e2e test suite..."
-make test-e2e
+make test-e2e || { echo "❌ E2E tests failed"; exit 1; }
🧹 Nitpick comments (6)
tests/e2e-prow/rhoai/configs/run.yaml (1)

95-100: Telemetry service name is hardcoded; consider making it configurable.

The telemetry provider at line 96 hardcodes service_name: 'lightspeed-stack-telemetry'. For environments where this service name varies, consider using an environment variable or config override. However, this is a low-priority nice-to-have for test infrastructure.
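
One possible shape for such an override, assuming llama-stack's `${env.…}` substitution accepts a fallback default (the provider fields below are illustrative, not copied from run.yaml):

```yaml
telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: ${env.TELEMETRY_SERVICE_NAME:=lightspeed-stack-telemetry}
```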

tests/e2e-prow/rhoai/pipeline-test-pod.sh (1)

1-5: Script is correct but lacks inline error handling.

The script applies a manifest without checking the exit code. While the parent pipeline (pipeline.sh) has set -euo pipefail, adding inline error handling makes this script more robust if called independently.

Consider this optional improvement:

 #!/bin/bash
 
 BASE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 
-oc apply -f "$BASE_DIR/manifests/test-pod/spin-up.yaml"
+oc apply -f "$BASE_DIR/manifests/test-pod/spin-up.yaml" || {
+    echo "❌ Failed to apply test pod manifest"
+    exit 1
+}
tests/e2e-prow/rhoai/scripts/bootstrap.sh (2)

38-38: Consider error handling for informational oc get csv output.

Line 38 runs oc get csv for informational purposes but set -e will terminate the script if this command fails. Consider redirecting to stderr or wrapping with error handling to ensure debug output doesn't halt the script unexpectedly.

-oc get csv -n openshift-operators
+oc get csv -n openshift-operators 2>&1 || echo "Note: Unable to list CSVs"

5-5: Optional: Add defensive parameter validation for safety, but not critical.

Verification shows bootstrap.sh is called only from pipeline-vllm.sh line 5, which always passes a valid directory via "$PIPELINE_DIR" (derived from $(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)). The current implementation is safe with this caller.

However, adding validation remains a good defensive practice to prevent issues if the script is later called from other contexts:

 BASE_DIR="$1"
+
+if [[ -z "$BASE_DIR" ]]; then
+  echo "ERROR: BASE_DIR not provided. Usage: $0 <BASE_DIR>" >&2
+  exit 1
+fi
+
+if [[ ! -d "$BASE_DIR" ]]; then
+  echo "ERROR: BASE_DIR does not exist: $BASE_DIR" >&2
+  exit 1
+fi
tests/e2e-prow/rhoai/pipeline-services.sh (2)

25-25: Add readiness wait for lightspeed-stack, consistent with llama-stack pattern.

Line 25 applies the lightspeed-stack manifest but doesn't wait for it to be ready. This is inconsistent with the llama-stack deployment (lines 5–8) and risks downstream tests running before lightspeed-stack is available.

Add a wait after line 25:

 oc apply -f "$BASE_DIR/manifests/lightspeed/lightspeed-stack.yaml"
+
+oc wait pod/lightspeed-stack-service \
+  -n e2e-rhoai-dsc --for=condition=Ready --timeout=300s

If the pod name differs or multiple pods are created, adjust the selector accordingly. Verify the expected pod name in lightspeed-stack.yaml.


19-19: Rename LLAMA_IP to reflect content (DNS hostname, not IP) — requires coordinated updates across multiple files.

The variable contains a DNS FQDN (llama-stack-service-svc.e2e-rhoai-dsc.svc.cluster.local), not an IP address. The name LLAMA_IP is misleading.

However, renaming requires updates in multiple locations:

  • tests/e2e-prow/rhoai/pipeline-services.sh:19 — variable definition
  • tests/e2e-prow/rhoai/pipeline-services.sh:22 — secret creation
  • tests/e2e-prow/rhoai/run-tests.sh:17 — export and aliasing
  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml:10 — env var name
  • tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml:15 — env var name
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml:16 — config reference

Suggested approach: rename to LLAMA_ENDPOINT or LLAMA_HOST_URI and update all references consistently.
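
The mechanical part of the rename can be sketched with grep and sed; the snippet below runs against a scratch copy (the file content is an abbreviated stand-in for pipeline-services.sh, not the real file):

```shell
set -euo pipefail

# Scratch copy standing in for the repo files listed above.
workdir=$(mktemp -d)
cat > "$workdir/pipeline-services.sh" <<'EOF'
LLAMA_IP="llama-stack-service-svc.e2e-rhoai-dsc.svc.cluster.local"
oc create secret generic llama-stack-ip-secret --from-literal=key="$LLAMA_IP"
EOF

# Rename every occurrence in the affected files.
grep -rl 'LLAMA_IP' "$workdir" | xargs sed -i 's/LLAMA_IP/LLAMA_ENDPOINT/g'

grep 'LLAMA_ENDPOINT' "$workdir/pipeline-services.sh"
```

Running the real rename over the six files listed above (instead of the scratch directory) would keep the variable name consistent end to end.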

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8998a0c and edfcec6.

📒 Files selected for processing (20)
  • README.md (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/configs/run.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-services.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/run-tests.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • docs/providers.md
  • tests/e2e-prow/rhoai/pipeline-vllm.sh
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml
  • README.md
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt

Applied to files:

  • tests/e2e-prow/rhoai/run-tests.sh
  • tests/e2e-prow/rhoai/pipeline.sh
🪛 Checkov (3.2.334)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml

[medium] 1-25: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-25: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml

[medium] 1-30: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-30: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)

Comment on lines +1 to +3
#!/bin/bash

BASE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

🛠️ Refactor suggestion | 🟠 Major

Add strict Bash error handling for consistency.

This script lacks set -euo pipefail, unlike bootstrap.sh. Without strict mode, unset variables and pipe failures won't halt execution, risking silent failures in orchestration.

Add after the shebang:

 #!/bin/bash
+
+set -euo pipefail
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 #!/bin/bash
+set -euo pipefail
 
 BASE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/pipeline-services.sh around lines 1 to 3, the script
lacks strict Bash error handling; insert the line "set -euo pipefail"
immediately after the shebang so the script exits on errors, treats unset
variables as failures, and fails pipelines on the first failing command.

Comment on lines 21 to 23
oc create secret generic llama-stack-ip-secret \
--from-literal=key="$LLAMA_IP" \
-n e2e-rhoai-dsc || echo "Secret exists"

🛠️ Refactor suggestion | 🟠 Major

Replace overly permissive error suppression with explicit existence check.

The || echo "Secret exists" at line 23 masks all errors (permission denied, namespace not found, etc.), treating them as "already exists". This hides real failures.

Apply this diff to check explicitly:

-oc create secret generic llama-stack-ip-secret \
-    --from-literal=key="$LLAMA_IP" \
-    -n e2e-rhoai-dsc || echo "Secret exists"
+# Create secret only if it doesn't exist
+if ! oc get secret llama-stack-ip-secret -n e2e-rhoai-dsc >/dev/null 2>&1; then
+  oc create secret generic llama-stack-ip-secret \
+      --from-literal=key="$LLAMA_ENDPOINT" \
+      -n e2e-rhoai-dsc
+fi

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/pipeline-services.sh around lines 21 to 23, the current
use of "|| echo 'Secret exists'" after oc create secret masks all errors; change
this to explicitly check if the secret already exists and only suppress the
error in that case. Replace the permissive suppression with an if/else flow:
attempt to create the secret, on failure inspect the error or run "oc get secret
<name> -n <ns>" to confirm existence; if the secret exists, print a harmless
message and continue, otherwise propagate or exit with the original error code
so real failures (permissions, missing namespace) are not hidden.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (5)
tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1)

1-30: Security context and resource limits still missing (flagged in prior review).

This Pod still lacks the security hardening and resource management that was recommended in the previous review. The static analysis continues to flag CKV_K8S_20 and CKV_K8S_23.

If not yet addressed, apply the diff from the prior review to add securityContext (runAsNonRoot, runAsUser) and resource limits. Given the review mode is "chill," prioritize this if redeploying in production; for e2e testing in an ephemeral environment, it can be deferred.

tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1)

1-25: Security context and resource limits still missing (flagged in prior review).

This Pod lacks the recommended security hardening and resource management from the previous review. Static analysis continues to flag CKV_K8S_20 and CKV_K8S_23.

Apply the diff from the prior review to add securityContext and resource limits appropriate for the Lightspeed stack container (likely requiring more resources than the test pod). For e2e testing, this can be deferred; prioritize for production environments.

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1)

1-34: Security context and resource limits still missing (flagged in prior review).

This Pod lacks the security hardening and resource management recommended in the previous review. Static analysis flags CKV_K8S_20 and CKV_K8S_23.

Apply the diff from the prior review, adjusting resource requests/limits as needed (llama-stack likely requires more CPU/memory than the test pod—previously suggested 2Gi memory, 500m CPU requests; 4Gi memory, 2 CPU limits). For e2e testing, defer if needed; prioritize for production.

tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (2)

29-29: Add resource requests and limits to prevent scheduling issues.

The container lacks resources.requests and resources.limits. On a CPU-only prow node, vLLM can consume unbounded CPU and memory, potentially evicting other pods or rendering the node unschedulable. This was flagged in prior reviews and remains unaddressed.

Apply this diff to add conservative resource constraints:

       image: quay.io/rh-ee-cpompeia/vllm-cpu:latest
       name: kserve-container
+      resources:
+        requests:
+          cpu: "2"
+          memory: "4Gi"
+        limits:
+          cpu: "4"
+          memory: "8Gi"
       env:

Adjust these values based on the 1B model's actual memory footprint (likely 2–4 GB) and available prow node capacity.


14-46: Add security hardening: securityContext for the container.

The container specification lacks a securityContext, which was flagged in prior reviews. Add security constraints to restrict privilege escalation and follow the principle of least privilege.

Apply this diff to harden the security posture:

       image: quay.io/rh-ee-cpompeia/vllm-cpu:latest
       name: kserve-container
+      securityContext:
+        allowPrivilegeEscalation: false
+        readOnlyRootFilesystem: false
+        runAsNonRoot: false
       resources:

(Note: runAsNonRoot: false is kept here only in case the vllm-cpu image runs as root by default; binding to port 8080 itself does not require root, since only ports below 1024 are privileged. If the image can run as a non-root user, set runAsNonRoot: true and add a matching runAsUser field.)

🧹 Nitpick comments (3)
tests/e2e-prow/rhoai/pipeline-vllm.sh (1)

1-7: Add set -euo pipefail for consistency and safety.

While this thin wrapper script may inherit error handling from the parent pipeline.sh, it should include set -euo pipefail at the top (after the shebang) for defensive robustness. If any subscript fails, the wrapper will then exit immediately rather than continuing.

 #!/bin/bash
+set -euo pipefail
 
 PIPELINE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1)

1-2: Add strict error handling at script start.

This script lacks set -euo pipefail. While the polling loops are robust, critical commands like oc rollout status and oc apply can fail silently. Add error handling:

 #!/bin/bash
+set -euo pipefail
+trap 'echo "❌ Deployment setup failed at line $LINENO"; exit 1' ERR
 
 BASE_DIR="$1"

This ensures any unhandled command failure stops the script immediately.

tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1)

28-28: Consider using a pinned image digest instead of "latest".

The container image uses the latest tag (quay.io/rh-ee-cpompeia/vllm-cpu:latest), which reduces reproducibility and can lead to unexpected behavior changes between deployments. For better consistency, especially in e2e test environments, consider pinning to a specific image digest or versioned tag.
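
A pinned reference would look like the fragment below; the digest is a placeholder, and one way to resolve the real value is `skopeo inspect --format '{{.Digest}}' docker://quay.io/rh-ee-cpompeia/vllm-cpu:latest` (assuming a recent skopeo):

```yaml
# Placeholder digest — substitute the value resolved from the registry.
image: quay.io/rh-ee-cpompeia/vllm-cpu@sha256:<resolved-digest>
```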

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edfcec6 and 95c7a6b.

📒 Files selected for processing (20)
  • README.md (1 hunks)
  • docs/providers.md (1 hunks)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/configs/run.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml (1 hunks)
  • tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-services.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/pipeline.sh (1 hunks)
  • tests/e2e-prow/rhoai/run-tests.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (1 hunks)
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/e2e-prow/rhoai/manifests/operators/operators.yaml
🚧 Files skipped from review as they are similar to previous changes (10)
  • tests/e2e-prow/rhoai/configs/lightspeed-stack.yaml
  • docs/providers.md
  • tests/e2e-prow/rhoai/scripts/get-vllm-pod-info.sh
  • README.md
  • tests/e2e-prow/rhoai/manifests/vllm/inference-service.yaml
  • tests/e2e-prow/rhoai/run-tests.sh
  • tests/e2e-prow/rhoai/pipeline-services.sh
  • tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml
  • tests/e2e-prow/rhoai/pipeline-test-pod.sh
  • tests/e2e-prow/rhoai/scripts/bootstrap.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to tests/e2e/test_list.txt : Maintain the E2E test list in tests/e2e/test_list.txt

Applied to files:

  • tests/e2e-prow/rhoai/pipeline.sh
🪛 Checkov (3.2.334)
tests/e2e-prow/rhoai/manifests/lightspeed/lightspeed-stack.yaml

[medium] 1-25: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-25: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack.yaml

[medium] 1-34: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-34: Minimize the admission of root containers

(CKV_K8S_23)

tests/e2e-prow/rhoai/manifests/test-pod/spin-up.yaml

[medium] 1-30: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 1-30: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (9)
tests/e2e-prow/rhoai/pipeline.sh (4)

68-69: Validate pod.env sourcing and variable presence.

Line 68 sources pod.env without checking if the source succeeded or if $POD_NAME and $KSVC_URL are set. If pod.env is missing or malformed, subsequent commands at line 69 and 84 will fail with confusing errors.

+source pod.env || { echo "❌ Failed to source pod.env from pipeline-vllm.sh"; exit 1; }
+[[ -n "$POD_NAME" && -n "$KSVC_URL" ]] || { echo "❌ pod.env missing POD_NAME or KSVC_URL"; exit 1; }
 oc wait --for=condition=Ready pod/$POD_NAME -n $NAMESPACE --timeout=300s

Alternatively, verify that pipeline-vllm.sh unconditionally writes and exports these variables to pod.env before this script sources it.


126-128: Check exit code of service readiness wait.

The oc wait at line 126 lacks exit code validation. If it times out or fails, the script continues with incomplete deployments, causing cryptic failures downstream. Add explicit error handling:

 oc wait pod/lightspeed-stack-service pod/llama-stack-service \
-    -n "$NAMESPACE" --for=condition=Ready --timeout=300s
+    -n "$NAMESPACE" --for=condition=Ready --timeout=300s || {
+    echo "❌ Service pods did not reach Ready within 300s"
+    oc get pods -n "$NAMESPACE" -o wide || true
+    exit 1
+}
 sleep 30

168-169: Clarify and validate dual-wait pattern.

Line 168 uses an unusual pattern: wait for Ready=True, and if that times out/fails, wait for Ready=False. This is non-standard and could mask deployment issues. Clarify the intent:

  • If the goal is to ensure the pod terminates (success or failure), use a single condition or add explicit validation.
  • If testing pod failure scenarios, document this clearly.

Please verify the intent of this dual-wait pattern. If pods should reach Ready=True and the second condition is a fallback, consider restructuring for clarity:

# Example: Wait for terminal state with explicit timeout
oc wait --for=condition=Ready=True pod/test-pod -n $NAMESPACE --timeout=900s && {
    echo "✅ Pod reached Ready state"
} || {
    echo "⚠️  Pod did not reach Ready; checking if it failed..."
    oc wait --for=condition=Ready=False pod/test-pod -n $NAMESPACE --timeout=60s || {
        echo "❌ Pod stuck in intermediate state"
        exit 1
    }
}

194-204: Validate test pod exit code extraction before using it.

Line 194 extracts TEST_EXIT_CODE without checking if the pod has terminated or if containerStatuses exists. An empty or missing exit code can cause silent failures or incorrect exit logic. Add explicit validation:

-TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}')
+TEST_EXIT_CODE=$(oc get pod test-pod -n $NAMESPACE -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}' 2>/dev/null || echo "")
+
+if [[ -z "$TEST_EXIT_CODE" ]]; then
+    echo "❌ Test pod did not terminate or exitCode is missing"
+    oc describe pod test-pod -n $NAMESPACE || true
+    oc logs test-pod -n $NAMESPACE || true
+    exit 1
+fi
 
 echo "===== E2E COMPLETE ====="
 
-if [ "${TEST_EXIT_CODE:-2}" -ne 0 ]; then
+if [[ "$TEST_EXIT_CODE" -ne 0 ]]; then
     echo "❌ E2E tests failed with exit code $TEST_EXIT_CODE (pod/test-pod failed)"
 else
     echo "✅ E2E tests succeeded"
 fi
 
 exit $TEST_EXIT_CODE
tests/e2e-prow/rhoai/scripts/deploy-vllm.sh (2)

23-23: Check exit code of oc rollout status.

The rollout status command at line 23 doesn't validate success. If it times out or the deployment fails, the script continues with an incomplete rollout, causing silent failures. Add error handling:

 echo "Waiting for kserve-controller-manager rollout..."
-oc rollout status deployment/kserve-controller-manager -n redhat-ods-applications --timeout=300s
+oc rollout status deployment/kserve-controller-manager -n redhat-ods-applications --timeout=300s || {
+    echo "❌ KServe rollout failed or timed out"
+    exit 1
+}

Alternatively, add set -euo pipefail at the script top to catch this automatically.


32-34: Add error handling to manifest application.

The oc apply commands at lines 32 and 34 lack error checking. If manifests are malformed or paths are incorrect, the commands fail silently. Add error handling:

-oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml"
+oc apply -f "$BASE_DIR/manifests/vllm/vllm-runtime-cpu.yaml" \
+    || { echo "❌ Failed to apply vllm-runtime-cpu.yaml"; exit 1; }
 
-oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml"
+oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml" \
+    || { echo "❌ Failed to apply inference-service.yaml"; exit 1; }

Or add set -euo pipefail at the script top to fail on any command error.

tests/e2e-prow/rhoai/manifests/operators/ds-cluster.yaml (1)

1-17: Manifest looks good.

This DataScienceCluster manifest is well-formed and properly configures KServe with service mesh. No issues identified.

tests/e2e-prow/rhoai/configs/run.yaml (1)

1-120: Configuration aligns well with deployment topology.

This llama-stack configuration correctly references the vLLM inference provider via environment variables and is well-structured for the e2e test scenario. The use of ${env.KSVC_URL} and ${env.VLLM_API_KEY} matches the secrets/ConfigMaps provisioned in pipeline.sh.

tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml (1)

30-42: Verify that referenced secrets and ConfigMap exist in the deployment pipeline.

The manifest references three Kubernetes objects that must be created before the ServingRuntime is deployed:

  • Secret hf-token-secret (line 36)
  • Secret vllm-api-key-secret (line 41)
  • ConfigMap vllm-chat-template (line 57)

If these objects are not created beforehand, the pod will fail to start or fail at runtime.

Verify that the deployment pipeline (e.g., in deploy-vllm.sh or related scripts) creates these secrets and ConfigMap before applying this manifest. If they are not yet created, add creation steps to the pipeline.
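
A preflight check along these lines could fail fast before the apply; `oc` is stubbed here so the sketch runs standalone (drop the stub to run it for real; the object names are taken from the manifest lines cited above):

```shell
set -euo pipefail

# Stub: pretend every object exists (remove to query a real cluster).
oc() { return 0; }

NAMESPACE=e2e-rhoai-dsc
missing=0
for ref in secret/hf-token-secret secret/vllm-api-key-secret configmap/vllm-chat-template; do
    if ! oc get "$ref" -n "$NAMESPACE" >/dev/null 2>&1; then
        echo "❌ missing prerequisite: $ref"
        missing=1
    fi
done

if [[ "$missing" -ne 0 ]]; then
    echo "❌ create the missing objects before applying the ServingRuntime"
else
    echo "✅ all referenced secrets and ConfigMaps exist"
fi
```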

Comment on lines +14 to +27
- args:
# - /mnt/models/
- meta-llama/Llama-3.2-1B-Instruct
- --enable-auto-tool-choice
- --tool-call-parser
- llama3_json
- --chat-template
- /mnt/chat-template/tool_chat_template_llama3.2_json.jinja
- --download-dir
- /tmp/models-cache
- --port
- "8080"
- --max-model-len
- "2048"

⚠️ Potential issue | 🔴 Critical

Fix critical path mismatch in download-dir argument.

The container argument on line 23 specifies --download-dir /tmp/models-cache, but the volumeMount on line 51 mounts the models-cache volume to /mnt/models-cache. This path mismatch means vLLM will download models to /tmp/models-cache (inside the container's ephemeral filesystem) instead of the mounted emptyDir volume at /mnt/models-cache. Models will not persist across pod restarts.

Apply this diff to fix the path mismatch:

       - --download-dir
-      - /tmp/models-cache
+      - /mnt/models-cache
📝 Committable suggestion


Suggested change

       - --chat-template
       - /mnt/chat-template/tool_chat_template_llama3.2_json.jinja
       - --download-dir
-      - /tmp/models-cache
+      - /mnt/models-cache
       - --port
       - "8080"
🤖 Prompt for AI Agents
In tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-cpu.yaml around lines 14 to
27, the --download-dir argument is set to /tmp/models-cache which does not match
the volumeMount at /mnt/models-cache; change the --download-dir value to
/mnt/models-cache so vLLM writes model files into the mounted models-cache
volume (ensuring persistence across pod restarts) and verify the path matches
exactly the container volumeMount name and path.

vLLM served in Openshift, llama-stack (2.25 RHOAI) and lightspeed-stack (dev-latest) served in Openshift as independent containers. Tests run on a fourth pod.
@radofuchs radofuchs left a comment

LGTM

@radofuchs radofuchs merged commit 4f39c33 into lightspeed-core:main Nov 5, 2025
17 of 20 checks passed