e2e: fix 02 test failure #195

rethil · 2025-09-30T20:39:29Z

Description

This fixes test 02 issues.
There is a race condition where pod is scaled to 0 and in parallel:

controller sets up EndpointSlices,
curl sends a http request

As EndpointSlices are not set, curl request is probably ending in istio.
With this fix, problem doesn't persist, but its covered by explicit wait for proxy mode.

There is also helm drift after 06-resolver-endpoint-sync, where resolver is scaled to 2.
Drift correction was also added.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

e2e

Checklist:

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
A PR is open to update the helm chart (link)[https://github.com/truefoundry/elasti/tree/main/charts/elasti] if applicable
I have updated the CHANGELOG.md file with the changes I made

Summary by CodeRabbit

Tests
- Simplified e2e YAML by removing explicit TestAssert blocks and fixed 60s timeouts, relying on defaults.
- Added/updated ElastiService test entries across scenarios, including mode changes between proxy and serve.
Chores
- Improved e2e test harness: time-bounded log capture (RFC3339) and more robust curl retries for reliable diagnostics.
- Minor ordering/logging adjustments and formatting cleanups.

coderabbitai · 2025-09-30T20:39:37Z

Walkthrough

Add time-bounded log collection to the e2e traffic script, remove many explicit KUTTL TestAssert blocks/timeouts from e2e YAMLs while adding/updating several ElastiService status.mode entries, and reorder an endpointslice creation step and logs in the operator controller.

Changes

Cohort / File(s)	Summary
E2E traffic script logging `tests/e2e/manifest/send_traffic.sh`	`log_failure_details` now accepts a `start_time` parameter (RFC3339); callers compute `start_time_rfc` per attempt and pass it. `kubectl logs` invocations use `--since-time`. Added `curl` retry options (`--retry 2`, `--retry-delay 1`) and a brief initial sleep; minor whitespace/format tweaks.
Serve-via-traffic tests (grouped changes) `tests/e2e/tests/02-enable-serve-via-traffic/02-assert.yaml`, `tests/e2e/tests/02-enable-serve-via-traffic/05-assert.yaml`, `tests/e2e/tests/02-enable-serve-via-traffic/07-assert.yaml`, `tests/e2e/tests/02-enable-serve-via-traffic/10-assert.yaml`, `tests/e2e/tests/02-enable-serve-via-traffic/12-assert.yaml`, `tests/e2e/tests/02-enable-serve-via-traffic/15-assert.yaml`, `tests/e2e/tests/02-enable-serve-via-traffic/17-assert.yaml`, `tests/e2e/tests/02-enable-serve-via-traffic/20-assert.yaml`	Removed explicit KUTTL `TestAssert` blocks (apiVersion/kind/timeout) in multiple steps. Added/updated `ElastiService` resource entries (e.g., `metadata.name: target-elastiservice`) and adjusted `status.mode` values (proxy/serve) across steps.
Other e2e TestAssert removals `tests/e2e/tests/03-enable-serve-via-manual/03-assert.yaml`, `tests/e2e/tests/07-scale-down-with-filter/05-assert.yaml`, `tests/e2e/tests/08-resolver-multiple-endpoints/06-assert-mode-proxy.yaml`, `tests/e2e/tests/08-resolver-multiple-endpoints/08-assert-mode-serve.yaml`, `tests/e2e/tests/09-statefulset-serve-via-traffic/02-assert.yaml`, `tests/e2e/tests/09-statefulset-serve-via-traffic/05-assert.yaml`	Removed `TestAssert` apiVersion/kind and `timeout` fields; retained `ElastiService` definitions and mode assertions where present.
Operator controller ordering `operator/internal/controller/opsModes.go`	In `enableProxyMode`, moved `createOrUpdateEndpointsliceToResolver(ctx, targetSVC)` earlier (before `watchPublicService`) and adjusted log numbering; added a TODO regarding atomicity.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester
  participant Script as send_traffic.sh
  participant Kubectl as kubectl
  participant Service as ElastiService
  participant Ingress
  participant Resolver
  participant Controller
  participant TargetPod

  Tester->>Script: run
  Script->>Script: start_time_rfc = now (RFC3339)
  loop retries
    Script->>Service: curl (with --retry, --retry-delay)
    alt success
      Script-->>Tester: done (0)
    else failure
      Script->>Script: log_failure_details(start_time_rfc)
      Script->>Kubectl: logs Ingress --since-time=start_time_rfc
      Script->>Kubectl: logs Resolver --since-time=start_time_rfc
      Script->>Kubectl: logs Controller --since-time=start_time_rfc
      Script->>Kubectl: logs TargetPod --since-time=start_time_rfc
      Script-->>Tester: fail + collected logs
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my whiskers, mark the time,
RFC stamps chase the failing rhyme.
As asserts fall and modes turn sly,
Endpoints hop and logs reply.
A bunny cheers: tests pass—bye-bye! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly highlights that the pull request addresses a failure in the e2e test #2, which aligns with the main objective of fixing the race condition in that test. It avoids extraneous detail and noise, focusing reader attention on the key change. This level of specificity is sufficient for a teammate to understand the general intent when scanning the PR history.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a34ded and c307f74.

📒 Files selected for processing (1)

tests/e2e/manifest/send_traffic.sh (5 hunks)

🧰 Additional context used

🪛 Shellcheck (0.11.0)

tests/e2e/manifest/send_traffic.sh

[info] 126-126: Double quote to prevent globbing and word splitting.

(SC2086)

[info] 130-130: echo may not expand escape sequences. Use printf.

(SC2028)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: golangci-lint (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
GitHub Check: Analyze (go)
GitHub Check: Run Kuttl E2E Tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67f78f2 and c061380.

📒 Files selected for processing (2)

tests/e2e/manifest/send_traffic.sh (5 hunks)
tests/e2e/manifest/setup.sh (2 hunks)

🧰 Additional context used

🪛 Shellcheck (0.11.0)

tests/e2e/manifest/setup.sh

[info] 50-50: Double quote to prevent globbing and word splitting.

(SC2086)

[info] 55-55: Double quote to prevent globbing and word splitting.

(SC2086)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: Run Kuttl E2E Tests
GitHub Check: Analyze (go)

tests/e2e/manifest/send_traffic.sh

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

tests/e2e/manifest/send_traffic.sh (1)
78-78: Critical: Invalid --all-pods=true flag still present.

Despite the past review marking this as "✅ Addressed in commit 9243d71", the unsupported --all-pods=true flag remains on line 78. This flag causes kubectl logs to fail with "unknown flag: --all-pods", breaking failure diagnostics.

Apply this diff to remove the invalid flag:
-    kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" --all-pods=true | sed 's/^/    /' || echo "${YELLOW}    - Could not retrieve resolver logs${NC}"
+    kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" | sed 's/^/    /' || echo "${YELLOW}    - Could not retrieve resolver logs${NC}"
Note: If you need logs from all pods behind the service, use --all-containers=true (which is valid) or iterate over pods with a label selector.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9243d71 and d8d4e40.

📒 Files selected for processing (1)

tests/e2e/manifest/send_traffic.sh (6 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
GitHub Check: golangci-lint (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: Run Kuttl E2E Tests
GitHub Check: Analyze (go)

🔇 Additional comments (4)

tests/e2e/manifest/send_traffic.sh (4)

63-65: LGTM: Parameterized time-bounded log retrieval.

The addition of the start_time parameter enables focused log retrieval from the relevant time window, improving diagnostic efficiency.

159-159: LGTM: Consistent time-bounded failure diagnostics.

Both failure paths correctly pass start_time_rfc to log_failure_details, ensuring diagnostic logs are scoped to the relevant time window.

Also applies to: 176-176

91-91: LGTM: Formatting improvements.

The additional blank lines improve readability and logical separation of code blocks.

Also applies to: 141-141, 145-145, 163-163, 182-182

78-78: Verify kubectl logs with services/ prefix
Lines 78 and 82 invoke kubectl logs -n elasti services/<service>; the sandbox couldn’t confirm its behavior. Please manually test in a real cluster to ensure this returns the expected logs for all pods behind the service. If it fails or only shows one pod, update to target pods or deployments directly (for example, kubectl logs -n elasti <pod-name> or kubectl logs -n elasti deployment/<deployment-name>).

tests/e2e/manifest/send_traffic.sh

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

tests/e2e/manifest/send_traffic.sh (2)
43-51: Fix unbound color variables under set -u (script exits early).

Colors are used before they are defined; with set -u, ${RED}/${NC} are unbound and abort the script on error paths (Lines 44 and 49). Move color definitions above any usage, or use safe expansion.

Apply this minimal diff to move color definitions:
@@
-set -u
+set -u
+
+# --- Color Definitions ---
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+CYAN='\033[0;36m'
+NC='\033[0m' # No Color
@@
-# --- Color Definitions ---
-RED='\033[0;31m'
-GREEN='\033[0;32m'
-YELLOW='\033[1;33m'
-CYAN='\033[0;36m'
-NC='\033[0m' # No Color
Optionally also guard all expansions as ${RED:-}/${NC:-} on early‑exit paths.

Also applies to: 53-58

126-126: Non-portable newline with echo; use printf.

echo "\n..." is not portable across POSIX shells. Prefer printf.
-    echo "\n${CYAN}--- Request $i/$MAX_RETRIES ---${NC}"
+    printf "\n%s--- Request %s/%s ---%s\n" "${CYAN}" "$i" "$MAX_RETRIES" "${NC}"
@@
-echo "\n${CYAN}=== Test Summary ===${NC}"
+printf "\n%s=== Test Summary ===%s\n" "${CYAN}" "${NC}"
Also applies to: 190-190

♻️ Duplicate comments (2)

tests/e2e/manifest/send_traffic.sh (2)
125-131: GNU-specific date breaks portability; avoid -d/--rfc-3339.

date -d "-1 minute" --rfc-3339=seconds is GNU‑only and will fail on macOS/BSD/BusyBox. Use a POSIX‑friendly UTC timestamp, or better, drop --since-time in favor of --since=<duration> everywhere.

Minimal portable fix (keep --since-time usage):
-    start_time_rfc=$(date -d "-1 minute" --rfc-3339=seconds | sed 's/ /T/')
+    start_time_rfc=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
Recommended refactor (avoid date entirely): use "${since_args[@]}" from the earlier comment and remove start_time_rfc plumbing.

77-83: kubectl logs: unsupported services/ target and invalid flag; switch to selector-based logs.*

--all-pods=true is not a valid flag for kubectl logs.

services/<name> is not a supported logs target; use a label selector (from the Service’s selector) or a workload resource (e.g., deploy/<name>).

Apply this robust, zero‑assumption fix using the service selector (also uses the since_args fallback from above):
-    kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" --all-pods=true | sed 's/^/    /' || echo "${YELLOW}    - Could not retrieve resolver logs${NC}"
+    svc_selector=$(kubectl get svc -n elasti elasti-resolver-service -o jsonpath='{range $k,$v := .spec.selector}{printf "%s=%s,", $k, $v}{end}' | sed 's/,$//')
+    if [ -n "$svc_selector" ]; then
+      kubectl logs -n elasti --selector="$svc_selector" --all-containers=true "${since_args[@]}" | sed 's/^/    /' \
+        || echo "${YELLOW}    - Could not retrieve resolver logs${NC}"
+    else
+      echo "${YELLOW}    - Service has no selector; skipping resolver logs${NC}"
+    fi
@@
-    kubectl logs -n elasti services/elasti-operator-controller-service --since-time="${start_time}" | sed 's/^/    /' || echo "${YELLOW}    - Could not retrieve controller logs${NC}"
+    ctrl_selector=$(kubectl get svc -n elasti elasti-operator-controller-service -o jsonpath='{range $k,$v := .spec.selector}{printf "%s=%s,", $k, $v}{end}' | sed 's/,$//')
+    if [ -n "$ctrl_selector" ]; then
+      kubectl logs -n elasti --selector="$ctrl_selector" --all-containers=true "${since_args[@]}" | sed 's/^/    /' \
+        || echo "${YELLOW}    - Could not retrieve controller logs${NC}"
+    else
+      echo "${YELLOW}    - Controller service has no selector; skipping controller logs${NC}"
+    fi
If your kubectl is too old for --selector, iterate pods: for p in $(kubectl get pods -l "$svc_selector" -o name); do kubectl logs "$p" ...; done.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8d4e40 and abf9531.

📒 Files selected for processing (1)

tests/e2e/manifest/send_traffic.sh (6 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
GitHub Check: Run Kuttl E2E Tests
GitHub Check: Analyze (go)

🔇 Additional comments (1)

tests/e2e/manifest/send_traffic.sh (1)

85-87: Ensure target resource supports kubectl logs.

${TARGET_RESOURCE}/${TARGET_NAME} must be a log‑capable resource (Pod/Deployment/StatefulSet/Job/DaemonSet). If it can be a Service, this will fail. Consider the same selector‑based approach or pass the concrete workload type.

Do the e2e invocations always provide a log‑capable resource here?

tests/e2e/manifest/send_traffic.sh

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (3)

tests/e2e/manifest/send_traffic.sh (3)
74-74: Incorrect error label for istiod logs.

The error message says "resolver logs" when fetching istiod logs from the ingress namespace.

Apply this diff:
-    kubectl logs -n istio-system deployments/istiod --since-time="${start_time}" | sed 's/^/    /' || echo "${YELLOW}    - Could not retrieve resolver logs${NC}"
+    kubectl logs -n istio-system deployments/istiod --since-time="${start_time}" | sed 's/^/    /' || echo "${YELLOW}    - Could not retrieve istiod logs${NC}"
63-64: Add defensive handling for empty or malformed start_time.

If start_time is empty or malformed, kubectl logs --since-time will fail silently or produce errors. A fallback to a duration-based window (e.g., --since=90s) would make diagnostics more resilient.

Consider this pattern:
 log_failure_details() {
-    start_time="${1}"
+    start_time="${1:-}"
+    if [ -n "${start_time}" ]; then
+        since_args="--since-time=${start_time}"
+    else
+        since_args="--since=90s"
+    fi
Then replace direct --since-time="${start_time}" uses with ${since_args} in subsequent kubectl commands.

132-132: Portability issue: GNU-specific date command.

The date --rfc-3339=seconds option is GNU-specific and will fail on systems without GNU coreutils (e.g., Alpine, macOS, BSD). Since the script uses #!/bin/sh, it should use POSIX-compatible commands.

Apply this diff to use a POSIX-compatible alternative:
-    start_time_rfc=$(date -d "-1 minute" --rfc-3339=seconds | sed 's/ /T/')
+    start_time_rfc=$(date -u -v-1M +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u --date="-1 minute" +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u +"%Y-%m-%dT%H:%M:%SZ")
This tries BSD-style (-v-1M), then GNU-style (--date), then falls back to current time, ensuring cross-platform compatibility.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abf9531 and f4b8501.

📒 Files selected for processing (2)

operator/internal/controller/opsEndpointslices.go (1 hunks)
tests/e2e/manifest/send_traffic.sh (6 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-14T12:33:30.800Z

Learnt from: rethil
PR: truefoundry/KubeElasti#178
File: operator/internal/controller/opsEndpointslices.go:21-23
Timestamp: 2025-09-14T12:33:30.800Z
Learning: In the KubeElasti project, the resolver component in `getIPsForResolver()` method must list EndpointSlices across all namespaces, not restricted to its own namespace. This is by design and required for proper resolver functionality.

Applied to files:

operator/internal/controller/opsEndpointslices.go

🪛 GitHub Check: golangci-lint (/home/runner/work/KubeElasti/KubeElasti/operator)

operator/internal/controller/opsEndpointslices.go

[failure] 112-112:
unnecessary conversion (unconvert)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
GitHub Check: Run Kuttl E2E Tests
GitHub Check: Analyze (go)

🔇 Additional comments (1)

tests/e2e/manifest/send_traffic.sh (1)

160-160: Manual verification required for timestamp portability and kubectl compatibility.
The Linux sandbox confirms the POSIX fallback yields a valid RFC3339 timestamp, but we can’t validate the BSD/macOS date -v-1M syntax here or ensure kubectl logs --since-time accepts it. Please test on macOS (e.g. date -u -v-1M '+%Y-%m-%dT%H:%M:%SZ') and verify that kubectl logs --since-time=<timestamp> successfully retrieves logs.

operator/internal/controller/opsEndpointslices.go

tests/e2e/manifest/send_traffic.sh

There is a race condition between curl & elasti operator. When elasti sets up EndpointSlices, curl is already doing it request, which results in curl failure. Explicit wait for proxy mode solves this issue. Test 06 is introducing helm release drift, with resolver being scaled to 2. Helm drift fixup was added.

rethil · 2025-10-01T07:57:23Z

Hey @ramantehlan, this should fix that 02 test issues.
Could you re-run this few times and merge? This would unblock #186

ramantehlan · 2025-10-01T08:42:33Z

Hi @rethil, was just debugging this issue, thank you for the PR.
I will have a look.

tests/e2e/manifest/setup.sh

ramantehlan · 2025-10-01T15:05:18Z

@rethil, we have a community call tomorrow, would you be able to join?
We are applying to CNCF this month, and are presenting to TAGs next month.

rethil · 2025-10-02T05:58:48Z

@rethil, we have a community call tomorrow, would you be able to join?
We are applying to CNCF this month, and are presenting to TAGs next month.

Hey, I won't make it. Good luck!

ramantehlan · 2025-10-02T14:28:40Z

Seems like it is failing again.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between adddeb5 and 33fe9de.

📒 Files selected for processing (1)

tests/e2e/manifest/setup.sh (2 hunks)

🧰 Additional context used

🪛 Shellcheck (0.11.0)

tests/e2e/manifest/setup.sh

[info] 50-50: Double quote to prevent globbing and word splitting.

(SC2086)

[info] 55-55: Double quote to prevent globbing and word splitting.

(SC2086)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
GitHub Check: Analyze (go)
GitHub Check: Run Kuttl E2E Tests

tests/e2e/manifest/setup.sh

ramantehlan · 2025-10-03T11:34:07Z

Surprisingly, all tests pass locally.

rethil · 2025-10-03T12:33:21Z

Surprisingly, all tests pass locally.

I'll try do debug this during weekend.
It's odd that there isn't anything at all from resolver. Just like that curl request didn't even hit it.

I really suspect that EndpointSlices are not yet in place after scaling pod to 0. Like CRD status was changed to "proxy", but slices are not there yet.

rethil · 2025-10-03T12:34:37Z

Assert in step 02 wasn't executed:

     logger.go:42: 16:37:20 | 02-enable-serve-via-traffic/2- | starting test step 2-
    logger.go:42: 16:37:20 | 02-enable-serve-via-traffic/2- | test step completed 2-

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

tests/e2e/manifest/send_traffic.sh (1)
78-78: Remove unsupported kubectl logs --all-pods flag.

kubectl logs still rejects --all-pods=true, so this call fails before collecting resolver logs—exactly the regression we hit previously. Drop the flag or iterate pods via a selector.
-    kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" --all-pods=true | sed 's/^/    /' || echo "${YELLOW}    - Could not retrieve resolver logs${NC}"
+    kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" | sed 's/^/    /' || echo "${YELLOW}    - Could not retrieve resolver logs${NC}"

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33fe9de and 5dd0cd9.

📒 Files selected for processing (2)

tests/e2e/manifest/send_traffic.sh (4 hunks)
tests/e2e/manifest/setup.sh (2 hunks)

🧰 Additional context used

🪛 Shellcheck (0.11.0)

tests/e2e/manifest/setup.sh

[info] 50-50: Double quote to prevent globbing and word splitting.

(SC2086)

[info] 55-55: Double quote to prevent globbing and word splitting.

(SC2086)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
GitHub Check: Run Kuttl E2E Tests
GitHub Check: Analyze (go)

tests/e2e/manifest/setup.sh

ramantehlan · 2025-10-03T12:37:21Z

Assert in step 02 wasn't executed:

     logger.go:42: 16:37:20 | 02-enable-serve-via-traffic/2- | starting test step 2-
    logger.go:42: 16:37:20 | 02-enable-serve-via-traffic/2- | test step completed 2-

Hmm, in that case it must be running tests on cached code.

ramantehlan · 2025-10-03T12:39:32Z

Surprisingly, all tests pass locally.

I'll try do debug this during weekend. It's odd that there isn't anything at all from resolver. Just like that curl request didn't even hit it.

I really suspect that EndpointSlices are not yet in place after scaling pod to 0. Like CRD status was changed to "proxy", but slices are not there yet.

We set mode once after all the steps are fulfilled, but I will cross check if we are waiting for EndpointSlices or not.

ramantehlan · 2025-10-03T13:05:21Z

Right now, we are not waiting for EndpointSliceChanges to take effect:

if isResolverSliceFound {
		if err := r.Update(ctx, newEndpointSlice); err != nil {
			r.Logger.Error("failed to update sliceToResolver", zap.String("endpointslice", EndpointsliceNamespacedName.String()), zap.Error(err))
			return fmt.Errorf("createOrUpdateEndpointsliceToResolver: %w", err)
		}
		r.Logger.Info("EndpointSlice updated successfully", zap.String("endpointslice", EndpointsliceNamespacedName.String()))
	}

Here is more context:

When you call r.Update(ctx, obj), it sends the update request to the Kubernetes API server. The function will only return a nil error after the API server has successfully validated the request, persisted the change to its underlying storage (etcd), and sent back a success confirmation.

…5 second sleep on first test

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

tests/e2e/manifest/send_traffic.sh (1)

78-78: Restore kubectl compatibility by dropping --all-pods.

kubectl logs still rejects the --all-pods=true flag, so this command exits with “unknown flag: --all-pods” and the failure diagnostics never run. Please remove the unsupported flag (or loop over pods via kubectl get) to keep log collection working.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1305f4 and 4a34ded.

📒 Files selected for processing (2)

operator/internal/controller/opsModes.go (2 hunks)
tests/e2e/manifest/send_traffic.sh (5 hunks)

🧰 Additional context used

🪛 Shellcheck (0.11.0)

tests/e2e/manifest/send_traffic.sh

[info] 126-126: Double quote to prevent globbing and word splitting.

(SC2086)

[info] 130-130: echo may not expand escape sequences. Use printf.

(SC2028)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
GitHub Check: Analyze (go)
GitHub Check: Run Kuttl E2E Tests

🔇 Additional comments (2)

operator/internal/controller/opsModes.go (2)

77-77: LGTM!

Log messages correctly reflect the reordered operations with accurate sequential numbering.

Also applies to: 83-83

74-77: Verify EndpointSlice readiness after creation. No direct EndpointSlice List/Get calls were found in watchPublicService; ensure it relies on an informer cache to pick up the create/update event or add an explicit poll/wait after Line 77 to eliminate the race.

operator/internal/controller/opsModes.go

* e2e: fix 02 test failure There is a race condition between curl & elasti operator. When elasti sets up EndpointSlices, curl is already doing it request, which results in curl failure. Explicit wait for proxy mode solves this issue. Test 06 is introducing helm release drift, with resolver being scaled to 2. Helm drift fixup was added. * setup: don't correct helm drift * e2e assert: it can be only action in file * feat: move endpointslice creation earlier, add retry to curl, with a 5 second sleep on first test * test: retest --------- Co-authored-by: Raman Tehlan <[email protected]>

coderabbitai bot reviewed Sep 30, 2025

View reviewed changes

tests/e2e/manifest/send_traffic.sh Show resolved Hide resolved

coderabbitai bot reviewed Sep 30, 2025

View reviewed changes

tests/e2e/manifest/send_traffic.sh Outdated Show resolved Hide resolved

coderabbitai bot reviewed Sep 30, 2025

View reviewed changes

tests/e2e/manifest/send_traffic.sh Show resolved Hide resolved

tests/e2e/manifest/send_traffic.sh Outdated Show resolved Hide resolved

tests/e2e/manifest/send_traffic.sh Show resolved Hide resolved

coderabbitai bot reviewed Oct 1, 2025

View reviewed changes

operator/internal/controller/opsEndpointslices.go Outdated Show resolved Hide resolved

tests/e2e/manifest/send_traffic.sh Outdated Show resolved Hide resolved

rethil force-pushed the fix_e2e_02 branch from 9657262 to 5e1609c Compare October 1, 2025 07:42

ramantehlan requested changes Oct 1, 2025

View reviewed changes

tests/e2e/manifest/setup.sh Outdated Show resolved Hide resolved

setup: don't correct helm drift

adddeb5

coderabbitai bot reviewed Oct 2, 2025

View reviewed changes

tests/e2e/manifest/setup.sh Outdated Show resolved Hide resolved

coderabbitai bot reviewed Oct 3, 2025

View reviewed changes

tests/e2e/manifest/setup.sh Outdated Show resolved Hide resolved

e2e assert: it can be only action in file

e1305f4

rethil force-pushed the fix_e2e_02 branch from 5dd0cd9 to e1305f4 Compare October 3, 2025 12:39

feat: move endpointslice creation earlier, add retry to curl, with a …

4a34ded

…5 second sleep on first test

coderabbitai bot reviewed Oct 3, 2025

View reviewed changes

operator/internal/controller/opsModes.go Show resolved Hide resolved

test: retest

c307f74

ramantehlan approved these changes Oct 3, 2025

View reviewed changes

ramantehlan merged commit 541373e into truefoundry:main Oct 3, 2025
11 checks passed

e2e: fix 02 test failure #195

e2e: fix 02 test failure #195

Uh oh!

Conversation

rethil commented Sep 30, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

How Has This Been Tested?

Checklist:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

rethil commented Oct 1, 2025

Uh oh!

ramantehlan commented Oct 1, 2025

Uh oh!

Uh oh!

ramantehlan commented Oct 1, 2025

Uh oh!

rethil commented Oct 2, 2025

Uh oh!

ramantehlan commented Oct 2, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ramantehlan commented Oct 3, 2025

Uh oh!

rethil commented Oct 3, 2025

Uh oh!

rethil commented Oct 3, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ramantehlan commented Oct 3, 2025

Uh oh!

ramantehlan commented Oct 3, 2025

Uh oh!

ramantehlan commented Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rethil commented Sep 30, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 30, 2025 •

edited

Loading

ramantehlan commented Oct 3, 2025 •

edited

Loading