-
Notifications
You must be signed in to change notification settings - Fork 4
e2e: fix 02 test failure #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdd time-bounded log collection to the e2e traffic script, remove many explicit KUTTL TestAssert blocks/timeouts from e2e YAMLs while adding/updating several ElastiService status.mode entries, and reorder an endpointslice creation step and logs in the operator controller. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Tester
participant Script as send_traffic.sh
participant Kubectl as kubectl
participant Service as ElastiService
participant Ingress
participant Resolver
participant Controller
participant TargetPod
Tester->>Script: run
Script->>Script: start_time_rfc = now (RFC3339)
loop retries
Script->>Service: curl (with --retry, --retry-delay)
alt success
Script-->>Tester: done (0)
else failure
Script->>Script: log_failure_details(start_time_rfc)
Script->>Kubectl: logs Ingress --since-time=start_time_rfc
Script->>Kubectl: logs Resolver --since-time=start_time_rfc
Script->>Kubectl: logs Controller --since-time=start_time_rfc
Script->>Kubectl: logs TargetPod --since-time=start_time_rfc
Script-->>Tester: fail + collected logs
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🪛 Shellcheck (0.11.0)tests/e2e/manifest/send_traffic.sh[info] 126-126: Double quote to prevent globbing and word splitting. (SC2086) [info] 130-130: echo may not expand escape sequences. Use printf. (SC2028) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
tests/e2e/manifest/send_traffic.sh
(5 hunks)tests/e2e/manifest/setup.sh
(2 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
tests/e2e/manifest/setup.sh
[info] 50-50: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 55-55: Double quote to prevent globbing and word splitting.
(SC2086)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
- GitHub Check: Run Kuttl E2E Tests
- GitHub Check: Analyze (go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/e2e/manifest/send_traffic.sh (1)
78-78
: Critical: Invalid --all-pods=true flag still present.Despite the past review marking this as "✅ Addressed in commit 9243d71", the unsupported
--all-pods=true
flag remains on line 78. This flag causeskubectl logs
to fail with "unknown flag: --all-pods", breaking failure diagnostics.Apply this diff to remove the invalid flag:
- kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" --all-pods=true | sed 's/^/ /' || echo "${YELLOW} - Could not retrieve resolver logs${NC}" + kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" | sed 's/^/ /' || echo "${YELLOW} - Could not retrieve resolver logs${NC}"Note: If you need logs from all pods behind the service, use
--all-containers=true
(which is valid) or iterate over pods with a label selector.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tests/e2e/manifest/send_traffic.sh
(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
- GitHub Check: golangci-lint (/home/runner/work/KubeElasti/KubeElasti/operator)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
- GitHub Check: Run Kuttl E2E Tests
- GitHub Check: Analyze (go)
🔇 Additional comments (4)
tests/e2e/manifest/send_traffic.sh (4)
63-65
: LGTM: Parameterized time-bounded log retrieval.The addition of the
start_time
parameter enables focused log retrieval from the relevant time window, improving diagnostic efficiency.
159-159
: LGTM: Consistent time-bounded failure diagnostics.Both failure paths correctly pass
start_time_rfc
tolog_failure_details
, ensuring diagnostic logs are scoped to the relevant time window.Also applies to: 176-176
91-91
: LGTM: Formatting improvements.The additional blank lines improve readability and logical separation of code blocks.
Also applies to: 141-141, 145-145, 163-163, 182-182
78-78
: Verify kubectl logs with services/ prefix
Lines 78 and 82 invokekubectl logs -n elasti services/<service>
; the sandbox couldn’t confirm its behavior. Please manually test in a real cluster to ensure this returns the expected logs for all pods behind the service. If it fails or only shows one pod, update to target pods or deployments directly (for example,kubectl logs -n elasti <pod-name>
orkubectl logs -n elasti deployment/<deployment-name>
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/e2e/manifest/send_traffic.sh (2)
43-51
: Fix unbound color variables under set -u (script exits early).Colors are used before they are defined; with
set -u
,${RED}
/${NC}
are unbound and abort the script on error paths (Lines 44 and 49). Move color definitions above any usage, or use safe expansion.Apply this minimal diff to move color definitions:
@@ -set -u +set -u + +# --- Color Definitions --- +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +CYAN='\033[0;36m' +NC='\033[0m' # No Color @@ -# --- Color Definitions --- -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -CYAN='\033[0;36m' -NC='\033[0m' # No ColorOptionally also guard all expansions as
${RED:-}
/${NC:-}
on early‑exit paths.Also applies to: 53-58
126-126
: Non-portable newline with echo; use printf.
echo "\n..."
is not portable across POSIX shells. Preferprintf
.- echo "\n${CYAN}--- Request $i/$MAX_RETRIES ---${NC}" + printf "\n%s--- Request %s/%s ---%s\n" "${CYAN}" "$i" "$MAX_RETRIES" "${NC}" @@ -echo "\n${CYAN}=== Test Summary ===${NC}" +printf "\n%s=== Test Summary ===%s\n" "${CYAN}" "${NC}"Also applies to: 190-190
♻️ Duplicate comments (2)
tests/e2e/manifest/send_traffic.sh (2)
125-131
: GNU-specific date breaks portability; avoid -d/--rfc-3339.
date -d "-1 minute" --rfc-3339=seconds
is GNU‑only and will fail on macOS/BSD/BusyBox. Use a POSIX‑friendly UTC timestamp, or better, drop--since-time
in favor of--since=<duration>
everywhere.Minimal portable fix (keep --since-time usage):
- start_time_rfc=$(date -d "-1 minute" --rfc-3339=seconds | sed 's/ /T/') + start_time_rfc=$(date -u +"%Y-%m-%dT%H:%M:%SZ")Recommended refactor (avoid date entirely): use
"${since_args[@]}"
from the earlier comment and removestart_time_rfc
plumbing.
77-83
: kubectl logs: unsupported services/ target and invalid flag; switch to selector-based logs.*
--all-pods=true
is not a valid flag forkubectl logs
.services/<name>
is not a supported logs target; use a label selector (from the Service’s selector) or a workload resource (e.g.,deploy/<name>
).Apply this robust, zero‑assumption fix using the service selector (also uses the
since_args
fallback from above):- kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" --all-pods=true | sed 's/^/ /' || echo "${YELLOW} - Could not retrieve resolver logs${NC}" + svc_selector=$(kubectl get svc -n elasti elasti-resolver-service -o jsonpath='{range $k,$v := .spec.selector}{printf "%s=%s,", $k, $v}{end}' | sed 's/,$//') + if [ -n "$svc_selector" ]; then + kubectl logs -n elasti --selector="$svc_selector" --all-containers=true "${since_args[@]}" | sed 's/^/ /' \ + || echo "${YELLOW} - Could not retrieve resolver logs${NC}" + else + echo "${YELLOW} - Service has no selector; skipping resolver logs${NC}" + fi @@ - kubectl logs -n elasti services/elasti-operator-controller-service --since-time="${start_time}" | sed 's/^/ /' || echo "${YELLOW} - Could not retrieve controller logs${NC}" + ctrl_selector=$(kubectl get svc -n elasti elasti-operator-controller-service -o jsonpath='{range $k,$v := .spec.selector}{printf "%s=%s,", $k, $v}{end}' | sed 's/,$//') + if [ -n "$ctrl_selector" ]; then + kubectl logs -n elasti --selector="$ctrl_selector" --all-containers=true "${since_args[@]}" | sed 's/^/ /' \ + || echo "${YELLOW} - Could not retrieve controller logs${NC}" + else + echo "${YELLOW} - Controller service has no selector; skipping controller logs${NC}" + fiIf your kubectl is too old for
--selector
, iterate pods:for p in $(kubectl get pods -l "$svc_selector" -o name); do kubectl logs "$p" ...; done
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tests/e2e/manifest/send_traffic.sh
(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
- GitHub Check: Run Kuttl E2E Tests
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
tests/e2e/manifest/send_traffic.sh (1)
85-87
: Ensure target resource supports kubectl logs.
${TARGET_RESOURCE}/${TARGET_NAME}
must be a log‑capable resource (Pod/Deployment/StatefulSet/Job/DaemonSet). If it can be a Service, this will fail. Consider the same selector‑based approach or pass the concrete workload type.Do the e2e invocations always provide a log‑capable resource here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
tests/e2e/manifest/send_traffic.sh (3)
74-74
: Incorrect error label for istiod logs.The error message says "resolver logs" when fetching
istiod
logs from the ingress namespace.Apply this diff:
- kubectl logs -n istio-system deployments/istiod --since-time="${start_time}" | sed 's/^/ /' || echo "${YELLOW} - Could not retrieve resolver logs${NC}" + kubectl logs -n istio-system deployments/istiod --since-time="${start_time}" | sed 's/^/ /' || echo "${YELLOW} - Could not retrieve istiod logs${NC}"
63-64
: Add defensive handling for empty or malformed start_time.If
start_time
is empty or malformed,kubectl logs --since-time
will fail silently or produce errors. A fallback to a duration-based window (e.g.,--since=90s
) would make diagnostics more resilient.Consider this pattern:
log_failure_details() { - start_time="${1}" + start_time="${1:-}" + if [ -n "${start_time}" ]; then + since_args="--since-time=${start_time}" + else + since_args="--since=90s" + fiThen replace direct
--since-time="${start_time}"
uses with${since_args}
in subsequent kubectl commands.
132-132
: Portability issue: GNU-specific date command.The
date --rfc-3339=seconds
option is GNU-specific and will fail on systems without GNU coreutils (e.g., Alpine, macOS, BSD). Since the script uses#!/bin/sh
, it should use POSIX-compatible commands.Apply this diff to use a POSIX-compatible alternative:
- start_time_rfc=$(date -d "-1 minute" --rfc-3339=seconds | sed 's/ /T/') + start_time_rfc=$(date -u -v-1M +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u --date="-1 minute" +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date -u +"%Y-%m-%dT%H:%M:%SZ")This tries BSD-style (
-v-1M
), then GNU-style (--date
), then falls back to current time, ensuring cross-platform compatibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
operator/internal/controller/opsEndpointslices.go
(1 hunks)tests/e2e/manifest/send_traffic.sh
(6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-14T12:33:30.800Z
Learnt from: rethil
PR: truefoundry/KubeElasti#178
File: operator/internal/controller/opsEndpointslices.go:21-23
Timestamp: 2025-09-14T12:33:30.800Z
Learning: In the KubeElasti project, the resolver component in `getIPsForResolver()` method must list EndpointSlices across all namespaces, not restricted to its own namespace. This is by design and required for proper resolver functionality.
Applied to files:
operator/internal/controller/opsEndpointslices.go
🪛 GitHub Check: golangci-lint (/home/runner/work/KubeElasti/KubeElasti/operator)
operator/internal/controller/opsEndpointslices.go
[failure] 112-112:
unnecessary conversion (unconvert)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
- GitHub Check: Run Kuttl E2E Tests
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
tests/e2e/manifest/send_traffic.sh (1)
160-160
: Manual verification required for timestamp portability and kubectl compatibility.
The Linux sandbox confirms the POSIX fallback yields a valid RFC3339 timestamp, but we can’t validate the BSD/macOSdate -v-1M
syntax here or ensurekubectl logs --since-time
accepts it. Please test on macOS (e.g.date -u -v-1M '+%Y-%m-%dT%H:%M:%SZ'
) and verify thatkubectl logs --since-time=<timestamp>
successfully retrieves logs.
There is a race condition between curl & elasti operator. When elasti sets up EndpointSlices, curl is already doing it request, which results in curl failure. Explicit wait for proxy mode solves this issue. Test 06 is introducing helm release drift, with resolver being scaled to 2. Helm drift fixup was added.
Hey @ramantehlan, this should fix that 02 test issues. |
Hi @rethil, was just debugging this issue, thank you for the PR. |
@rethil, we have a community call tomorrow, would you be able to join? |
Hey, I won't make it. Good luck! |
Seems like it is failing again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tests/e2e/manifest/setup.sh
(2 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
tests/e2e/manifest/setup.sh
[info] 50-50: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 55-55: Double quote to prevent globbing and word splitting.
(SC2086)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
- GitHub Check: Analyze (go)
- GitHub Check: Run Kuttl E2E Tests
Surprisingly, all tests pass locally. |
I'll try do debug this during weekend. I really suspect that EndpointSlices are not yet in place after scaling pod to 0. Like CRD status was changed to "proxy", but slices are not there yet. |
Assert in step 02 wasn't executed:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/e2e/manifest/send_traffic.sh (1)
78-78
: Remove unsupportedkubectl logs --all-pods
flag.
kubectl logs
still rejects--all-pods=true
, so this call fails before collecting resolver logs—exactly the regression we hit previously. Drop the flag or iterate pods via a selector.- kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" --all-pods=true | sed 's/^/ /' || echo "${YELLOW} - Could not retrieve resolver logs${NC}" + kubectl logs -n elasti services/elasti-resolver-service --since-time="${start_time}" | sed 's/^/ /' || echo "${YELLOW} - Could not retrieve resolver logs${NC}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
tests/e2e/manifest/send_traffic.sh
(4 hunks)tests/e2e/manifest/setup.sh
(2 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
tests/e2e/manifest/setup.sh
[info] 50-50: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 55-55: Double quote to prevent globbing and word splitting.
(SC2086)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
- GitHub Check: Run Kuttl E2E Tests
- GitHub Check: Analyze (go)
Hmm, in that case it must be running tests on cached code. |
We set mode once after all the steps are fulfilled, but I will cross check if we are waiting for EndpointSlices or not. |
Right now, we are not waiting for EndpointSliceChanges to take effect: if isResolverSliceFound {
if err := r.Update(ctx, newEndpointSlice); err != nil {
r.Logger.Error("failed to update sliceToResolver", zap.String("endpointslice", EndpointsliceNamespacedName.String()), zap.Error(err))
return fmt.Errorf("createOrUpdateEndpointsliceToResolver: %w", err)
}
r.Logger.Info("EndpointSlice updated successfully", zap.String("endpointslice", EndpointsliceNamespacedName.String()))
} Here is more context: When you call |
…5 second sleep on first test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/e2e/manifest/send_traffic.sh (1)
78-78
: Restore kubectl compatibility by dropping--all-pods
.
kubectl logs
still rejects the--all-pods=true
flag, so this command exits with “unknown flag: --all-pods” and the failure diagnostics never run. Please remove the unsupported flag (or loop over pods viakubectl get
) to keep log collection working.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
operator/internal/controller/opsModes.go
(2 hunks)tests/e2e/manifest/send_traffic.sh
(5 hunks)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
tests/e2e/manifest/send_traffic.sh
[info] 126-126: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 130-130: echo may not expand escape sequences. Use printf.
(SC2028)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/resolver)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/pkg)
- GitHub Check: test (/home/runner/work/KubeElasti/KubeElasti/operator)
- GitHub Check: Analyze (go)
- GitHub Check: Run Kuttl E2E Tests
🔇 Additional comments (2)
operator/internal/controller/opsModes.go (2)
77-77
: LGTM!Log messages correctly reflect the reordered operations with accurate sequential numbering.
Also applies to: 83-83
74-77
: Verify EndpointSlice readiness after creation. No direct EndpointSlice List/Get calls were found inwatchPublicService
; ensure it relies on an informer cache to pick up the create/update event or add an explicit poll/wait after Line 77 to eliminate the race.
* e2e: fix 02 test failure There is a race condition between curl & elasti operator. When elasti sets up EndpointSlices, curl is already doing it request, which results in curl failure. Explicit wait for proxy mode solves this issue. Test 06 is introducing helm release drift, with resolver being scaled to 2. Helm drift fixup was added. * setup: don't correct helm drift * e2e assert: it can be only action in file * feat: move endpointslice creation earlier, add retry to curl, with a 5 second sleep on first test * test: retest --------- Co-authored-by: Raman Tehlan <[email protected]>
* e2e: fix 02 test failure There is a race condition between curl & elasti operator. When elasti sets up EndpointSlices, curl is already doing it request, which results in curl failure. Explicit wait for proxy mode solves this issue. Test 06 is introducing helm release drift, with resolver being scaled to 2. Helm drift fixup was added. * setup: don't correct helm drift * e2e assert: it can be only action in file * feat: move endpointslice creation earlier, add retry to curl, with a 5 second sleep on first test * test: retest --------- Co-authored-by: Raman Tehlan <[email protected]>
Description
This fixes test 02 issues.
There is a race condition where pod is scaled to 0 and in parallel:
As EndpointSlices are not set, curl request is probably ending in istio.
With this fix, problem doesn't persist, but its covered by explicit wait for proxy mode.
There is also helm drift after 06-resolver-endpoint-sync, where resolver is scaled to 2.
Drift correction was also added.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Checklist:
Summary by CodeRabbit
Tests
Chores