OCPBUGS-56281: gatewayapicontroller: Clean up resources when done #29900

Open: 9 commits to merge into base branch main

Miciah
Contributor

@Miciah Miciah commented Jun 9, 2025

gatewayapicontroller: Add checks for empty slices

Check whether the slice of parent resource references in an httproute's status is empty before indexing the slice.

Before this commit, the "Ensure HTTPRoute object is created" test sometimes panicked with "runtime error: index out of range [0] with length 0".

Similarly, check whether the slice of load-balancer ingress points in a service's status is empty before indexing it.

gatewayapicontroller: Clean up resources when done

Delete the gatewayclass and uninstall OSSM after all the Gateway API controller tests are done.

Before this change, the Gateway API controller tests left OSSM installed, including the subscription, CSV, installplan, bundled CRDs, RBAC resources, deployment, service, serviceaccount, etc., when the tests were finished. This clutter could cause problems for other tests, or for the same test if it was run again.

The new cleanup logic uses the OperatorsV1 client from github.com/operator-framework/operator-lifecycle-manager. Importing this package requires a replace stanza for openshift/api in go.mod.

This vendors github.com/operator-framework/operator-lifecycle-manager v0.30.1-0.20250114164243-1b6752ec65fa rather than the newest revision, which would bring in additional problematic vendor bumps.

gatewayapicontroller: Always log errors

Add the error value to some log messages that were missing it.

Check whether the slice of parent resource references in an httproute's
status is empty before indexing the slice.

Before this commit, the "Ensure HTTPRoute object is created" test
sometimes panicked with "runtime error: index out of range [0] with
length 0".

Similarly, check whether the slice of load-balancer ingress points in a
service's status is empty before indexing it.

* test/extended/router/gatewayapicontroller.go (buildGateway)
(createHttpRoute): Add checks.
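
The guard described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: the types `httpRouteStatus`, `parentStatus`, and `ingressPoint` and the helpers `firstParent` and `firstIngressHostname` are hypothetical stand-ins for the Gateway API and core/v1 types that the real test uses.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the status types that the real test reads.
type parentStatus struct {
	Conditions []string
}

type httpRouteStatus struct {
	Parents []parentStatus
}

type ingressPoint struct {
	Hostname string
}

// firstParent returns the first parent status, or an error when the
// slice is still empty, instead of panicking on Parents[0].
func firstParent(s httpRouteStatus) (parentStatus, error) {
	if len(s.Parents) == 0 {
		return parentStatus{}, errors.New("httproute status has no parents yet")
	}
	return s.Parents[0], nil
}

// firstIngressHostname applies the same guard to a service's
// load-balancer ingress points.
func firstIngressHostname(points []ingressPoint) (string, error) {
	if len(points) == 0 {
		return "", errors.New("service status has no load-balancer ingress points yet")
	}
	return points[0].Hostname, nil
}

func main() {
	// An empty status yields an error that a polling loop can retry on.
	if _, err := firstParent(httpRouteStatus{}); err != nil {
		fmt.Println("retry later:", err)
	}
	host, err := firstIngressHostname([]ingressPoint{{Hostname: "lb.example.com"}})
	fmt.Println(host, err)
}
```

Returning an error rather than indexing directly lets the caller decide whether to retry, which matters when the status has simply not been populated yet.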
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 9, 2025
@openshift-ci-robot

@Miciah: This pull request references Jira Issue OCPBUGS-56281, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci bot requested review from knobunc and p0lyn0mial June 9, 2025 13:29
@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Jun 9, 2025
@Miciah Miciah force-pushed the OCPBUGS-56281-gatewayapicontroller-clean-up-resources-when-done branch from fc08232 to bf853bf Compare June 9, 2025 16:11

openshift-trt bot commented Jun 9, 2025

Job Failure Risk Analysis for sha: bf853bf

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (19) are below the historical average (1505): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2 IncompleteTests
Tests for this run (19) are below the historical average (1822): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout IncompleteTests
Tests for this run (29) are below the historical average (1778): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@rhamini3
Contributor

LGTM, @melvinjoseph86 PTAL

@melvinjoseph86
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 12, 2025
Contributor

openshift-ci bot commented Jun 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: melvinjoseph86, Miciah
Once this PR has been reviewed and has the lgtm label, please assign bertinatto for approval. For more information see the Code Review Process.


@melvinjoseph86
Contributor

/retest

openshift-trt bot commented Jun 12, 2025

Job Failure Risk Analysis for sha: 1967dd2

Job Name Failure Risk
pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback IncompleteTests
Tests for this run (94) are below the historical average (209): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones High
[sig-network-edge][OCPFeatureGate:GatewayAPIController][Feature:Router][apigroup:gateway.networking.k8s.io] Ensure custom gatewayclass can be accepted [Suite:openshift/conformance/parallel]
This test has passed 98.38% of 2463 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (196) are below the historical average (3374): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
[CI] e2e-openstack-ovn-etcd-scaling job permanent fails at many openshift-test tests
etcd-scaling jobs failing ~60% of the time
---
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Medium
[sig-instrumentation] disruption/metrics-api connection/new should be available throughout the test
Potential external regression detected for High Risk Test analysis

Miciah added 2 commits June 12, 2025 21:43
Delete the gatewayclass and uninstall OSSM after all the Gateway API
controller tests are done.

Before this commit, the Gateway API controller tests left OSSM
installed, including the subscription, CSV, installplan, bundled CRDs,
RBAC resources, deployment, service, serviceaccount, etc., when the
tests were finished.  This clutter could cause problems for other tests,
or for the same test if it was run again.

The new cleanup logic uses the OperatorsV1 client from
github.com/operator-framework/operator-lifecycle-manager.  Importing
this package requires a replace stanza for openshift/api in go.mod.

This vendors github.com/operator-framework/operator-lifecycle-manager
v0.30.1-0.20250114164243-1b6752ec65fa rather than the newest revision,
which would bring in additional problematic vendor bumps.

This commit fixes OCPBUGS-56281.

https://issues.redhat.com/browse/OCPBUGS-56281

* test/extended/router/gatewayapicontroller.go: Delete the gatewayclass
that the test creates.  Use the OperatorsV1 client to look up the
Operator object for OSSM, and delete all the resources that the Operator
object references.
* go.mod: Vendor the operatorsv1 client code from
github.com/operator-framework/operator-lifecycle-manager.
* go.sum:
* vendor/*: Regenerate.
* test/extended/router/gatewayapicontroller.go: Add the error value to
some log messages that were missing it.
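
As a rough sketch of the cleanup described above (delete everything that the Operator object references, and always report the error value), consider the following. The `objectRef` type, the `deleter` callback, `deleteReferenced`, `errSimulated`, and every resource name in `main` are hypothetical; the real test obtains the references through the OperatorsV1 client and deletes them from the cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// objectRef is a hypothetical stand-in for a reference to a resource
// that the OSSM Operator object lists.
type objectRef struct {
	Kind      string
	Namespace string // empty for cluster-scoped resources
	Name      string
}

// deleter abstracts the call that removes one resource from the cluster.
type deleter func(ref objectRef) error

// errSimulated is a made-up failure used only for the demonstration.
var errSimulated = errors.New("simulated delete failure")

// deleteReferenced tries to delete every referenced resource and returns
// the references that failed together with their errors, so that the
// caller can log each error value instead of dropping it.
func deleteReferenced(refs []objectRef, del deleter) map[objectRef]error {
	failed := map[objectRef]error{}
	for _, ref := range refs {
		if err := del(ref); err != nil {
			failed[ref] = err
		}
	}
	return failed
}

func main() {
	// Illustrative references only; names and namespaces are invented.
	refs := []objectRef{
		{Kind: "Subscription", Namespace: "example-operators", Name: "example-subscription"},
		{Kind: "ClusterServiceVersion", Namespace: "example-operators", Name: "example-csv"},
		{Kind: "CustomResourceDefinition", Name: "examples.example.io"},
	}
	failed := deleteReferenced(refs, func(ref objectRef) error {
		if ref.Kind == "CustomResourceDefinition" {
			return errSimulated
		}
		return nil
	})
	for ref, err := range failed {
		fmt.Printf("Failed to delete %s %q: %v\n", ref.Kind, ref.Name, err)
	}
}
```

Continuing past a failed deletion means that one stuck resource does not leave the rest of the operator's resources behind.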
@Miciah Miciah force-pushed the OCPBUGS-56281-gatewayapicontroller-clean-up-resources-when-done branch from 1967dd2 to ab81b79 Compare June 13, 2025 01:53
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 13, 2025
Contributor

openshift-ci bot commented Jun 13, 2025

New changes are detected. LGTM label has been removed.

openshift-trt bot commented Jun 13, 2025

Job Failure Risk Analysis for sha: ab81b79

Job Name Failure Risk
pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback MissingData
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones High
[sig-network-edge][OCPFeatureGate:GatewayAPIController][Feature:Router][apigroup:gateway.networking.k8s.io] Ensure custom gatewayclass can be accepted [Suite:openshift/conformance/parallel]
This test has passed 99.76% of 2503 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (2125) are below the historical average (3401): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@Miciah Miciah force-pushed the OCPBUGS-56281-gatewayapicontroller-clean-up-resources-when-done branch from ab81b79 to 1dcc98a Compare June 13, 2025 15:51
@Miciah
Contributor Author

Miciah commented Jun 13, 2025

https://github.com/openshift/origin/compare/1967dd22c83963e780eb9953bc38da760e090dc8..1dcc98a3c2ec7c38dcee818e750e14ce57d70892 made these changes:

  • Add logic to delete the Istio CR in the test cleanup.
  • Declare package consts for istioName and ingressNamespace and use these instead of function-local variables and string literals.
  • Omit the namespace when getting the Istio CR, which is cluster-scoped.

Before these changes, pods.json from e2e-aws #1932229162710339584 had the istiod pod. After these changes, pods.json from e2e-aws #1933552902287134720 does not have the istiod pod. It appears that the istiod pod cleanup is working properly.

Also, comparing must-gather.tar from 1933552902287134720 and must-gather.tar from 1932229162710339584, the older must-gather archive has the istiorevisions.sailoperator.io.yaml CRD whereas the newer must-gather archive does not. Neither must-gather archive has any other istio.io or sailoperator.io CRDs. I believe that deleting the Istio CR enables the cleanup to delete all OSSM-installed CRDs successfully.

openshift-trt bot commented Jun 13, 2025

Job Failure Risk Analysis for sha: 1dcc98a

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn High
[sig-network-edge][OCPFeatureGate:GatewayAPIController][Feature:Router][apigroup:gateway.networking.k8s.io] Ensure HTTPRoute object is created [Suite:openshift/conformance/parallel]
This test has passed 99.22% of 2451 runs on release 4.20 [Overall] in the last week.

Open Bugs
Component Readiness: [Networking / router] [OCPFeatureGate:GatewayAPIController] test regressed on HyperShift Azure AKS
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift High
[sig-api-machinery] API priority and fairness should ensure that requests can be classified by adding FlowSchema and PriorityLevelConfiguration [Suite:openshift/conformance/parallel] [Suite:k8s]
This test has passed 99.97% of 3060 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade IncompleteTests
Tests for this run (2125) are below the historical average (3318): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-rt-upgrade IncompleteTests
Tests for this run (19) are below the historical average (1620): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
---
[sig-api-machinery] disruption/cache-openshift-api apiserver/openshift-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
[sig-api-machinery] disruption/cache-oauth-api apiserver/oauth-apiserver connection/new should be available throughout the test
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:vsphere SecurityMode:default Topology:ha Upgrade:none] in the last week.

@abhat
Contributor

abhat commented Jun 16, 2025

/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade 5

Contributor

openshift-ci bot commented Jun 16, 2025

@abhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/101e8ee0-4acb-11f0-928a-4bd1c2be89d0-0

e2e.Failf("Failed to delete GatewayClass %q", gatewayClassName)
}

g.By("Deleting the OSSM Operator resources")

Contributor

I'm curious why we don't use an owner reference for the Subscription. We could add an owner reference to the gatewayclass and let Kubernetes do the cascading deletion.

Update: Deleting the Subscription doesn't delete the CSV or the CRDs. The CRD part is understandable, since deleting CRDs could cause data loss, but the CSV part is kind of interesting.

Contributor Author

There are a few reasons not to put or rely on an owner reference on the subscription:

  • You could create the subscription manually; we cannot assume that the operator created it.
  • You could have multiple gatewayclasses with our controller name, and then it isn't clear how we would configure the owner references on the subscription. Would we add only the first gatewayclass with our controller name? Would we add all gatewayclasses with our controller name? If we added more than one owner reference, would we need to delete old owner references when the corresponding gatewayclasses were deleted? If we did delete stale owner references, would that prevent garbage collection, or would we always leave one non-stale reference to trigger garbage collection?
  • I don't know for sure that OLM doesn't look at the owner reference. We would need to check this.
  • I am not confident that an owner reference would cause the subscription to be deleted because the owner reference on the Istio CR didn't cause the Istio CR to be deleted (see #29900 (comment)).
  • Deleting the Istio CR only requires changing the test, it is more explicit than relying on garbage collection, and it is more obviously safe to backport.

Comment on lines +116 to +118
g.By("Deleting the Istio CR")

o.Expect(oc.AsAdmin().Run("delete").Args("--ignore-not-found=true", "istio", istioName).Execute()).Should(o.Succeed())

Contributor

The Istio CR is supposed to be garbage-collected because its owner reference points to the gatewayclass.

Contributor Author

The owner reference on the Istio CR didn't cause it to be deleted (see #29900 (comment)).

Contributor

The owner reference on the Istio CR didn't cause it to be deleted

I didn't manage to reproduce this behavior. I saw the Istio CR get deleted after the GatewayClass was deleted:

$ oc get gc
NAME                CONTROLLER                           ACCEPTED   AGE
openshift-default   openshift.io/gateway-controller/v1   True       4m12s

04:57:08 $ oc get istio
NAME                REVISIONS   READY   IN USE   ACTIVE REVISION     STATUS    VERSION   AGE
openshift-gateway   1           1       0        openshift-gateway   Healthy   v1.24.3   4m18s

04:57:14 $ oc get istio openshift-gateway -o yaml | yq .metadata.ownerReferences[0]
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
name: openshift-default
uid: 3f6ef6ed-9e6b-4821-9706-221ff0bca83e

04:57:34 $ oc -n openshift-ingress get pods
NAME                                        READY   STATUS    RESTARTS      AGE
istiod-openshift-gateway-7b567bc8b4-z9972   1/1     Running   0             4m48s
router-default-76c4888886-fmtzq             1/1     Running   0             77m
router-default-76c4888886-nm9mb             1/1     Running   2 (78m ago)   89m

04:57:52 $ oc delete gc openshift-default
gatewayclass.gateway.networking.k8s.io "openshift-default" deleted

04:58:07 $ oc get istio
No resources found

04:58:14 $ oc -n openshift-ingress get pods
NAME                              READY   STATUS    RESTARTS      AGE
router-default-76c4888886-fmtzq   1/1     Running   0             78m
router-default-76c4888886-nm9mb   1/1     Running   2 (78m ago)   89m

Comment on lines 161 to 164
if err != nil && strings.Contains(err.Error(), "not found") {
	e2e.Logf("Subscription %q not found; retrying...", expectedSubscriptionName)
	return false, nil
}

Contributor

@alebedev87 alebedev87 Jun 16, 2025

I think that we should be consistent among all the polls we do in this block. I personally prefer how it's done for the OSSM deployment below:

		if err != nil {
			e2e.Logf("Failed to get OSSM operator deployment %q: %v; retrying...", deploymentOSSMName, err)
			return false, nil
		}

No assertions, just a retry for any error until the timeout is triggered. I think that some errors (not only "Not Found") can be temporary or intermittent.

Contributor Author

I was trying to keep my changes more narrowly focused. All right, I can make the polling loop for the subscription retry on all errors.

Miciah added 2 commits June 17, 2025 03:46
Log errors and then retry in the polling loops for the Subscription and
Istio CRs.

Before this commit, the gatewayapicontroller tests sometimes failed
because OSSM was still installing when these polling loops ran, and the
polling loops would fail on a "not found" error (if the CR had not yet
been created) or a "server doesn't have a resource type" error (if the
CRD had not yet been created).  In order to make the tests more
reliable, they need to retry on these errors.  For consistency with
other polling loops, this commit makes these polling loops retry on all
errors (not just "not found" or "doesn't have a resource type" errors).

* test/extended/router/gatewayapicontroller.go: Retry when the test
fails to get the OSSM subscription or the Istio CR.
* test/extended/router/gatewayapicontroller.go: Increase the timeouts on
some polling loops that have been observed to flake but then succeed on
retry.
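
The retry-on-all-errors behavior described in the commit message above might look roughly like this. It is a simplified sketch using only the standard library: `pollUntil` is a hypothetical stand-in for the polling helpers in the Kubernetes wait package, and `retryOnAnyError`, `errTimeout`, and the simulated getter are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the condition never becomes true in time.
var errTimeout = errors.New("timed out waiting for the condition")

// pollUntil calls cond every interval until it reports done, returns an
// error, or the timeout expires. Hypothetical, simplified polling helper.
func pollUntil(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err // a non-nil error aborts the poll immediately
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

// retryOnAnyError wraps a getter so that every error, not only
// "not found", is logged with its value and then retried.
func retryOnAnyError(name string, get func() error) func() (bool, error) {
	return func() (bool, error) {
		if err := get(); err != nil {
			fmt.Printf("Failed to get %s: %v; retrying...\n", name, err)
			return false, nil // swallow the error so the poll keeps going
		}
		return true, nil
	}
}

func main() {
	attempts := 0
	// Simulated getter: the CRD "appears" on the third attempt.
	get := func() error {
		attempts++
		if attempts < 3 {
			return errors.New(`the server doesn't have a resource type "istio"`)
		}
		return nil
	}
	err := pollUntil(10*time.Millisecond, time.Second, retryOnAnyError("Istio CR", get))
	fmt.Println("attempts:", attempts, "err:", err)
}
```

The trade-off is that a permanent error is only reported once the timeout fires, so logging the error value on every retry is what keeps such failures diagnosable.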
Miciah added 4 commits June 17, 2025 03:46
* test/extended/router/gatewayapicontroller.go (ingressNamespace): New
const.
(waitForIstioHealthy, createAndCheckGateway)
(assertGatewayLoadbalancerReady, assertDNSRecordStatus, createHttpRoute)
(assertHttpRouteSuccessful): Use the new const instead of function-level
variables or string literals.
* test/extended/router/gatewayapicontroller.go: Omit the namespace
when getting the Istio CR, which is cluster-scoped.
* test/extended/router/gatewayapicontroller.go (istioName): Declare
const.
(waitForIstioHealthy): Use the new const instead of a string literal.
* test/extended/router/gatewayapicontroller.go: Delete the Istio CR and
wait for the istiod pod to be deleted as part of the test cleanup.
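
A small sketch of the const and namespace changes listed above. The values of `istioName` and `ingressNamespace` are assumptions taken from the `oc` transcript earlier in this conversation rather than from the diff, and `getArgs` is a hypothetical helper, not a function from the test.

```go
package main

import (
	"fmt"
	"strings"
)

// Package-level consts replace function-local variables and string
// literals. The values are assumed from the oc transcript in this thread.
const (
	istioName        = "openshift-gateway"
	ingressNamespace = "openshift-ingress"
)

// getArgs builds the arguments for an "oc get" call. An empty namespace
// means that the resource is cluster-scoped, so no -n flag is added.
func getArgs(namespace, resource string, names ...string) []string {
	args := []string{"get"}
	if namespace != "" {
		args = append(args, "-n", namespace)
	}
	args = append(args, resource)
	return append(args, names...)
}

func main() {
	// The Istio CR is cluster-scoped: no namespace.
	fmt.Println("oc", strings.Join(getArgs("", "istio", istioName), " "))
	// Pods are namespaced: the istiod pod lives in the ingress namespace.
	fmt.Println("oc", strings.Join(getArgs(ingressNamespace, "pods"), " "))
}
```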
@Miciah Miciah force-pushed the OCPBUGS-56281-gatewayapicontroller-clean-up-resources-when-done branch from 1dcc98a to 38d8018 Compare June 17, 2025 07:48
@Thealisyed

LGTM, holding off for @alebedev87 comments

Contributor

openshift-ci bot commented Jun 17, 2025

@Miciah: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn 38d8018 link false /test e2e-metal-ipi-ovn
ci/prow/e2e-aws-ovn-serial-publicnet-1of2 38d8018 link false /test e2e-aws-ovn-serial-publicnet-1of2
ci/prow/e2e-azure-ovn-etcd-scaling 38d8018 link false /test e2e-azure-ovn-etcd-scaling
ci/prow/e2e-aws-disruptive 38d8018 link false /test e2e-aws-disruptive
ci/prow/e2e-gcp-ovn-etcd-scaling 38d8018 link false /test e2e-gcp-ovn-etcd-scaling
ci/prow/e2e-gcp-csi 38d8018 link false /test e2e-gcp-csi
ci/prow/e2e-gcp-ovn-rt-upgrade 38d8018 link false /test e2e-gcp-ovn-rt-upgrade
ci/prow/e2e-aws-ovn-single-node 38d8018 link false /test e2e-aws-ovn-single-node
ci/prow/e2e-hypershift-conformance 38d8018 link false /test e2e-hypershift-conformance
ci/prow/e2e-vsphere-ovn-etcd-scaling 38d8018 link false /test e2e-vsphere-ovn-etcd-scaling
ci/prow/e2e-gcp-fips-serial-2of2 38d8018 link false /test e2e-gcp-fips-serial-2of2
ci/prow/e2e-gcp-fips-serial-1of2 38d8018 link false /test e2e-gcp-fips-serial-1of2
ci/prow/e2e-aws-ovn-etcd-scaling 38d8018 link false /test e2e-aws-ovn-etcd-scaling
ci/prow/e2e-metal-ipi-serial-1of2 38d8018 link false /test e2e-metal-ipi-serial-1of2
ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 38d8018 link false /test e2e-vsphere-ovn-dualstack-primaryv6
ci/prow/e2e-azure 38d8018 link false /test e2e-azure
ci/prow/e2e-aws-ovn-kube-apiserver-rollout 38d8018 link false /test e2e-aws-ovn-kube-apiserver-rollout
ci/prow/e2e-gcp-ovn 38d8018 link true /test e2e-gcp-ovn
ci/prow/e2e-openstack-ovn 38d8018 link false /test e2e-openstack-ovn
ci/prow/e2e-gcp-ovn-upgrade 38d8018 link true /test e2e-gcp-ovn-upgrade
ci/prow/e2e-aws-ovn-single-node-upgrade 38d8018 link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-aws-ovn-serial-publicnet-2of2 38d8018 link false /test e2e-aws-ovn-serial-publicnet-2of2
ci/prow/e2e-aws-ovn 38d8018 link false /test e2e-aws-ovn
ci/prow/e2e-aws 38d8018 link false /test e2e-aws
ci/prow/e2e-azure-ovn-upgrade 38d8018 link false /test e2e-azure-ovn-upgrade
ci/prow/e2e-aws-ovn-microshift 38d8018 link true /test e2e-aws-ovn-microshift
ci/prow/e2e-aws-ovn-edge-zones 38d8018 link true /test e2e-aws-ovn-edge-zones
ci/prow/e2e-gcp-disruptive 38d8018 link false /test e2e-gcp-disruptive
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway 38d8018 link false /test e2e-metal-ipi-ovn-dualstack-local-gateway
ci/prow/okd-e2e-gcp 38d8018 link false /test okd-e2e-gcp
ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback 38d8018 link false /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback

openshift-trt bot commented Jun 17, 2025

Job Failure Risk Analysis for sha: 38d8018

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-csi IncompleteTests
Tests for this run (19) are below the historical average (1374): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (19) are below the historical average (1140): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2 IncompleteTests
Tests for this run (18) are below the historical average (1403): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 IncompleteTests
Tests for this run (19) are below the historical average (1430): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn IncompleteTests
Tests for this run (19) are below the historical average (1146): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling IncompleteTests
Tests for this run (19) are below the historical average (1343): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-rt-upgrade IncompleteTests
Tests for this run (19) are below the historical average (1315): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (19) are below the historical average (810): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@Miciah
Contributor Author

Miciah commented Jun 19, 2025

The aggregated jobs each failed while building the tests-openshift.origin-amd64 image, with the error message, "Error: Unable to find a match: python3-cinderclient" (missing RPM package). I'll retry in case it was a glitch with the Yum repository.

/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade 5

Contributor

openshift-ci bot commented Jun 19, 2025

@Miciah: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/aad73360-4cbf-11f0-9efa-6a57a5235fed-0

@Miciah
Contributor Author

Miciah commented Jun 19, 2025

This time all the aggregated jobs failed to build the image with the error message, "Error: Unable to find a match: realtime-tests rteval".

/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade 5

Contributor

openshift-ci bot commented Jun 19, 2025

@Miciah: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c0124e40-4d46-11f0-9623-e230cc269dc8-0

@Miciah
Contributor Author

Miciah commented Jun 20, 2025

This time all the aggregated jobs failed with, "Error: Unable to find a match: python3-cinderclient realtime-tests rteval". I have filed OCPBUGS-57921 for these failures.

@Miciah
Contributor Author

Miciah commented Jun 20, 2025

/payload-aggregate periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade 5

Contributor

openshift-ci bot commented Jun 20, 2025

@Miciah: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.19-upgrade-from-stable-4.18-e2e-gcp-ovn-rt-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7029ee70-4e03-11f0-8a1d-a44ec557b951-0

Labels
jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. vendor-update Touching vendor dir or related files
7 participants