Skip to content

Match labels #1894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: oadp-dev
Choose a base branch
from
Open

Match labels #1894

wants to merge 6 commits into from

Conversation

weshayutin
Copy link
Contributor

@weshayutin weshayutin commented Aug 6, 2025

Why the changes were made

Test use and changes for https://issues.redhat.com/browse/OADP-6248
Depends-On: openshift/hypershift-oadp-plugin#96

How to test the changes made

make test-e2e GINKGO_ARGS="--ginkgo.focus='MySQL-label application CSI'"

recreates hypershift plugin bug where restores fail w/
""Namespace mysql, resource restore error: error preparing secrets/mysql/mysql: rpc error: code = Unknown desc = backup or included namespaces is nil" "

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 6, 2025
Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Variable Declaration Style
    (tests/e2e/backup_restore_suite_test.go:167-169):
    - Unnecessary variable pre-declaration could be simplified
  // Current
  var nsRequiresResticDCWorkaround bool
  var err error
  nsRequiresResticDCWorkaround, err =
  lib.NamespaceRequiresResticDCWorkaround(...)
  // Better
  nsRequiresResticDCWorkaround, err :=
  lib.NamespaceRequiresResticDCWorkaround(...)
  1. Label Extraction Logic
    (tests/e2e/backup_restore_suite_test.go:188-194):
    - Only uses first label from map, which may cause unexpected behavior
    - Should either support multiple labels or document single-label
    limitation
    - Consider using metav1.LabelSelector consistently

@weshayutin weshayutin added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 6, 2025
Signed-off-by: Wesley Hayutin <[email protected]>
Signed-off-by: Wesley Hayutin <[email protected]>
@weshayutin
Copy link
Contributor Author

/retest

@kaovilai
Copy link
Member

Is label test still valid for hcp plugin?

@weshayutin
Copy link
Contributor Author

Is label test still valid for hcp plugin?

@kaovilai the backup via label is NOT FOR hypershift. However I'm sure you agree that backing up apps ( our sample apps ) should work even if the hypershift plugin is configured in the dpa. :)

@weshayutin
Copy link
Contributor Author

/test 4.19-e2e-test-aws

@kaovilai
Copy link
Member

Note

This comment was generated with Claude

Critical Pod Ownership Conflict Issue

I've identified a critical pod ownership conflict in the MySQL persistent test configuration that could cause issues with pod management.

Problem: In the file mysql-persistent-csi.yaml, both the Deployment and DeploymentConfig resources are using overlapping label selectors:

  • Both controllers will attempt to manage pods with the same labels
  • This creates a conflict where multiple controllers compete for ownership of the same pods
  • Can lead to unpredictable behavior, pod thrashing, and failed deployments

Root Cause: The selectors are not sufficiently unique to distinguish between pods managed by different controller types.

Recommended Fix: Make the selectors more specific by adding distinct labels to differentiate between Deployment and DeploymentConfig managed pods. For example:

  • Deployment pods: add deployment-type: standard
  • DeploymentConfig pods: add deployment-type: config

This ensures each controller only manages its intended pods and prevents ownership conflicts.

@weshayutin
Copy link
Contributor Author

/retest

@weshayutin weshayutin removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 18, 2025
Copy link

openshift-ci bot commented Aug 18, 2025

@weshayutin: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.20-e2e-test-hcp-aws 5677a21 link false /test 4.20-e2e-test-hcp-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Selector Conflict Found - Changes Requested

Note

Responses generated with Claude

Summary

While this PR successfully implements label-based backup functionality, there's a critical selector conflict in mysql-persistent-csi.yaml that needs to be resolved.

Issue

Two controllers have overlapping selectors that can cause race conditions and pod management conflicts:

  • Deployment mysql: matchLabels: app: todolist
  • DeploymentConfig todolist: selector: app: todolist, service: todolist

Required Changes

Please update the Deployment selector to be more specific and avoid the overlap. See the inline comment for specific recommendations.

Otherwise

The implementation looks good:

  • ✅ Clean label-based backup functionality
  • ✅ Proper test coverage
  • ✅ Well-structured YAML manifests

Once the selector conflict is resolved, this should be ready to merge!

@weshayutin
Copy link
Contributor Author

@claude, perhaps you are correct.

Two controllers have overlapping selectors that can cause race conditions and pod management conflicts:

Deployment mysql: matchLabels: app: todolist
DeploymentConfig todolist: selector: app: todolist, service: todolist

I don't see why we care in this case. Help me understand why it's important to block this pr. The key/value pair "app: todolist" is used for selectors on the backup and restore. The key/value pair service: todolist is to label the service. Other than the value string, WHY SHOULD WE CARE?

Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visiack still works from meet

@weshayutin
Copy link
Contributor Author

weshayutin commented Aug 18, 2025

OK.. so @kaovilai and I jumped on a call to check this.

  1. Deployed the app from the pr w/ 1 replicaset.. no problems
  2. Then go to the deploymentConfig and scale up from 1 to 4.
  3. Results - one pod crashed
Failed to create logfilelogrus.txt
time="2025-08-18T21:53:43Z" level=info msg="Failed: log file /tmp/log/todoapp/app.log create failed" func=main.main file="/build/todolist.go:226"
time="2025-08-18T21:53:43Z" level=info msg="Attempting to connect to: changeme:changeme@tcp(mysql:3306)/todolist" func=main.connectToMariaDBRemote file="/build/todolist.go:71"
2025/08/18 21:53:43 /build/todolist.go:74
[error] failed to initialize database, got error dial tcp 172.30.136.52:3306: connect: connection refused
time="2025-08-18T21:53:43Z" level=error msg="Connection failed" func=main.connectToMariaDBRemote file="/build/todolist.go:76"
time="2025-08-18T21:53:43Z" level=info msg="Attempting to connect to: test:test@tcp(127.0.0.1:3306)/todolist" func=main.connectToMariaDBLocal file="/build/todolist.go:58"
2025/08/18 21:53:43 /build/todolist.go:60
[error] failed to initialize database, got error dial tcp 127.0.0.1:3306: connect: connection refused
time="2025-08-18T21:53:43Z" level=error msg="Connection failed" func=main.connectToMariaDBLocal file="/build/todolist.go:62"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x2cc9b8]
goroutine 1 [running]:
gorm.io/gorm.(*DB).getInstance(0x1?)
/go/pkg/mod/gorm.io/[email protected]/gorm.go:399 +0x18
gorm.io/gorm.(*DB).Migrator(0x65bec0?)
/go/pkg/mod/gorm.io/[email protected]/migrator.go:12 +0x1c
main.main()
/build/todolist.go:236 +0x1a

So the additional todolist apps are having problems connecting to the db. I don't think that is related to the label though.
¯_(ツ)_/¯

Eventually all the pods can be running. I'm going to investigate the proper way to label an app while also ensuring services etc. work properly.

whayutin@fedora:~/OPENSHIFT/git/OADP/oadp-operator/tests/e2e/sample-applications/mysql-persistent$ oc get pods -n mysql-persistent
NAME                    READY   STATUS      RESTARTS        AGE
mysql-f9cfb9bb7-45vgk   2/2     Running     0               7m40s
todolist-1-7bkcg        1/1     Running     3 (5m57s ago)   6m34s
todolist-1-deploy       0/1     Completed   0               7m40s
todolist-1-jxpt7        1/1     Running     0               7m39s
todolist-1-lprpw        1/1     Running     4 (2m57s ago)   3m59s
todolist-1-mfvd8        1/1     Running     0               6m32s
todolist-1-mlx4x        1/1     Running     0               6m33s
whayutin@fedora:~/OPENSHIFT/git/OADP/oadp-operator/tests/e2e/sample-
whayutin@fedora:~/OPENSHIFT/git/OADP/oadp-operator/tests/e2e/sample-applications/mysql-persistent$ oc get deployments -n mysql-persistent 
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
mysql   1/1     1            1           8m19s
whayutin@fedora:~/OPENSHIFT/git/OADP/oadp-operator/tests/e2e/sample-applications/mysql-persistent$ oc get deploymentconfigs.apps.openshift.io -n mysql-persistent 
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
NAME       REVISION   DESIRED   CURRENT   TRIGGERED BY
todolist   1          5         5         config

IMHO this is good to land as is and I'll follow up w/ a

  1. backup spec that has a match labels with an AND.. so we match 2 labels
  2. updated labels across the app and mysql db.

Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visiack still works in meet

Copy link

openshift-ci bot commented Aug 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kaovilai
Copy link
Member

For 2 ReplicaSets specifying the same .spec.selector but different .spec.template.metadata.labels and .spec.template.spec fields, each ReplicaSet ignores the Pods created by the other ReplicaSet.

https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/

My comments would've applied if there's no ignore mechanism here.

spec:
labelSelector:
matchLabels:
app: todolist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the label should be unique across different tests and deployments?

This may have consequences if running in parallel:
https://github.com/openshift/oadp-operator/blob/oadp-dev/tests/e2e/sample-applications/mongo-persistent/mongo-persistent-csi.yaml#L42

Secondly how about also using multiple label selectors implemented by (note this is OR operator):
vmware-tanzu/velero#4650

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so mysql specific todolist label?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants