Skip to content

Conversation

srm09
Copy link
Contributor

@srm09 srm09 commented Sep 29, 2025

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added this to the v1.14 milestone Sep 29, 2025
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 29, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign chrischdi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 29, 2025

// Check the presence of the node-pool label on the VirtualMachineGroup
nodePool := supervisorMachineCtx.Machine.Labels[clusterv1.MachineDeploymentNameLabel]
if zone, ok := vmOperatorVMGroup.Labels[fmt.Sprintf("capv/%s", nodePool)]; ok && zone != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in capv design doc, adding prefix may will fail badly when using long node pool names. I've dropped the prefix in VMG controller.

}
}

// Check the presence of the node-pool label on the VirtualMachineGroup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's use Machine Deployment in capv instead of node pool.

// Check the presence of the node-pool label on the VirtualMachineGroup
nodePool := supervisorMachineCtx.Machine.Labels[clusterv1.MachineDeploymentNameLabel]
if zone, ok := vmOperatorVMGroup.Labels[fmt.Sprintf("capv/%s", nodePool)]; ok && zone != "" {
supervisorMachineCtx.VSphereMachine.Spec.FailureDomain = ptr.To(zone)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For day 2 operations, we only need to create the VM with a label "topology.kubernetes.io/zone: Zone-x", but no need to set VSphereMachine.Spec.FailureDomain or Machine.Spec.FailureDomain. We should keep them consistent since they're empty from cluster specification.

After create VM with lable, VM Service will place it into that zone.

@srm09 srm09 force-pushed the nodepool/vm-aff-antiaff branch from 597af80 to 87609a7 Compare October 7, 2025 21:57
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 7, 2025
@srm09 srm09 force-pushed the nodepool/vm-aff-antiaff branch from 41a03e3 to 1759351 Compare October 8, 2025 16:24
zhanggbj and others added 4 commits October 8, 2025 09:27
- Bump VMOP including Node AF/AAF support
- Add NodeAutoPlacement Feature Gate

(cherry picked from commit 700c8ae)
Removes the extra cases for VMG creation, such that VMG is created for:
1. Multiple zones, multiple MDs with no failureDomain
2. Multiple zones, multiple MDs with failureDomain
3. Single zone, existing cluster with no failureDomain MDs

Signed-off-by: Sagar Muchhal <[email protected]>
- Updates VMOP API dependency

Misc VMG fixes
- Use namingStrategy to calculate VM names
- Use MachineDeployment names for VMG placement label
- Includes all machinedeployments to generate node-pool -> zone
  mapping

Fixes VMG webhook validation error
- Adds cluster-name label to Af/AAF spec
- re-adds zone topology key back to anti-aff spec

Signed-off-by: Sagar Muchhal <[email protected]>
@srm09 srm09 force-pushed the nodepool/vm-aff-antiaff branch from 1759351 to fa7432f Compare October 8, 2025 16:27
// }
// }
// log.V(5).Info("Identified worker Machines linked to MachineSets", "count", len(workerMachines))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @srm09 , to get the current vspheremachine, I think it works during initial creations.
However during an rolling upgrade

  • The old MachineSet (based on the previous template) starts scaling down.
  • The new MachineSet (based on the updated template) starts scaling up.

I'm afraid it's not accurate to get VSphereMachine objects by list and compare with the desired replica count, so we should filter out by the MachineSets and wait. WDYT?

},
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srm09
I noticed this is removed from design doc "Is failureDomain specified? If yes, create VM with zone label."
Could you help to clarify? If failureDomain is specified, I think we could simply honor it and create VM directly with zone label. Why we are still adding AF/AAF configs here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the configuration of all VMs similar and avoiding code branching when not necessarily needed.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 10, 2025
srm09 and others added 2 commits October 13, 2025 11:11
* Refine VMG controller when generate per-MD zone labels

- Skip legacy already-placed VM which do not have placement info
- Skip VM which do not have zone info

* Apply suggestions from code review

---------

Co-authored-by: Sagar Muchhal <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants