kubeadm: Graduate NodeLocalCRISocket Feature gate to beta #131981

HirazawaUi · 2025-05-27T13:50:56Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

I reviewed the goals required for the Beta phase outlined in the KEP:

In the Beta phase, the feature gate is enabled by default. If the feature gate is disabled, kubeadm subcommands will not be changed. When the feature gate is enabled, the kubeadm subcommands change as follows:  

- `kubeadm upgrade apply/node` will use `/var/lib/kubelet/instance-config.yaml` and override the `ContainerRuntimeEndpoint` field in `/var/lib/kubelet/config.yaml`.

I believe maintaining the current existing behavior is sufficient because some users who did not adopt this feature during the Alpha phase may still rely on kubeadm-flags.env to generate /var/lib/kubelet/instance-config.yaml, which ultimately merges into /var/lib/kubelet/config.yaml.

Additionally, the two issues from the Alpha phase have already been addressed:

- [NodeLocalCRISocket]: Remove container-runtime-endpoint flag when kubeadm upgrade #129278: Remove the `--container-runtime-endpoint` flag from the `/var/lib/kubelet/kubeadm-flags.env` file during kubeadm upgrade.
- [NodeLocalCRISocket]: remove kubeadm.alpha.kubernetes.io/cri-socket annotation when kubeadm upgrade #129279: Remove the `kubeadm.alpha.kubernetes.io/cri-socket` annotation from a given node during kubeadm upgrade.

Therefore, there is no additional work required for the Beta phase. Promoting the feature gate to Beta and enabling it by default should suffice.

Does this PR introduce a user-facing change?

kubeadm: graduated the `NodeLocalCRISocket` feature gate to beta and enabled it by default. When it is enabled, kubeadm will:
  1. Generate a `/var/lib/kubelet/instance-config.yaml` file to customize the `containerRuntimeEndpoint` field in per-node kubelet configurations.
  2. Remove the `kubeadm.alpha.kubernetes.io/cri-socket` annotation from nodes during upgrade operations.
  3. Remove the `--container-runtime-endpoint` flag from the `/var/lib/kubelet/kubeadm-flags.env` file during upgrades.
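
As a rough illustration of item 3, the sketch below strips the flag from a `KUBELET_KUBEADM_ARGS` line as found in `kubeadm-flags.env`. The `removeFlag` helper and the sample `--node-ip` value are hypothetical, quoted values containing spaces are not handled, and this is not the kubeadm implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// removeFlag drops every occurrence of the named flag, in either the
// "--flag=value" or "--flag value" form, from a KUBELET_KUBEADM_ARGS line.
// Hypothetical helper for illustration only.
func removeFlag(envLine, flag string) string {
	const prefix = `KUBELET_KUBEADM_ARGS="`
	if len(envLine) <= len(prefix) ||
		!strings.HasPrefix(envLine, prefix) ||
		!strings.HasSuffix(envLine, `"`) {
		return envLine // unknown format: leave the line untouched
	}
	args := strings.Fields(envLine[len(prefix) : len(envLine)-1])
	kept := make([]string, 0, len(args))
	for i := 0; i < len(args); i++ {
		switch {
		case strings.HasPrefix(args[i], "--"+flag+"="):
			// "--flag=value": skip this single token.
		case args[i] == "--"+flag:
			// "--flag value": skip the flag and, if present, its value.
			if i+1 < len(args) && !strings.HasPrefix(args[i+1], "--") {
				i++
			}
		default:
			kept = append(kept, args[i])
		}
	}
	return prefix + strings.Join(kept, " ") + `"`
}

func main() {
	in := `KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///run/containerd/containerd.sock --node-ip=10.0.0.5"`
	fmt.Println(removeFlag(in, "container-runtime-endpoint"))
}
```

Lines that do not match the expected `KUBELET_KUBEADM_ARGS="..."` shape are returned unchanged, so the sketch never rewrites content it does not understand.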

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/4656-add-kubelet-instance-configuration

k8s-ci-robot · 2025-05-27T13:51:06Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

k8s-ci-robot · 2025-05-27T13:51:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HirazawaUi

Needs approval from an approver in each of these files:

~~cmd/kubeadm/OWNERS~~ [HirazawaUi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

neolit123

> Additionally, the two issues from the Alpha phase have already been addressed:
>
> [NodeLocalCRISocket]: Remove container-runtime-endpoint flag when kubeadm upgrade #129278: Remove the flag --container-runtime-endpoint from the /var/lib/kubelet/kubeadm-flags.env file during kubeadm upgrade.
>
> [NodeLocalCRISocket]: remove kubeadm.alpha.kubernetes.io/cri-socket annotation when kubeadm upgrade #129279: Remove the kubeadm.alpha.kubernetes.io/cri-socket annotation from a given node during kubeadm upgrade.

these are missing in
Add kubelet instance configuration to configure CRI socket for each node kubeadm#3042
please add them

> kubeadm: Graduate NodeLocalCRISocket to beta.

better provide one additional sentence for context what the FG does e.g. what will happen for new clusters and on upgrade to 1.34.

> Therefore, there is no additional work required for the Beta phase. Promoting the feature gate to Beta and enabling it by default should suffice.

we need to update the website kubeadm init page where this FG is documented. dev-1.34 branch.
we also need to flip the kinder e2e to test the disabled case, during alpha we explicitly enabled the FG.

HirazawaUi · 2025-05-27T14:21:56Z

> these are missing in
> Add kubelet instance configuration to configure CRI socket for each node kubeadm#3042
> please add them

Added.

> better provide one additional sentence for context what the FG does e.g. what will happen for new clusters and on upgrade to 1.34.

Added to the release note. Could you please check whether the wording is accurate?

> we need to update the website kubeadm init page where this FG is documented. dev-1.34 branch.

Will be added.

> we also need to flip the kinder e2e to test the disabled case, during alpha we explicitly enabled the FG.

Will be added, but we do not support disabling a feature gate in the beta stage after enabling it during alpha (creating/adding/upgrading). I vaguely recall we previously confirmed this practice is prohibited, but I cannot find this restriction documented in the KEP. Could it be that I inadvertently overlooked this section while implementing the KEP?

neolit123 · 2025-05-27T14:25:46Z

/release-note-edit

kubeadm: graduated the `NodeLocalCRISocket` feature gate to beta and enabled it by default. When it is enabled, kubeadm will:
  1. Generate a `/var/lib/kubelet/instance-config.yaml` file to customize the `containerRuntimeEndpoint` field in per-node kubelet configurations.
  2. Remove the `kubeadm.alpha.kubernetes.io/cri-socket` annotation from nodes during upgrade operations.
  3. Remove the `--container-runtime-endpoint` flag from the `/var/lib/kubelet/kubeadm-flags.env` file during upgrades.

just some minor formatting / edits.

neolit123 · 2025-05-27T14:27:45Z

> > we also need to flip the kinder e2e to test the disabled case, during alpha we explicitly enabled the FG.
>
> Will be added, but we do not support disabling a feature gate in the beta stage after enabling it during alpha (creating/adding/upgrading). I vaguely recall we previously confirmed this practice is prohibited, but I cannot find this restriction documented in the KEP. Could it be that I inadvertently overlooked this section while implementing the KEP?

no, actually it's a standard practice for kubeadm e2e to flip the FG to disabled once the FG goes beta.
it covers the use cases for users that want to opt-out from the FG until it goes GA.
the rest of the e2e test jobs will start testing it as it's enabled by default.

HirazawaUi · 2025-05-27T14:33:10Z

> no, actually it's a standard practice for kubeadm e2e to flip the FG to disabled once the FG goes beta.
> it covers the use cases for users that want to opt-out from the FG until it goes GA.
> the rest of the e2e test jobs will start testing it as it's enabled by default.

/hold
I will conduct further investigation into whether the code related to this feature gate supports reverting from an enabled to a disabled state. The PR cannot be merged until this investigation is completed.

neolit123 · 2025-05-27T14:47:42Z

> > no, actually it's a standard practice for kubeadm e2e to flip the FG to disabled once the FG goes beta.
> > it covers the use cases for users that want to opt-out from the FG until it goes GA.
> > the rest of the e2e test jobs will start testing it as it's enabled by default.
>
> /hold I will conduct further investigation into whether the code related to this feature gate supports reverting from an enabled to a disabled state. The PR cannot be merged until this investigation is completed.

for 1.34, it's important to allow the user to disable the FG in the kubeadm-config CM and then call kubeadm upgrade which will not introduce the changes which the FG does.

if a 1.32/1.33 cluster had the FG enabled for alpha and during upgrade to 1.34 (beta) they want to disable it, that's not so important and we can claim it as unsupported. you can still check if it's easy to manage on our side or not.
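
To make the opt-out concrete, here is a minimal sketch of how a beta gate that defaults to enabled can still be switched off by an explicit entry in the cluster's feature gate map. The `defaults` table and the `enabled` function are hypothetical stand-ins for illustration, not kubeadm's actual code.

```go
package main

import "fmt"

// defaults is a hypothetical per-release default table. The value for
// NodeLocalCRISocket reflects this PR: beta, enabled by default.
var defaults = map[string]bool{
	"NodeLocalCRISocket": true,
}

// enabled returns the user's explicit setting when one exists and falls
// back to the release default otherwise. Reading from a nil map is safe.
func enabled(userGates map[string]bool, name string) bool {
	if v, ok := userGates[name]; ok {
		return v
	}
	return defaults[name]
}

func main() {
	// No explicit setting: the beta default applies.
	fmt.Println(enabled(nil, "NodeLocalCRISocket"))
	// Explicit opt-out, as a user might set in the cluster configuration.
	fmt.Println(enabled(map[string]bool{"NodeLocalCRISocket": false}, "NodeLocalCRISocket"))
}
```

Under this model, an upgrade that sees the explicit `false` entry would skip the changes the feature gate introduces.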

pacoxu · 2025-05-28T03:33:43Z

/assign

neolit123 · 2025-05-30T17:17:53Z

cmd/kubeadm/app/util/config/cluster.go

-		criSocket, ok = node.ObjectMeta.Annotations[constants.AnnotationKubeadmCRISocket]
-		if !ok {
-			return missingAnnotationError
+		kubeletConfig, err := readKubeletConfig(constants.KubeletRunDirectory, constants.KubeletInstanceConfigurationFileName)


it doesn't matter too much but if we want to be precise any code path for reading the kubelet instance config should be only done if the feature gate is enabled.

only throw error on missing instance config if the FG is enabled

only try reading the instance config if the FG is enabled

But if we do this, we won't be able to handle the scenario where the Feature gate is enabled in the alpha stage but disabled when upgrading to beta. This is because, during the alpha phase, the kubeadm.alpha.kubernetes.io/cri-socket annotation on the Node object would have already been removed. Then, if the Feature gate is disabled in the beta stage, the code would throw an error when execution reaches this point :)

i think the problem is that we shouldn't have started deleting the annotation in the alpha phase.

so with the fg enabled we could write the file and prefer to read it instead of the annotation, but fall back to reading the annotation. then during ga when the fg is locked to enabled we can start to remove the annotation from the node object.

either way, even if we messed the planning up a bit, the important aspect would be to have a smooth transition for the users and not to get any complaints. the annotation is widely used, unfortunately.
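
The alternative described here (prefer the instance config when the gate is on, otherwise fall back to the annotation) could look roughly like the sketch below. `resolveCRISocket` and its parameters are hypothetical, the socket paths are sample values, and this is the design floated in the comment rather than what the PR implements.

```go
package main

import (
	"errors"
	"fmt"
)

// Annotation key taken from the discussion above.
const criSocketAnnotation = "kubeadm.alpha.kubernetes.io/cri-socket"

var errNoCRISocket = errors.New("CRI socket not found in instance config or node annotation")

// resolveCRISocket is a hypothetical helper. instanceEndpoint is the
// containerRuntimeEndpoint read from the kubelet instance config, or ""
// when the file is missing or the field is unset.
func resolveCRISocket(fgEnabled bool, instanceEndpoint string, annotations map[string]string) (string, error) {
	// Prefer the instance config, but only when the feature gate is enabled.
	if fgEnabled && instanceEndpoint != "" {
		return instanceEndpoint, nil
	}
	// Otherwise fall back to the node annotation.
	if socket, ok := annotations[criSocketAnnotation]; ok && socket != "" {
		return socket, nil
	}
	return "", errNoCRISocket
}

func main() {
	annotations := map[string]string{criSocketAnnotation: "unix:///run/containerd/containerd.sock"}
	fmt.Println(resolveCRISocket(true, "unix:///var/run/crio/crio.sock", annotations))
	fmt.Println(resolveCRISocket(true, "", annotations))
	fmt.Println(resolveCRISocket(false, "", nil))
}
```

With this ordering the annotation would only become removable once the gate is locked to enabled, which is the GA-time cleanup the comment suggests.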

/lgtm
/cc pacoxu
do you have any comments before this merges?

As we already removed the annotation if the FG is enabled in v1.32/v1.33, I can accept the current change.

we won't be able to handle the scenario where the Feature gate is enabled in the alpha stage but disabled when upgrading to beta.

Do we need to add a warning for this scenario? (Even if there is a kubelet instance config file.)

Besides that, only one nit and a question: should we include the config path in the error message, or is it already included?

k8s-ci-robot · 2025-05-31T06:14:26Z

LGTM label has been added.

Git tree hash: 3dba4941108d39e8c444ccd0049ff47f9f0102ea

cmd/kubeadm/app/features/features.go

pacoxu · 2025-06-03T03:48:59Z

cmd/kubeadm/app/util/config/cluster.go

-	if features.Enabled(clusterCfg.FeatureGates, features.NodeLocalCRISocket) {
-		_, err = os.Stat(filepath.Join(constants.KubeletRunDirectory, constants.KubeletInstanceConfigurationFileName))
+	criSocket, ok := node.Annotations[constants.AnnotationKubeadmCRISocket]
+	if !ok {


ok=false means that the user has already run an upgrade with the feature gate NodeLocalCRISocket=true.

If NodeLocalCRISocket=false and ok=false, should we log a warning message?

The code below returns an error if there is neither an annotation nor a kubelet instance config.

We can print this warning log, but it might be better to:

When the Feature gate is not enabled, we print a warning log if the cri-socket annotation is not found.

When the Feature gate is enabled, we print a warning log if the kubelet instance config is not found.

Alternatively, we could avoid printing warning logs altogether and handle the issue for the user, making the process imperceptible to them.

Considering this issue stems from our mistake, is it necessary to burden the user with it? The user did nothing wrong but still receives a warning ╮(￣▽￣)╭.
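
For reference, the two warning rules proposed above can be written as a small decision function. This is only a sketch of that proposal; `missingSourceWarning` and the message strings are hypothetical, and the thread leaves open whether any warning should be printed at all.

```go
package main

import "fmt"

// missingSourceWarning returns the warning to print, or "" when none applies.
// Gate disabled: warn when the cri-socket annotation is missing.
// Gate enabled: warn when the kubelet instance config is missing.
func missingSourceWarning(fgEnabled, hasAnnotation, hasInstanceConfig bool) string {
	switch {
	case !fgEnabled && !hasAnnotation:
		return "cri-socket annotation not found on the node"
	case fgEnabled && !hasInstanceConfig:
		return "kubelet instance config not found"
	default:
		return ""
	}
}

func main() {
	// Gate disabled after an alpha-era upgrade already removed the annotation.
	fmt.Printf("%q\n", missingSourceWarning(false, false, true))
	// Gate enabled and the instance config exists: nothing to report.
	fmt.Printf("%q\n", missingSourceWarning(true, false, true))
}
```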

pacoxu · 2025-06-03T03:53:20Z

cmd/kubeadm/app/util/config/cluster.go

-			return missingAnnotationError
+		kubeletConfig, err := readKubeletConfig(constants.KubeletRunDirectory, constants.KubeletInstanceConfigurationFileName)
+		if err != nil {
+			return errors.Wrapf(err, "could not read kubelet instance configuration on node %q", nodeName)


Should we show the absolute path in the message? (Maybe not needed.)

Hmm, we generally avoid printing absolute paths in logs, whether it's the kubelet configuration or the kubelet instance config.

k8s-ci-robot · 2025-06-03T11:28:39Z

New changes are detected. LGTM label has been removed.

HirazawaUi · 2025-06-03T13:12:46Z

/retest-required

k8s-ci-robot requested review from carlory and neolit123 May 27, 2025 13:51

k8s-ci-robot added the approved label May 27, 2025

neolit123 reviewed May 27, 2025

k8s-ci-robot added the do-not-merge/hold label May 27, 2025

k8s-ci-robot assigned pacoxu May 28, 2025

HirazawaUi force-pushed the promote-4654-to-beta branch from 252a410 to 83154d3 Compare May 29, 2025 15:12

k8s-ci-robot added size/S and removed size/XS labels May 29, 2025

HirazawaUi commented May 29, 2025

HirazawaUi force-pushed the promote-4654-to-beta branch from 83154d3 to a3a0cea Compare May 29, 2025 15:36

neolit123 reviewed May 30, 2025

HirazawaUi force-pushed the promote-4654-to-beta branch from a3a0cea to a036c4f Compare May 30, 2025 15:25

k8s-ci-robot added size/M and removed size/S labels May 30, 2025

neolit123 reviewed May 30, 2025

k8s-ci-robot requested a review from pacoxu May 31, 2025 06:14

k8s-ci-robot assigned neolit123 May 31, 2025

k8s-ci-robot added the lgtm label May 31, 2025

pacoxu reviewed Jun 3, 2025

pacoxu reviewed Jun 3, 2025

Graduate NodeLocalCRISocket to beta

ad3a13e

HirazawaUi force-pushed the promote-4654-to-beta branch from a036c4f to ad3a13e Compare June 3, 2025 11:28

k8s-ci-robot removed the lgtm label Jun 3, 2025

k8s-ci-robot requested review from neolit123 and pacoxu June 3, 2025 11:28
