Conversation

@synhershko (Collaborator)

When nodes are drained during K8S or OS upgrades, pods get deleted and recreated, but the StatefulSet revision doesn't change. The existing logic incorrectly triggered rolling restarts in these scenarios, causing all pods with lower ordinal numbers to be terminated simultaneously.

This change adds revision comparison checks to prevent rolling restarts when StatefulSet current and update revisions match, indicating no actual spec changes occurred. Rolling restarts now only happen when there are legitimate configuration updates requiring pod recreation.

Changes:

  • Add revision match checks in RollingRestartReconciler.Reconcile()
  • Enhance WorkingPodForRollingRestart() to skip when revisions match
  • Add debug logging to indicate when rolling restarts are skipped

Fixes issues during node upgrades and other node draining scenarios where pods are recreated without spec changes. Fixes #312.
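A minimal sketch of the guard described above (the helper name, package layout, and call site are illustrative and not the PR's actual code; the revision fields come from the StatefulSet status in k8s.io/api/apps/v1):

package reconcilers

import (
	appsv1 "k8s.io/api/apps/v1"
)

// needsRollingRestart is a hypothetical helper: a rolling restart is only
// warranted when the StatefulSet's update revision differs from its current
// revision, i.e. a real spec change is pending rollout. Pods recreated by a
// node drain keep the same revision, so no restart is triggered for them.
func needsRollingRestart(sts *appsv1.StatefulSet) bool {
	if sts.Status.UpdateRevision != "" && sts.Status.CurrentRevision == sts.Status.UpdateRevision {
		return false
	}
	return true
}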

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Comment on lines 487 to 489
if sts.Status.UpdateRevision != "" && sts.Status.CurrentRevision == sts.Status.UpdateRevision {
	return "", errors.New("no rolling restart needed - StatefulSet revisions match")
}
Contributor

This error is bubbled up to the reconcile func, which means it triggers a reconcile requeue.
I think we don't need to requeue here.
Currently this func has no return case with a nil error and an empty pod name, so the callers of this func only check whether err == nil and never look at the working pod name.

I'd rather check the working pod name in a separate condition. Then we can return a nil error with an empty working pod name for this no-revision-change case.
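A rough sketch of that shape, assuming a controller-runtime style reconciler; every name below is illustrative rather than the operator's actual code:

package reconcilers

import (
	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// workingPodForRollingRestart returns ("", nil) when the revisions match, so
// the no-op case is not reported as an error.
func workingPodForRollingRestart(sts *appsv1.StatefulSet) (string, error) {
	if sts.Status.UpdateRevision != "" && sts.Status.CurrentRevision == sts.Status.UpdateRevision {
		return "", nil // no spec change pending, nothing to restart
	}
	// ... real pod-selection logic would go here ...
	return "example-pod-0", nil
}

// The caller treats an empty pod name as "nothing to restart" and returns
// without requeueing; only real errors trigger a requeue.
func reconcileRollingRestart(sts *appsv1.StatefulSet) (reconcile.Result, error) {
	podName, err := workingPodForRollingRestart(sts)
	if err != nil {
		return reconcile.Result{}, err
	}
	if podName == "" {
		return reconcile.Result{}, nil // revisions match: skip quietly
	}
	// ... delete/restart podName here ...
	return reconcile.Result{}, nil
}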
@synhershko what do you think?

@synhershko (Collaborator, Author)

Sounds good to me - so what is your proposed behavior for node draining?

Contributor

I made a commit to avoid the requeue by returning nil for non-actionable states.

Btw, while making the code change and digging deeper into the RollingRestartReconciler, I started to doubt whether this change is really needed.

As you can see in the current code snippets below, there are already condition checks that avoid a rollout in such a draining case: the restart only starts once all replicas are ready and running, and a pod is only selected for restart when its revision differs from the StatefulSet's update revision.

// Only restart pods if not all pods are updated and the sts is healthy with no pods terminating
if sts.Status.ReadyReplicas == ptr.Deref(sts.Spec.Replicas, 1) {
	if numReadyPods, err := helpers.CountRunningPodsForNodePool(r.client, r.instance, &nodePool); err == nil && numReadyPods == int(ptr.Deref(sts.Spec.Replicas, 1)) {
		lg.Info(fmt.Sprintf("Starting rolling restart of the StatefulSet %s", sts.Name))
		return r.restartStatefulSetPod(&sts)
	}
}

if podRevision != sts.Status.UpdateRevision {
	return &pod, nil
}

I deployed a 3-agent k3s cluster in my local lab and tried to reproduce #312, but I couldn't reproduce it:

  • deploy an OS cluster with a 3-replica nodepool with master and data roles
  • drain the k8s node where the -2 OS replica runs
  • the -2 replica gets rescheduled on another node and the other replicas keep running without a restart

I can see that the original issue was reported against version 2.2.1, which is quite old.
From the above tests and code check, I'd say this issue has already been resolved.

@synhershko what do you think? Do you still experience this issue in your clusters? If so, which version are you using?

@synhershko (Collaborator, Author)

It didn't happen to me, and I wasn't able to reproduce it on our test harness either, but the issue seems legit. Maybe it's some sort of race condition.

@josedev-union force-pushed the fix/prevent-unnecessary-rolling-restarts-on-node-drain branch from 4f95b71 to 7397951 on October 23, 2025, 15:20
Successfully merging this pull request may close these issues: Operator terminates nodes via restart after k8s node removed via spot request termination.
