fix: prevent unnecessary rolling restarts during node draining #1092
base: main
Conversation
When nodes are drained during K8S or OS upgrades, pods get deleted and recreated, but the StatefulSet revision doesn't change. The existing logic incorrectly triggered rolling restarts in these scenarios, causing all pods with lower ordinal numbers to be terminated simultaneously.

This change adds revision comparison checks to prevent rolling restarts when the StatefulSet current and update revisions match, indicating no actual spec changes occurred. Rolling restarts now only happen when there are legitimate configuration updates requiring pod recreation.

Changes:
- Add revision match checks in RollingRestartReconciler.Reconcile()
- Enhance WorkingPodForRollingRestart() to skip when revisions match
- Add debug logging to indicate when rolling restarts are skipped

Fixes issues during node upgrades and other node draining scenarios where pods are recreated without spec changes.

Signed-off-by: Itamar Syn-Hershko <[email protected]>
if sts.Status.UpdateRevision != "" && sts.Status.CurrentRevision == sts.Status.UpdateRevision {
	return "", errors.New("no rolling restart needed - StatefulSet revisions match")
}
This error is bubbled up to the reconcile func, which means it triggers a reconcile requeue. I don't think we need to requeue here.
Right now this func has no case that returns a nil error with an empty pod name, so the callers only check whether err == nil and never look at the working pod name.
I'd rather add a separate check on the working pod name. Then we can return a nil error with an empty working pod name for this no-revision-change case.
@synhershko what do you think?
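If it helps, here is a minimal sketch of the shape I have in mind (names and types are simplified assumptions, not the actual operator code): the helper returns an empty pod name with a nil error when the revisions match, and the caller checks the pod name instead of relying on an error:

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

// workingPodForRollingRestart is a simplified, hypothetical version of the
// helper being discussed: when the StatefulSet revisions match there is
// nothing to restart, so it returns an empty pod name and a nil error
// instead of an error that would force a requeue.
func workingPodForRollingRestart(sts *appsv1.StatefulSet) (string, error) {
	if sts.Status.UpdateRevision != "" && sts.Status.CurrentRevision == sts.Status.UpdateRevision {
		return "", nil // no spec change: nothing to do
	}
	// ... existing logic to pick the next pod still on the old revision ...
	return "example-cluster-nodes-0", nil
}

func main() {
	sts := &appsv1.StatefulSet{}
	sts.Status.CurrentRevision = "rev-1"
	sts.Status.UpdateRevision = "rev-1"

	pod, err := workingPodForRollingRestart(sts)
	if err != nil {
		fmt.Println("real error, requeue:", err) // only genuine failures requeue
		return
	}
	if pod == "" {
		fmt.Println("revisions match, skipping rolling restart")
		return
	}
	fmt.Println("restart pod:", pod)
}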
Sounds good to me - so what is your proposed behavior for node draining?
I made a commit that avoids the requeue by returning nil for non-actionable states.
Btw, while making that change and digging deeper into the RollingRestartReconciler, I started to doubt whether this change is really needed.
As you can see in the current code snippets below, there are already condition checks that avoid a rollout in such a draining case.
opensearch-k8s-operator/opensearch-operator/pkg/reconcilers/rollingRestart.go
Lines 152 to 157 in ff376b2
// Only restart pods if not all pods are updated and the sts is healthy with no pods terminating
if sts.Status.ReadyReplicas == ptr.Deref(sts.Spec.Replicas, 1) {
	if numReadyPods, err := helpers.CountRunningPodsForNodePool(r.client, r.instance, &nodePool); err == nil && numReadyPods == int(ptr.Deref(sts.Spec.Replicas, 1)) {
		lg.Info(fmt.Sprintf("Starting rolling restart of the StatefulSet %s", sts.Name))
		return r.restartStatefulSetPod(&sts)
	}
opensearch-k8s-operator/opensearch-operator/pkg/helpers/helpers.go
Lines 534 to 536 in ff376b2
if podRevision != sts.Status.UpdateRevision {
	return &pod, nil
}
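To spell out how those two gates interact during a drain (an assumed simplification for illustration, not the real reconciler code): while a drained pod is terminating or being rescheduled, the ready/running pod counts fall below the desired replica count, so the restart branch is never entered, and even then the helper only returns pods whose revision differs from the UpdateRevision:

package main

import "fmt"

// shouldRestartPod condenses the two checks quoted above into one illustrative
// function. It is a sketch under assumed names, not the operator's actual code.
func shouldRestartPod(readyReplicas, runningPods, desiredReplicas int, podRevision, updateRevision string) bool {
	// Gate 1: only act when the StatefulSet is fully ready and all pods are running.
	if readyReplicas != desiredReplicas || runningPods != desiredReplicas {
		return false // a draining pod keeps us out of the restart branch
	}
	// Gate 2: only restart a pod that is still on an old revision.
	return podRevision != updateRevision
}

func main() {
	// During a drain: one of three pods is being rescheduled, revisions unchanged.
	fmt.Println(shouldRestartPod(2, 2, 3, "rev-1", "rev-1")) // false
	// After a real spec change: all pods running, but still on the old revision.
	fmt.Println(shouldRestartPod(3, 3, 3, "rev-1", "rev-2")) // true
}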
I deployed a 3-agent k3s cluster in my local lab and tried to reproduce #312, but I couldn't reproduce it.
- deploy an OS cluster with a 3-replica nodepool with master and data roles
- drain the k8s node where the os-2 replica runs
- the os-2 replica gets rescheduled on another node and the other replicas keep running without a restart
I can see that the original issue report was made with version 2.2.1, which is quite old.
From the tests above and the code check, I'd say this issue is already resolved.
@synhershko what do you think? Do you still experience this issue in your clusters? If so, which version do you use?
It didn't happen to me, and I wasn't able to reproduce it on our test harness, but the issue seems legit. Maybe it's some sort of race condition.
Signed-off-by: josedev-union <[email protected]>
Force-pushed from 4f95b71 to 7397951
Fixes issues during node upgrades and other node draining scenarios where pods are recreated without spec changes. Fixes #312.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.