
Conversation

@josedev-union
Contributor

Description

Quorum-safe rolling restarts across nodepools

  • Implement global candidate selection for restart beyond a single StatefulSet (STS) scope (a selection sketch follows this list)
  • Enforce one deletion per reconcile (maxUnavailable=1)
  • Guarantee that only one master restarts at a time, with cluster-wide quorum checks
  • Keep the role-aware path as a fallback when no global candidate is selected
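
For context, a minimal Go sketch of the first two points, global candidate selection with a one-deletion-per-reconcile cap. All types and helper names here are invented for illustration and are not the operator's actual API:

```go
package main

import "fmt"

// Pod is an invented stand-in for a pod pending restart; the real
// operator works with corev1.Pod and StatefulSet revisions.
type Pod struct {
	Name     string
	NodePool string
	IsMaster bool
	Outdated bool // pod spec hash differs from the desired revision
}

// selectRestartCandidate scans pods across all nodepools (not just one
// STS) and returns at most one pod to delete this reconcile, which is
// what enforces maxUnavailable=1 globally. Non-masters are preferred so
// master churn is kept to the unavoidable minimum.
func selectRestartCandidate(pods []Pod) (Pod, bool) {
	var master *Pod
	for i := range pods {
		p := pods[i]
		if !p.Outdated {
			continue
		}
		if !p.IsMaster {
			return p, true // restart non-masters first
		}
		if master == nil {
			master = &pods[i]
		}
	}
	if master != nil {
		return *master, true // one master at a time; quorum is gated separately
	}
	return Pod{}, false
}

func main() {
	pods := []Pod{
		{Name: "masters-0", NodePool: "masters", IsMaster: true, Outdated: true},
		{Name: "data-1", NodePool: "data", Outdated: true},
	}
	if c, ok := selectRestartCandidate(pods); ok {
		fmt.Printf("delete %s this reconcile, then requeue\n", c.Name)
	}
}
```

Returning at most one pod per reconcile (and requeueing) is what makes maxUnavailable=1 hold across nodepools, not just within a single STS.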

Issues Resolved

#650
#738

Check List

  • Commits are signed per the DCO using --signoff
  • Unit tests added for the new/changed functionality and all unit tests are successful
  • Customer-visible features documented
  • No linter warnings (make lint)

If CRDs are changed:

  • CRD YAMLs updated (make manifests) and also copied into the helm chart
  • Changes to CRDs documented

Please refer to the PR guidelines before submitting this pull request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

@josedev-union
Contributor Author

@synhershko @prudhvigodithi
The story is more complex than initially thought: the rolling restart is interfered with by the clusterReconciler, whose nodepool recovery logic causes STS recreation.
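
As a rough illustration of the coordination this implies, with every name invented (the operator's real status type and flag will differ), one option is for nodepool recovery to yield while a restart is in flight:

```go
package main

import "fmt"

// ClusterStatus is an invented stand-in for the operator's cluster
// status; the real type and field names will differ.
type ClusterStatus struct {
	RollingRestartInProgress bool
}

// shouldRecoverNodePool sketches one way to keep the cluster reconciler's
// nodepool recovery (STS recreation) from racing a rolling restart:
// recovery simply yields while a restart is in flight.
func shouldRecoverNodePool(s ClusterStatus) bool {
	return !s.RollingRestartInProgress
}

func main() {
	fmt.Println(shouldRecoverNodePool(ClusterStatus{RollingRestartInProgress: true}))  // false: yield to the restart
	fmt.Println(shouldRecoverNodePool(ClusterStatus{RollingRestartInProgress: false})) // true: recovery may proceed
}
```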

This PR is WIP, but I'd welcome your feedback.

- Implement global candidate selection for restart beyond a single StatefulSet (STS) scope
- Enforce one deletion per reconcile (maxUnavailable=1)
- Guarantee that only one master restarts at a time, with cluster-wide quorum checks (a quorum-gate sketch follows this list)
- Keep the role-aware path as a fallback when no global candidate is selected
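
To illustrate the quorum gate in the third point: the majority arithmetic below is standard, but the function and its inputs are invented, and a real check would consult OpenSearch cluster-manager state rather than pod readiness alone:

```go
package main

import "fmt"

// canRestartMaster gates deletion of a single master pod: the remaining
// ready masters must still form a strict majority (quorum) of the
// configured master count, measured cluster-wide rather than per nodepool.
func canRestartMaster(totalMasters, readyMasters int) bool {
	quorum := totalMasters/2 + 1
	return readyMasters-1 >= quorum
}

func main() {
	fmt.Println(canRestartMaster(3, 3)) // true: 2 masters remain, quorum is 2
	fmt.Println(canRestartMaster(3, 2)) // false: 1 master would remain, below quorum of 2
}
```

With quorum = floor(n/2) + 1, a 3-master cluster can lose exactly one master; combined with the single-candidate rule, this keeps elections safe during the restart.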

Signed-off-by: josedev-union <[email protected]>
@josedev-union josedev-union marked this pull request as ready for review October 23, 2025 14:42
Collaborator

@synhershko left a comment


The design doc needs updating.

Also, I'm missing additional safety changes; for example, the following at least need to be discussed:

  1. Do we want to run prechecks before the rolling restart starts, and what should they be? E.g., do we allow non-green clusters to restart? Do we require replicas for all indices? (A cluster-health precheck sketch follows this list.)
  2. Do we want to run any checks between node or group restarts? E.g., do we want the cluster stabilized and green before proceeding to the next node or group?
  3. What are the failure scenarios in which we stop? What options will we have to roll back or recover from failed upgrades?
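
On the cluster-health question, a minimal Go sketch against the standard OpenSearch /_cluster/health endpoint; the endpoint and its status field are real OpenSearch API, while the base URL, auth, TLS, and operator client wiring are omitted assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// clusterIsGreen queries the OpenSearch health endpoint and reports
// whether the cluster status is "green". In the operator this would go
// through its authenticated cluster client rather than plain http.Get.
func clusterIsGreen(baseURL string) (bool, error) {
	resp, err := http.Get(baseURL + "/_cluster/health")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var health struct {
		Status string `json:"status"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&health); err != nil {
		return false, err
	}
	return health.Status == "green", nil
}

func main() {
	green, err := clusterIsGreen("http://localhost:9200") // placeholder URL
	if err != nil {
		fmt.Println("precheck failed:", err)
		return
	}
	fmt.Println("safe to start rolling restart:", green)
}
```

The same check could run between node or group restarts (point 2), waiting for green before the next deletion.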

@@ -0,0 +1,248 @@
# Rolling Restart Improvements for Multi-AZ Master Nodes
Collaborator


This file describes the issues with the old implementation and suggestions for the new design. Since it's going to be committed to docs/designs, it really needs to describe the new design without referencing the old way or any existing issues.
