Skip to content

Conversation

ballista01
Copy link
Contributor

@ballista01 ballista01 commented Jul 16, 2025

Summary

  • Refactors EtcdClusterReconciler.Reconcile into focused phases to reduce cyclomatic complexity and make behavior easier to reason about.
  • Adds a reconcileState container so data gathered in earlier phases can be reused without repeated lookups.
  • Reworks the controller unit tests to target each phase individually with fake clients instead of the previous envtest-based flow.

Controller changes (internal/controller/etcdcluster_controller.go)

  • Split the main reconcile loop into:

    • fetchAndValidateState
    • bootstrapStatefulSet
    • performHealthChecks
    • reconcileClusterState

    Each returns early with a ctrl.Result when it needs a requeue or encounters an error.

  • Introduced reconcileState (cluster, StatefulSet, member list, member health) to pass context between phases.

  • fetchAndValidateState:

    • Sets a default image registry.
    • Prepares TLS credentials.
    • Verifies StatefulSet ownership.
    • Signals requeues when dependent resources aren’t yet ready.
  • bootstrapStatefulSet:

    • Unifies StatefulSet creation/initial scaling and the headless service check.
    • Uses ctrl.Result.IsZero() to decide whether to continue.
  • performHealthChecks and reconcileClusterState:

    • Now operate on memberHealth.
    • Added comments clarifying learner promotion, scale-out/-in, and health gating.

Tests (internal/controller/etcdcluster_controller_test.go)

  • Replaced the previous integration-style test with table-driven unit tests using controller-runtime’s fake client.

  • Added TestFetchAndValidateState and TestBootstrapStatefulSet, switching to require for hard preconditions and asserting on StatefulSet/Service/ConfigMap outcomes.

  • Expanded coverage for bootstrap scenarios:

    • Initial creation.
    • 0→1 promotion.
    • No-op when resources exist.
  • Verified cluster state remains untouched when nothing changes.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ballista01
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link

Hi @ballista01. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jberkus
Copy link
Contributor

jberkus commented Jul 22, 2025

/ok-to-test

- split the monolithic Reconcile into fetch/sync/health/cluster phases backed by a shared reconcileState

- ensure TLS setup, STS ownership checks, and primitive bootstrapping are driven through dedicated helpers

- refresh controller unit tests to exercise the new helpers and drop obsolete scaffolding

Signed-off-by: Wenxue Zhao <[email protected]>
@ivanvc
Copy link
Member

ivanvc commented Sep 30, 2025

I was reading the code. But the changes look a little bit lengthy. Do you think that it would be possible to break down this pull request?

@ballista01 ballista01 changed the title feat(controller, e2e): Refactor reconcile loop and add E2E tests for scaling and promotion feat(controller): Refactor reconcile loop and add unit tests Sep 30, 2025
Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring makes sense to me. Just raised several minor comments.

Thanks.

cc @ivanvc @ArkaSaha30 @abdurrehman107 @hakman @frederiko PTAL when you get free cycle, thx

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request, @ballista01. I left a couple of comments.

I also noticed that you removed some comments from lengthy functions. I think it makes sense to comment these functions, as it is easy to get lost with all the conditions they have. Thanks again :)

cluster *ecv1alpha1.EtcdCluster // cluster custom resource currently being reconciled
sts *appsv1.StatefulSet // associated StatefulSet for the cluster
memberListResp *clientv3.MemberListResponse // member list fetched from the etcd cluster
healthInfos []etcdutils.EpHealth // health information for each etcd member
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this property name out loud makes me wonder if we could rename it to: memberHealth

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also recovered and updated some comments (mainly in reconcileClusterState). Let me know if we need more!

},
req: ctrl.Request{NamespacedName: types.NamespacedName{Name: "etcd", Namespace: "default"}},
assert: func(t *testing.T, state *reconcileState, res ctrl.Result, err error, ec *ecv1alpha1.EtcdCluster, _ *appsv1.StatefulSet) {
if assert.NotNil(t, state) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value of this if statement? If it fails, either way the test will fail in the end. So, I'm trying to understand why we would want this behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the question! testify’s assert helpers only record the failure and carry on, so the original if assert.NotNil(...) blocks were just there to stop us from dereferencing state/state.sts and panicing. That might seem to be confusing, so I’ve switched those checks to require.NotNil(...). Now the tests fail fast when the state is missing, and the following assert.Equal calls simply verify the expected fields.

Comment on lines 107 to 111
if assert.NotNil(t, state) {
assert.Equal(t, ec.Name, state.cluster.Name)
if assert.NotNil(t, state.sts) {
assert.Equal(t, sts.Name, state.sts.Name)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the usage to require.NotNil(...)

@ahrtr
Copy link
Member

ahrtr commented Oct 3, 2025

The change for internal/controller/etcdcluster_controller.go looks clean and better. Thanks. (But do not get much time to look into the test case.)

Let's target to merge this PR before our next WG meeting.

@ahrtr
Copy link
Member

ahrtr commented Oct 8, 2025

@ivanvc any further comments on this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants