henrybear327
Contributor

We should let the ClientSet handle the client closure.

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@henrybear327 henrybear327 self-assigned this Sep 19, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: henrybear327
Once this PR has been reviewed and has the lgtm label, please assign jmhbnz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@henrybear327
Contributor Author

/retest

codecov bot commented Sep 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.11%. Comparing base (4601818) to head (67c992b).

⚠️ Current head 67c992b differs from pull request most recent head c77e1b7

Please upload reports for the commit c77e1b7 to get more accurate results.

Additional details and impacted files

see 21 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #20689      +/-   ##
==========================================
- Coverage   69.14%   69.11%   -0.03%     
==========================================
  Files         420      420              
  Lines       34817    34794      -23     
==========================================
- Hits        24074    24049      -25     
- Misses       9338     9344       +6     
+ Partials     1405     1401       -4     

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4601818...c77e1b7. Read the comment docs.

@henrybear327
Contributor Author

/retest

if err != nil {
	return err
}
defer c.Close()
Member

@serathius serathius Sep 20, 2025

I think Close is called correctly. It is the ClientSet that is implemented incorrectly.

User stories:

  • I want to ensure that client connections are cleaned up even if a panic occurs. defer c.Close() should work.
  • I want reports generated by c.Report() to be complete. The client should be closed when the report is generated, to prevent further changes.
  • Client.Close should not panic or return an error when I call it after getting the Report.

Contributor Author

Hmm ok.

Let me look into it. Thanks for the input.

Contributor Author

@henrybear327 henrybear327 Sep 21, 2025

So the root cause is now clear.

openWatchPeriodically takes the RecordingClient as a parameter and continuously spawns watches in goroutines.

When openWatchPeriodically breaks out of its loop, either because the context is done or because it receives something from the finish channel, the function returns. But the goroutines it created are still running. The shared RecordingClient can therefore be closed outside of openWatchPeriodically (by the defer) while a Get (or any other call) from inside a goroutine is still being made, before that goroutine reaches its exit checkpoint. That causes the gRPC "the client connection is closing" error, because we may be using a closed RecordingClient!

The fix I have put up so far is not ... pretty, though. It basically locks the RecordingClient with kvMux for all actions. Maybe there is a better idea here :( @serathius (the code here is a PoC)
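
For comparison, here is a minimal sketch of a different approach from the locking PoC: have the spawning function wait for its goroutines with a sync.WaitGroup before it returns, so the caller's deferred Close cannot overlap with an in-flight call. It is not the change made in this PR. `fakeClient`, `openWatchesAndWait`, and `run` are hypothetical names, and the timings are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// fakeClient is a hypothetical stand-in for the shared RecordingClient.
type fakeClient struct {
	mu     sync.Mutex
	closed bool
}

// Get fails once the client has been closed.
func (c *fakeClient) Get() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return errors.New("the client connection is closing")
	}
	return nil
}

func (c *fakeClient) Close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.closed = true
}

// openWatchesAndWait spawns workers until ctx is done, then blocks until
// every worker has returned. It reports how many calls hit a closed client.
func openWatchesAndWait(ctx context.Context, c *fakeClient) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	failures := 0

	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
loop:
	for {
		select {
		case <-ctx.Done():
			break loop
		case <-ticker.C:
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(5 * time.Millisecond) // simulate work before the call
			if err := c.Get(); err != nil {
				mu.Lock()
				failures++
				mu.Unlock()
			}
		}()
	}
	wg.Wait() // the caller's deferred Close cannot run before this returns
	return failures
}

// run mirrors the caller pattern from the discussion: defer Close, then
// hand the shared client to the function that spawns goroutines.
func run() int {
	c := &fakeClient{}
	defer c.Close()
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	return openWatchesAndWait(ctx, c)
}

func main() {
	fmt.Println("calls on a closed client:", run())
}
```

Without the `wg.Wait()` line, workers started shortly before the context expires could still call Get after the deferred Close has run, which is the race described above.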

@henrybear327 henrybear327 force-pushed the robustness-test-fix-watch-rpc-bug branch from 336739c to ee854c0 Compare September 21, 2025 18:43
@henrybear327 henrybear327 force-pushed the robustness-test-fix-watch-rpc-bug branch 2 times, most recently from 478d9e1 to 4bbfb83 Compare September 21, 2025 19:09
@henrybear327 henrybear327 marked this pull request as draft September 21, 2025 19:32
@henrybear327 henrybear327 force-pushed the robustness-test-fix-watch-rpc-bug branch from 4bbfb83 to deda3df Compare September 21, 2025 21:03
@henrybear327 henrybear327 force-pushed the robustness-test-fix-watch-rpc-bug branch from deda3df to 4c25b7d Compare September 21, 2025 21:06
@henrybear327 henrybear327 changed the title from "Fix an issue in robustness test that watch created in the ClientSet is closed wrongly" to "Fix the client connection is closing issue in robustness test" Sep 21, 2025
@henrybear327 henrybear327 force-pushed the robustness-test-fix-watch-rpc-bug branch 3 times, most recently from c5b3be5 to 67c992b Compare September 21, 2025 21:50
@henrybear327 henrybear327 marked this pull request as ready for review September 22, 2025 08:11
@k8s-ci-robot

@henrybear327: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-e2e-amd64 | 67c992b | link | true | /test pull-etcd-e2e-amd64 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

}
watch := c.Watch(ctx, "", lastRevision+1, true, true, false)
if watch == nil {
	return nil
Member

@serathius serathius Sep 23, 2025

This is not correct. Please read how this loop exits.

	return c.client.Endpoints()
}

func (c *RecordingClient) Watch(ctx context.Context, key string, rev int64, withPrefix bool, withProgressNotify bool, withPrevKV bool) clientv3.WatchChan {
Member

Please return an error.

c.kvMux.Lock()
defer c.kvMux.Unlock()
if c.isClosed {
	return nil
Contributor Author

This is wrong too.

c.kvMux.Lock()
defer c.kvMux.Unlock()
if c.isClosed {
	return nil
Member

Please don't return nil, as it breaks the contract. Instead, implement the interface so that it returns an error when we execute the Txn.

@henrybear327 henrybear327 force-pushed the robustness-test-fix-watch-rpc-bug branch from 67c992b to c77e1b7 Compare September 26, 2025 04:48
3 participants