refactor(controller): signal handling and shutdown #5798
base: master
…5150) Signed-off-by: Tobias Harnickell <[email protected]>
Welcome @TobyTheHutt! |
Hi @TobyTheHutt. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Thanks @TobyTheHutt 👍 |
Signed-off-by: Tobias Harnickell <[email protected]>
Signed-off-by: Tobias Harnickell <[email protected]>
/lgtm |
Seems like the Prow tests are catching some race conditions. /lgtm cancel |
I'm not sure how to reproduce the exact environment in which the tests execute. The test pull-external-dns-unit-test runs on Kubernetes infra outside of our control. On a local machine, try limiting resources to 1 or 0.5 CPU, which may help. Note that it runs on Linux there. |
@ivankatliarchuk I'm working on it. I wrote a Docker setup to emulate the test pipeline and am able to reproduce the issue. Would the project be interested in the setup? If yes, I'll open an issue for that. It would additionally allow OS-independent testing and makefile runs. |
…5150) Prevent data races and premature process terminations by managing signal capture and synchronization. Signed-off-by: Tobias Harnickell <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Open a PR, let's have a look. |
I'll clean up the files and create a PR for the Docker setup soon. |
I'm unsure about the proposed changes. I don't have much expertise in this area, and my main concern is that the change is not paired with thorough end-to-end (smoke) testing of the shutdown paths. This is my understanding, and why I have questions about service stability:
- Tests improve, but production gets more code paths where cancellation or cleanup could be mishandled, which means more opportunities for subtle bugs.
Why it's still an improvement (in theory and from my view)
So:
So I'll leave it to other reviewers to review and make a decision. |
…t-controller-execute Signed-off-by: Tobias Harnickell <[email protected]>
I gave the feedback some consideration.
It's true that my changes to signal handling and shutdown introduce more components that participate in graceful shutdown, and therefore more potential points of failure. Also,
Depending on whether we decide to go down this road, I will add more commits to "smoothen" out the edges I introduced with my change:
|
Main risk
And a concern that is better avoided
|
@TobyTheHutt I tried to give a better title to this PR, feel free to amend it if I missed something. /lgtm |
Going through my changes I just realised that |
/lgtm cancel |
Signed-off-by: Tobias Harnickell <[email protected]>
/lgtm |
Main concern here:
In the Kubernetes controller-runtime, the OS signals are already wired up: https://github.com/kubernetes-sigs/controller-runtime/blob/961fc2c233d64e40b5bdf449f12bfa22cf2d7b28/pkg/manager/signals/signal.go#L30 We are quite far from implementing a proper Kubernetes controller manager, and at the moment we lack a strategy in this area. |
What does it do?
Improves test coverage on the controller.
Motivation
The referenced ticket (#5150) aims to improve code coverage.