Bail out after stop timeout when stopping supervised processes #6312
base: main
ed12b89 to 12a506e
The PR is marked as stale since no activity has been recorded in 30 days.
110ae95 to a0ddd30
This pull request has merge conflicts that need to be resolved.
Signed-off-by: Tom Wieczorek <[email protected]>
The exec.Cmd API is strange in that Wait returns only an error instead of (ProcessState, error). This makes it difficult to distinguish between "regular" process errors and failures that occur while actually waiting on the process. Nevertheless, try to distinguish between these two cases to produce more accurate log messages: If the error is nil or unwraps into an *exec.ExitError, treat it as a "regular" process error. Consider everything else an error indicating a problem with waiting.
Signed-off-by: Tom Wieczorek <[email protected]>
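The distinction described in this commit message could be sketched roughly as follows. This is not the code from the PR: the `waitOutcome` type, the `classifyWaitError` helper, and the two sample errors are hypothetical names made up for illustration. Only `errors.As` and `*exec.ExitError` come from the Go standard library.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// waitOutcome is a hypothetical classification of what Wait returned.
type waitOutcome int

const (
	exitedCleanly   waitOutcome = iota // Wait returned nil
	exitedWithError                    // the process ran and exited unsuccessfully
	waitFailed                         // waiting itself went wrong
)

// classifyWaitError treats nil and anything that unwraps into an
// *exec.ExitError as a "regular" process result. Everything else is
// considered a problem with waiting on the process.
func classifyWaitError(err error) waitOutcome {
	var exitErr *exec.ExitError
	switch {
	case err == nil:
		return exitedCleanly
	case errors.As(err, &exitErr):
		return exitedWithError
	default:
		return waitFailed
	}
}

// Illustrative inputs only (assumptions for the demo, not real k0s errors).
var (
	sampleExitErr error = &exec.ExitError{}
	sampleWaitErr       = errors.New("wait: interrupted (made-up example)")
)

func main() {
	fmt.Println(
		classifyWaitError(nil),
		classifyWaitError(sampleExitErr),
		classifyWaitError(sampleWaitErr),
	) // prints "0 1 2"
}
```

A caller could then pick the log message per outcome, for example logging "process exited" for the first two cases and "failed to wait for process" for the last.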
Previously, the shutdown code looped endlessly until the child process finished, requesting graceful termination over and over again. Change this to a single request-termination -> wait -> bail-out logic. This is to ensure that k0s won't hang when the supervised processes can't be terminated for whatever reason: the code will terminate, at the latest once the timeout has expired.
Use a buffered channel for the wait result, so that the goroutine will be able to exit, even if nothing reads from the channel anymore. Introduce fine-grained error reporting to differentiate shutdown outcomes (graceful shutdown, forced kill, failure, and so on).
Signed-off-by: Tom Wieczorek <[email protected]>
a0ddd30 to 0827218
Oh, this PR uncovered an oversight when implementing #6429. The |
|
The fix for the |
Description
Previously, the shutdown code looped endlessly until the child process finished, requesting graceful termination over and over again. Change this to a single request-termination -> wait -> bail-out logic. This is to ensure that k0s won't hang when the supervised processes can't be terminated for whatever reason: the code will terminate, at the latest once the timeout has expired.
Use a buffered channel for the wait result, so that the goroutine will be able to exit, even if nothing reads from the channel anymore. Introduce fine-grained error reporting to differentiate shutdown outcomes (graceful shutdown, forced kill, failure, and so on).
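A minimal sketch of the request-termination -> wait -> bail-out flow with a buffered result channel might look like the following. It is an illustration under assumptions, not the PR's implementation: the `stoppable` interface, `stopWithTimeout`, the sentinel errors, `fakeProcess`, and `demoTimeout` are all hypothetical, and the choice to escalate to a kill after the first timeout is one possible reading of "forced kill".

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// stoppable is a hypothetical abstraction over a supervised child process.
type stoppable interface {
	RequestTermination() error // ask for graceful shutdown, e.g. SIGTERM
	Kill() error               // force termination
	Wait() error               // block until the process is gone
}

// Hypothetical sentinel errors that distinguish shutdown outcomes.
var (
	errForciblyKilled = errors.New("process did not stop in time and was killed")
	errStillRunning   = errors.New("process did not stop even after being killed")
)

// stopWithTimeout requests termination once, waits, then bails out.
// It returns nil on graceful shutdown and never blocks longer than
// roughly twice the timeout.
func stopWithTimeout(p stoppable, timeout time.Duration) error {
	// Buffered: the goroutine can deliver its result and exit even if
	// this function has already given up and nobody reads the channel.
	waitResult := make(chan error, 1)
	go func() { waitResult <- p.Wait() }()

	if err := p.RequestTermination(); err != nil {
		return fmt.Errorf("failed to request termination: %w", err)
	}
	select {
	case <-waitResult:
		return nil // graceful shutdown
	case <-time.After(timeout):
	}

	if err := p.Kill(); err != nil {
		return fmt.Errorf("failed to kill process: %w", err)
	}
	select {
	case <-waitResult:
		return errForciblyKilled
	case <-time.After(timeout):
		return errStillRunning // bail out instead of hanging
	}
}

// fakeProcess simulates a child that may ignore termination requests.
type fakeProcess struct {
	ignoreTerm bool
	done       chan struct{}
	once       sync.Once
}

func newFake(ignoreTerm bool) *fakeProcess {
	return &fakeProcess{ignoreTerm: ignoreTerm, done: make(chan struct{})}
}

func (f *fakeProcess) finish() { f.once.Do(func() { close(f.done) }) }
func (f *fakeProcess) RequestTermination() error {
	if !f.ignoreTerm {
		f.finish()
	}
	return nil
}
func (f *fakeProcess) Kill() error { f.finish(); return nil }
func (f *fakeProcess) Wait() error { <-f.done; return nil }

const demoTimeout = 50 * time.Millisecond

func main() {
	fmt.Println(stopWithTimeout(newFake(false), demoTimeout))
	fmt.Println(stopWithTimeout(newFake(true), demoTimeout))
}
```

With an unbuffered channel, the waiting goroutine would block forever on its send whenever the caller returned after the final timeout. The one-element buffer lets it finish regardless.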
See:
Type of change
How Has This Been Tested?
Checklist