-
Notifications
You must be signed in to change notification settings - Fork 452
Don't block fsnotify watcher during stack application #6420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The PR is marked as stale since no activity has been recorded in 30 days |
7588f68 to
2656726
Compare
jaitaiwan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small nit, but the rest LGTM
pkg/applier/stackapplier.go
Outdated
| defer close(trigger) | ||
| defer func() { | ||
| if err := watcher.Close(); err != nil { | ||
| s.log.WithError(err).Error("Failed to close watcher") | ||
| } | ||
| s.log.WithError(err).Error("Error while watching stack") | ||
| } | ||
| }() | ||
| err = s.runWatcher(watcher, trigger, ctx.Done()) | ||
| }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part seems like unnecessary complication; s.log is available inside s.runWatcher so why not just do the defer and log the error inside s.runWatcher and just go s.runWatcher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Maybe I felt that the function which created the watcher should close it, instead of delegating the ownership to runWatcher. Anyhow, I've pushed the defers down the line. PTAL!
2656726 to
31af8c4
Compare
pkg/applier/stackapplier.go
Outdated
| } | ||
| s.log.WithError(err).Error("Error while watching stack") | ||
| } | ||
| err = s.runWatcher(watcher, trigger, ctx.Done()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think err = is needed now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, if we want to forward any errors that might occur when running the watcher to the caller. But: we have to close the trigger channel after assigning to err here. Therefore, at least that defer must be pulled one level up from runWatcher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I did a bit of pontificating, I think this should still pass the unit test but will make it a bit clearer what's happening with errors:
// Run executes the initial apply and watches the stack for updates.
func (s *StackApplier) Run(ctx context.Context) error {
if ctx.Err() != nil {
return nil // The context is already done.
}
ctx, cancel := context.WithCancel(ctx)
defer cancel()
watcher, err := fsnotify.NewWatcher()
if err != nil {
return fmt.Errorf("failed to create watcher: %w", err)
}
errCh := make(chan error)
go s.runWatcher(watcher, errCh, ctx)
if addErr := watcher.Add(s.path); addErr != nil {
return fmt.Errorf("failed to watch %q: %w", s.path, addErr)
}
for err := range errCh {
return err
}
return nil
}
func (s *StackApplier) runWatcher(watcher *fsnotify.Watcher, errCh chan error, ctx context.Context) (err error) {
defer func() { errCh <- errors.Join(err, watcher.Close()) }()
const timeout = 1 * time.Second // debounce events for one second
timer := time.NewTimer(timeout)
defer timer.Stop()
for {
select {
case err := <-watcher.Errors:
return fmt.Errorf("while watching stack: %w", err)
case event := <-watcher.Events:
// Only consider events on manifest files
if match, _ := filepath.Match(manifestFilePattern, filepath.Base(event.Name)); !match {
continue
}
timer.Reset(timeout)
case <-timer.C:
s.apply(ctx)
case <-ctx.Done():
return nil
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the errCh would need to be buffered and have it closed in the defer of the runWatcher function though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main purpose of this PR is to decouple the stack application from the watcher loop, to ensure that long-running stack applications do not create backpressure into the kernel inotify event queue. While I agree your proposed change is more straight forward, it's quite important to not block the event consumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I went a bit far in my simplification. I think if you used the channel approach but kept the triggers that would make the non-obvious coupling between when to close the triggers and the err = being populated by the time the program returns.
31af8c4 to
61bd74a
Compare
Decouple the stack application from the fsnotify watcher loop. This prevents the loop from stalling due to a long-running application and reduces the risk of the operating system buffer getting overflowed. Trigger applications via a separate channel with a buffer size of one item. This naturally implements debouncing. Fixes a panic that could occur when the Run method exits before the initial apply event has been sent from the separate goroutine. Signed-off-by: Tom Wieczorek <[email protected]>
61bd74a to
5231740
Compare
jaitaiwan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
Decouple the stack application from the fsnotify watcher loop. This prevents the loop from stalling due to a long-running application and reduces the risk of the operating system buffer getting overflowed. Trigger applications via a separate channel with a buffer size of one item. This naturally implements debouncing.
Fixes:
Type of change
How Has This Been Tested?
Checklist