Don't block fsnotify watcher during stack application #6420

twz123 · 2025-09-22T12:29:42Z

Description

Decouple the stack application from the fsnotify watcher loop. This prevents the loop from stalling while a long-running application is in progress and reduces the risk of overflowing the operating system's event buffer. Applications are triggered via a separate channel with a buffer size of one item, which naturally implements debouncing.
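To illustrate how a one-item buffer can coalesce triggers, here is a minimal sketch. It is not the code from this PR: the helper names `notify` and `drain` are hypothetical, and the non-blocking send is an assumption about how such a trigger channel would typically be fed.

```go
package main

import "fmt"

// notify requests an apply without ever blocking the caller. If a request is
// already pending in the one-slot buffer, the new one is dropped, so any
// number of events collapses into a single pending apply.
// (Hypothetical helper, not taken from the PR.)
func notify(trigger chan<- struct{}) bool {
	select {
	case trigger <- struct{}{}:
		return true // request queued
	default:
		return false // a request is already pending; coalesce
	}
}

// drain counts the pending requests without blocking.
func drain(trigger <-chan struct{}) int {
	n := 0
	for {
		select {
		case <-trigger:
			n++
		default:
			return n
		}
	}
}

func main() {
	trigger := make(chan struct{}, 1)
	queued := 0
	for i := 0; i < 5; i++ { // simulate a burst of five file events
		if notify(trigger) {
			queued++
		}
	}
	fmt.Println("queued:", queued, "pending:", drain(trigger))
}
```

In this sketch a burst of five events leaves exactly one pending request, and the sender never waits for the consumer, so a slow apply would not hold up event consumption.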

Fixes:

Panic in applier unit tests: DATA RACE / send on closed channel #6337

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update

How Has This Been Tested?

Manual test
Auto test added

Checklist

github-actions · 2025-10-22T23:04:24Z

The PR is marked as stale since no activity has been recorded in 30 days

jaitaiwan

One small nit, but the rest LGTM

jaitaiwan · 2025-10-31T00:48:14Z

pkg/applier/stackapplier.go

+		defer close(trigger)
+		defer func() {
+			if err := watcher.Close(); err != nil {
+				s.log.WithError(err).Error("Failed to close watcher")
 			}
-			s.log.WithError(err).Error("Error while watching stack")
-		}
+		}()
+		err = s.runWatcher(watcher, trigger, ctx.Done())
 	}()


This part seems like an unnecessary complication. s.log is available inside s.runWatcher, so why not do the defer and log the error inside s.runWatcher, and then just go s.runWatcher?

Good question. Maybe I felt that the function which created the watcher should close it, instead of delegating the ownership to runWatcher. Anyhow, I've pushed the defers down the line. PTAL!

jaitaiwan · 2025-11-01T00:06:40Z

pkg/applier/stackapplier.go

-			}
-			s.log.WithError(err).Error("Error while watching stack")
-		}
+		err = s.runWatcher(watcher, trigger, ctx.Done())


I don’t think err = is needed now

It is, if we want to forward any errors that might occur when running the watcher to the caller. But: we have to close the trigger channel after assigning to err here. Therefore, at least that defer must be pulled one level up from runWatcher.
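The ordering constraint described here can be sketched in isolation. The example below is an illustration only, with hypothetical names (`run`, `work`, `done`, `errWatcher`); it assumes the caller waits for the channel to be closed before reading the error, which is one common way to forward a goroutine's result.

```go
package main

import (
	"errors"
	"fmt"
)

// errWatcher is a hypothetical error standing in for a watcher failure.
var errWatcher = errors.New("watcher failed")

// run starts a worker goroutine and forwards the worker's error to the caller.
// close(done) is deferred inside the goroutine, so it executes only after the
// assignment to err has completed. Receiving from the closed channel therefore
// guarantees that the caller observes the final value of err.
func run(work func() error) error {
	var err error
	done := make(chan struct{})
	go func() {
		defer close(done) // runs after err has been assigned
		err = work()
	}()
	<-done // wait for the worker; synchronizes with close(done)
	return err
}

func main() {
	fmt.Println(run(func() error { return errWatcher }))
	fmt.Println(run(func() error { return nil }))
}
```

If the close happened inside `work` instead, the caller could be released before the assignment to `err`, and the read of `err` would race with the write.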

So I did a bit of pontificating. I think this should still pass the unit tests, and it makes it a bit clearer what's happening with errors:

// Run executes the initial apply and watches the stack for updates.
func (s *StackApplier) Run(ctx context.Context) error {
	if ctx.Err() != nil {
		return nil // The context is already done.
	}

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return fmt.Errorf("failed to create watcher: %w", err)
	}

	errCh := make(chan error)
	go s.runWatcher(watcher, errCh, ctx)

	if addErr := watcher.Add(s.path); addErr != nil {
		return fmt.Errorf("failed to watch %q: %w", s.path, addErr)
	}

	for err := range errCh {
		return err
	}
	return nil
}

func (s *StackApplier) runWatcher(watcher *fsnotify.Watcher, errCh chan error, ctx context.Context) (err error) {
	defer func() { errCh <- errors.Join(err, watcher.Close()) }()

	const timeout = 1 * time.Second // debounce events for one second
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	for {
		select {
		case err := <-watcher.Errors:
			return fmt.Errorf("while watching stack: %w", err)
		case event := <-watcher.Events:
			// Only consider events on manifest files
			if match, _ := filepath.Match(manifestFilePattern, filepath.Base(event.Name)); !match {
				continue
			}
			timer.Reset(timeout)
		case <-timer.C:
			s.apply(ctx)
		case <-ctx.Done():
			return nil
		}
	}
}

I'm not sure whether errCh would need to be buffered, and whether it should be closed in the defer of the runWatcher function, though.
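The buffering question can be explored with a small standalone sketch. It does not use fsnotify or the applier code; `start`, `finished`, and `errSetup` are hypothetical names. It assumes a worker that reports its result with a single send and a caller that returns early without receiving, as would happen when a setup step fails.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errSetup is a hypothetical setup failure that makes the caller return early.
var errSetup = errors.New("failed to watch")

// start launches a worker that reports its result on errCh exactly once and
// closes the returned channel when the goroutine has finished. The caller
// returns early without ever reading errCh.
func start(buffer int) (<-chan struct{}, error) {
	errCh := make(chan error, buffer)
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		errCh <- nil // blocks forever if errCh is unbuffered and nobody receives
	}()
	return exited, errSetup
}

// finished reports whether the worker exits within a short grace period.
func finished(exited <-chan struct{}) bool {
	const wait = 200 * time.Millisecond
	select {
	case <-exited:
		return true
	case <-time.After(wait):
		return false
	}
}

func main() {
	unbuffered, _ := start(0)
	buffered, _ := start(1)
	fmt.Println("unbuffered worker exited:", finished(unbuffered))
	fmt.Println("buffered worker exited:", finished(buffered))
}
```

With an unbuffered channel the worker stays blocked on its send once the caller has gone, while a buffer of one lets the send complete and the goroutine exit. Closing the channel is not required for a single-result pattern like this, since the receiver only needs the one value.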

The main purpose of this PR is to decouple the stack application from the watcher loop, to ensure that long-running stack applications do not create backpressure into the kernel inotify event queue. While I agree your proposed change is more straightforward, it's quite important not to block the event consumption.

That's a good point, I went a bit far in my simplification. I think if you used the channel approach but kept the triggers, that would make explicit the non-obvious coupling between when to close the triggers and the err = being populated by the time the function returns.

Decouple the stack application from the fsnotify watcher loop. This prevents the loop from stalling due to a long-running application and reduces the risk of the operating system buffer getting overflowed. Trigger applications via a separate channel with a buffer size of one item. This naturally implements debouncing.

Fixes a panic that could occur when the Run method exits before the initial apply event has been sent from the separate goroutine.

Signed-off-by: Tom Wieczorek <[email protected]>

jaitaiwan

LGTM

twz123 linked an issue Sep 22, 2025 that may be closed by this pull request

Panic in applier unit tests: DATA RACE / send on closed channel #6337

Closed

twz123 marked this pull request as ready for review September 22, 2025 15:02

twz123 requested review from a team as code owners September 22, 2025 15:02

twz123 requested review from juanluisvaladas and kke September 22, 2025 15:02

github-actions bot added the Stale label Oct 22, 2025

twz123 force-pushed the debouncer-separate-apply branch 2 times, most recently from 7588f68 to 2656726 Compare October 23, 2025 20:34

github-actions bot removed the Stale label Oct 23, 2025

jaitaiwan approved these changes Oct 31, 2025

twz123 force-pushed the debouncer-separate-apply branch from 2656726 to 31af8c4 Compare October 31, 2025 16:31

twz123 marked this pull request as draft October 31, 2025 16:35

jaitaiwan reviewed Nov 1, 2025

twz123 force-pushed the debouncer-separate-apply branch from 31af8c4 to 61bd74a Compare November 2, 2025 16:27

twz123 requested a review from jaitaiwan November 2, 2025 16:35

twz123 marked this pull request as ready for review November 2, 2025 18:14

twz123 force-pushed the debouncer-separate-apply branch from 61bd74a to 5231740 Compare November 3, 2025 09:07

jaitaiwan approved these changes Nov 3, 2025

juanluisvaladas approved these changes Nov 7, 2025

twz123 merged commit 01f910a into k0sproject:main Nov 7, 2025
204 of 206 checks passed

twz123 deleted the debouncer-separate-apply branch November 7, 2025 15:36
