Conversation

@twz123
Member

@twz123 twz123 commented Sep 22, 2025

Description

Decouple the stack application from the fsnotify watcher loop. This prevents the loop from stalling during a long-running application and reduces the risk of overflowing the operating system's event buffer. Trigger applications via a separate channel with a buffer size of one item, so that events arriving while an application is already pending collapse into that one pending request. This naturally implements debouncing.
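
A minimal sketch of the one-slot trigger idea, under stated assumptions: the helper name `notify` is hypothetical and this is not the code from this PR. It only shows how a non-blocking send on a channel with capacity one coalesces a burst of requests into a single pending one.

```go
package main

import "fmt"

// notify is a hypothetical helper: it requests an apply without ever
// blocking the caller. It reports whether a new request was queued
// (true) or whether one was already pending (false).
func notify(trigger chan<- struct{}) bool {
	select {
	case trigger <- struct{}{}:
		return true
	default:
		// The single buffer slot is taken: an apply is already pending,
		// so this request is folded into it.
		return false
	}
}

func main() {
	trigger := make(chan struct{}, 1) // buffer size of one item

	queued := 0
	for i := 0; i < 5; i++ { // a burst of five file events
		if notify(trigger) {
			queued++
		}
	}

	fmt.Println(queued, len(trigger)) // prints "1 1"
}
```

Because the sender never blocks, the loop that reads file system events can keep draining them no matter how long the consumer of `trigger` takes.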

Fixes:

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

  • Manual test
  • Auto test added

Checklist

  • My code follows the style guidelines of this project
  • My commit messages are signed-off
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

@twz123 twz123 linked an issue Sep 22, 2025 that may be closed by this pull request
@twz123 twz123 marked this pull request as ready for review September 22, 2025 15:02
@twz123 twz123 requested review from a team as code owners September 22, 2025 15:02
@github-actions
Contributor

The PR is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Oct 22, 2025
@twz123 twz123 force-pushed the debouncer-separate-apply branch 2 times, most recently from 7588f68 to 2656726 Compare October 23, 2025 20:34
@github-actions github-actions bot removed the Stale label Oct 23, 2025

@jaitaiwan jaitaiwan left a comment

One small nit, but the rest LGTM

Comment on lines 67 to 74
defer close(trigger)
defer func() {
if err := watcher.Close(); err != nil {
s.log.WithError(err).Error("Failed to close watcher")
}
s.log.WithError(err).Error("Error while watching stack")
}
}()
err = s.runWatcher(watcher, trigger, ctx.Done())
}()

@jaitaiwan jaitaiwan Oct 31, 2025

This part seems like an unnecessary complication. s.log is available inside s.runWatcher, so why not do the defer and log the error inside s.runWatcher, and then just go s.runWatcher(...)?

Member Author

Good question. Maybe I felt that the function which created the watcher should close it, instead of delegating the ownership to runWatcher. Anyhow, I've pushed the defers down the line. PTAL!

@twz123 twz123 force-pushed the debouncer-separate-apply branch from 2656726 to 31af8c4 Compare October 31, 2025 16:31
@twz123 twz123 marked this pull request as draft October 31, 2025 16:35
}
s.log.WithError(err).Error("Error while watching stack")
}
err = s.runWatcher(watcher, trigger, ctx.Done())

I don’t think err = is needed now

Member Author

It is, if we want to forward any errors that might occur when running the watcher to the caller. But: we have to close the trigger channel after assigning to err here. Therefore, at least that defer must be pulled one level up from runWatcher.
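
A rough sketch of the ordering constraint described here, under stated assumptions: `runAndForward`, `errWatcherFailed`, and the `watch`/`apply` callbacks are hypothetical stand-ins, not the actual k0s code. The point is only that the deferred close runs after the assignment to the error variable, so a caller that waits for the channel to be closed can read the error without a data race.

```go
package main

import (
	"errors"
	"fmt"
)

// errWatcherFailed is a made-up error used only for this illustration.
var errWatcherFailed = errors.New("watcher failed")

// runAndForward starts watch in a goroutine, runs apply for every trigger,
// and forwards the error returned by watch to the caller.
func runAndForward(watch func(trigger chan<- struct{}) error, apply func()) error {
	trigger := make(chan struct{}, 1)
	var watchErr error

	go func() {
		// Deferred one level above watch, so the close happens only
		// after watchErr has been assigned.
		defer close(trigger)
		watchErr = watch(trigger)
	}()

	// The loop ends once trigger is closed and drained. The close is
	// synchronized before that final receive, so watchErr is safe to read.
	for range trigger {
		apply()
	}
	return watchErr
}

func main() {
	applies := 0
	err := runAndForward(func(trigger chan<- struct{}) error {
		trigger <- struct{}{} // request the initial apply
		return errWatcherFailed
	}, func() { applies++ })

	fmt.Println(applies, err) // prints "1 watcher failed"
}
```

If the close were deferred inside the watch function itself, it would run before the assignment in the calling goroutine completed, and the reader could observe a stale nil error.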

So I did a bit of pontificating. I think this should still pass the unit test, and it makes it a bit clearer what's happening with errors:

// Run executes the initial apply and watches the stack for updates.
func (s *StackApplier) Run(ctx context.Context) error {
	if ctx.Err() != nil {
		return nil // The context is already done.
	}

	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return fmt.Errorf("failed to create watcher: %w", err)
	}

	errCh := make(chan error)
	go s.runWatcher(watcher, errCh, ctx)

	if addErr := watcher.Add(s.path); addErr != nil {
		return fmt.Errorf("failed to watch %q: %w", s.path, addErr)
	}

	for err := range errCh {
		return err
	}

	return nil
}

func (s *StackApplier) runWatcher(watcher *fsnotify.Watcher, errCh chan error, ctx context.Context) (err error) {
	defer func() { errCh <- errors.Join(err, watcher.Close()) }()

	const timeout = 1 * time.Second // debounce events for one second
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	for {
		select {
		case err := <-watcher.Errors:
			return fmt.Errorf("while watching stack: %w", err)

		case event := <-watcher.Events:
			// Only consider events on manifest files
			if match, _ := filepath.Match(manifestFilePattern, filepath.Base(event.Name)); !match {
				continue
			}
			timer.Reset(timeout)

		case <-timer.C:
			s.apply(ctx)

		case <-ctx.Done():
			return nil
		}
	}
}

Not sure whether errCh would need to be buffered, and closed in the defer of the runWatcher function, though.

Member Author

The main purpose of this PR is to decouple the stack application from the watcher loop, to ensure that long-running stack applications do not create backpressure into the kernel inotify event queue. While I agree your proposed change is more straightforward, it's quite important not to block the event consumption.
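
To illustrate the split, here is a simplified sketch under stated assumptions: `consumeEvents` and `applyPending` are hypothetical names, and a plain string channel stands in for the fsnotify event stream. The event side never calls apply itself, it only leaves a request in the one-slot channel.

```go
package main

import "fmt"

// consumeEvents drains events as fast as they arrive. It never runs the
// apply itself; it only leaves a request in the one-slot trigger channel.
func consumeEvents(events <-chan string, trigger chan<- struct{}) {
	defer close(trigger) // the only sender closes the channel
	for range events {
		select {
		case trigger <- struct{}{}:
		default: // an apply is already pending, nothing to add
		}
	}
}

// applyPending runs apply once per pending request until trigger is
// closed, and returns how many times apply ran.
func applyPending(trigger <-chan struct{}, apply func()) int {
	n := 0
	for range trigger {
		apply()
		n++
	}
	return n
}

func main() {
	events := make(chan string, 3)
	for _, name := range []string{"a.yaml", "b.yaml", "c.yaml"} {
		events <- name
	}
	close(events)

	trigger := make(chan struct{}, 1)

	// In real use the two functions would run in separate goroutines.
	// They run back to back here to keep the output deterministic.
	consumeEvents(events, trigger)
	applies := applyPending(trigger, func() {})

	fmt.Println("applies:", applies) // prints "applies: 1"
}
```

All three events are consumed without waiting for any apply, and they result in a single run. A slow apply therefore delays only the next apply, not the reading of events.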

That's a good point, I went a bit too far in my simplification. I think using the channel approach while keeping the triggers would make explicit the currently non-obvious coupling between when the trigger channel is closed and err = being populated by the time the function returns.

@twz123 twz123 force-pushed the debouncer-separate-apply branch from 31af8c4 to 61bd74a Compare November 2, 2025 16:27
@twz123 twz123 requested a review from jaitaiwan November 2, 2025 16:35
@twz123 twz123 marked this pull request as ready for review November 2, 2025 18:14
Decouple the stack application from the fsnotify watcher loop. This
prevents the loop from stalling due to a long-running application and
reduces the risk of the operating system buffer getting overflowed.
Trigger applications via a separate channel with a buffer size of one
item. This naturally implements debouncing.

Fixes a panic that could occur when the Run method exits before the
initial apply event has been sent from the separate goroutine.

Signed-off-by: Tom Wieczorek <[email protected]>
@twz123 twz123 force-pushed the debouncer-separate-apply branch from 61bd74a to 5231740 Compare November 3, 2025 09:07

@jaitaiwan jaitaiwan left a comment

LGTM

@twz123 twz123 merged commit 01f910a into k0sproject:main Nov 7, 2025
204 of 206 checks passed
@twz123 twz123 deleted the debouncer-separate-apply branch November 7, 2025 15:36

Development

Successfully merging this pull request may close these issues.

Panic in applier unit tests: DATA RACE / send on closed channel

3 participants