feat: child object watch and resource reconciliation based on informers #611
Conversation
* watchset package to implement watches
  * Abstracts changes away from kro controllers.
  * Reusable package that can be used for other controllers.
  * Can be upstreamed to the k8s ecosystem at some point.
* watchset implementation
  * watchset sets up watches per GVK and uses a map[id]callbacks to demultiplex event objects to their parent objects.
  * When an event is detected for a GVK, the check callback is called to see whether it matches a specific resource. If it matches, the trigger callback is invoked to generate a parent (instance) event for the reconciler (see the sketch after this list).
  * Uses a dynamic interface that intercepts reconciler calls on behalf of a parent and adds the resources to a watchset.
  * The watchset manager interface is used by the dynamic interface to register handlers for an object.
    * objectWatchManager - sets up a callback per object-parent pair.
    * labelWatchManager - sets up a callback per parent based on labels.
* Changes in the kro controller
  * Create a watchset in the RGD reconciler with a label selector.
  * Use the watchset dynamic client for managing resources in the instance reconciler.
  * During reconcile, set up the watchManager with the object and a trigger callback that enqueues the parent into the dynamic controller.
* examples/kubernetes
  * Add a simple-deployment example for testing. The current webapp example does not successfully reconcile in a kind cluster.
* Makefile changes to help with running e2e tests locally.
@a-hilaly, we found out why the tests were failing and fixed that up!

solves #323
@jakobmoellerdev Solid work - thanks a lot! I really like the informer-based implementation - it feels like the right architectural choice. +1 for the feature flag as an annotation, although I'm down to make it a spec field considering other things stated below.
I've dropped a few comments below - got some thoughts on the startup reconciliation behavior, and I'm wondering if we could potentially share informers between RGDs to reduce API server load. Curious to hear your thoughts on these, but overall this is heading in a great direction!
- Updated `Set` to include a metadata client and added corresponding methods.
- Switched `DynamicController` to use `metadata.Interface` over `dynamic.Interface`.
- Updated tests to utilize `metadata` fake clients.
- Removed unused dynamic client utilities and transformations.

Signed-off-by: Jakob Möller <[email protected]>
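For context, a minimal sketch of constructing and using client-go's metadata client (the `metadata.Interface` this commit switches to). Only object metadata is transferred over the wire; the GVR and namespace below are placeholder values, not taken from the PR:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/metadata"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// metadata.Interface only fetches PartialObjectMetadata, not full objects.
	client, err := metadata.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	gvr := schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}
	list, err := client.Resource(gvr).Namespace("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, item := range list.Items {
		fmt.Println(item.GetName())
	}
}
```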
…e-watchset-with-informers
Signed-off-by: Jakob Möller <[email protected]>
…on policy

- Introduced `instancePolicy` in `ReconcileSpec` to specify periodic or reactive reconciliation for instances.
- Removed `instance-watch-resources` label and its associated logic.
- Updated controller logic to utilize `instancePolicy` for configuring resource watchers dynamically.
- Adjusted integration tests to validate behavior based on `instancePolicy`.
- Streamlined event handler registration in `DynamicController`.

Signed-off-by: Jakob Möller <[email protected]>
…th-informers

Signed-off-by: Jakob Möller <[email protected]>

# Conflicts:
# test/integration/environment/setup.go
# test/integration/suites/core/crd_test.go
# test/integration/suites/core/setup_test.go
/hold I believe I spotted a bug on restart that is not properly handling watches. This might be due to the factory handler. Have to investigate some more.
… management

Signed-off-by: Jakob Möller <[email protected]>
cool stuff, thank you @jakobmoellerdev! left a few comments in line
…th-informers

Signed-off-by: Jakob Möller <[email protected]>

# Conflicts:
# api/v1alpha1/resourcegraphdefinition_types.go
# cmd/controller/main.go
# config/crd/bases/kro.run_resourcegraphdefinitions.yaml
# helm/crds/kro.run_resourcegraphdefinitions.yaml
# pkg/controller/resourcegraphdefinition/controller_reconcile.go
# pkg/dynamiccontroller/dynamic_controller.go
# pkg/dynamiccontroller/dynamic_controller_test.go
…proved lifecycle management

- Added logger integration to LazyInformer for structured error reporting.
- Introduced context reset on LazyInformer to facilitate re-initialization after shutdown.
- Streamlined event handler lifecycle management in DynamicController.
- Replaced `NamespacedKey` with `NamespacedName` in `ObjectIdentifiers` for consistency.

Signed-off-by: Jakob Möller <[email protected]>
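A minimal sketch of what re-initialization after shutdown can look like, assuming LazyInformer roughly follows this shape; since a stopped SharedIndexInformer cannot be re-run, the sketch recreates the informer via a factory function on each Start. All names besides client-go's own are illustrative:

```go
// Assumed shape of a restartable ("lazy") informer wrapper; illustrative only.
package watchset

import (
	"context"
	"sync"

	"k8s.io/client-go/tools/cache"
)

type lazyInformer struct {
	mu          sync.Mutex
	newInformer func() cache.SharedIndexInformer // factory, since a stopped informer cannot be re-run
	cancel      context.CancelFunc
}

// Start (re)creates the informer and runs it until Stop is called.
func (l *lazyInformer) Start(parent context.Context) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.cancel != nil {
		return // already running
	}
	ctx, cancel := context.WithCancel(parent)
	l.cancel = cancel
	inf := l.newInformer()
	go inf.Run(ctx.Done())
}

// Stop cancels the informer's context and resets state so Start can run again.
func (l *lazyInformer) Stop() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.cancel != nil {
		l.cancel()
		l.cancel = nil
	}
}
```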
…on clarity

- Set default `dynamic-controller-default-resync-period` to 0, effectively disabling periodic resyncs.
- Enhanced comments and struct documentation for better readability and maintainability.
- Updated integration test environment to reflect the disabled resync configuration.

Signed-off-by: Jakob Möller <[email protected]>
Signed-off-by: Jakob Möller <[email protected]>
- Introduced `QueueShutdownTimeout` to DynamicController configuration for clean shutdown handling.
- Adjusted shutdown logic to respect the configured timeout.
- Improved code clarity in shutdown coordination.

Signed-off-by: Jakob Möller <[email protected]>
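An assumed sketch of a queue shutdown honoring a timeout, using client-go's workqueue; only the `QueueShutdownTimeout` name is taken from the commit, the helper itself is illustrative:

```go
package dynamiccontroller

import (
	"time"

	"k8s.io/client-go/util/workqueue"
)

// shutdownQueue drains the workqueue in the background and stops waiting once
// the configured timeout elapses. Illustrative helper, not the PR's code.
func shutdownQueue(queue workqueue.RateLimitingInterface, timeout time.Duration) {
	done := make(chan struct{})
	go func() {
		queue.ShutDownWithDrain() // blocks until in-flight items are processed
		close(done)
	}()
	select {
	case <-done:
		// drained cleanly within the timeout
	case <-time.After(timeout):
		queue.ShutDown() // give up draining after QueueShutdownTimeout
	}
}
```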
…th-informers

# Conflicts:
# pkg/client/set.go
Signed-off-by: Jakob Möller <[email protected]>
Updated `TestStatus` kind and schema definitions to append a "Reactive" or "Periodic" suffix based on the reactive behavior, improving test clarity and flexibility in ConfigMap watch behavior.

Signed-off-by: Jakob Möller <[email protected]>
```go
// +kubebuilder:validation:Optional
// +kubebuilder:default:=Periodic
// +kubebuilder:validation:Enum=Periodic;Reactive
InstancePolicy ResourceGraphDefinitionInstancePolicy `json:"instancePolicy,omitempty"`
```
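For readability, a hedged guess at the companion declarations behind this field; the constant names simply mirror the enum values in the kubebuilder marker and may not match the PR:

```go
package v1alpha1

// ResourceGraphDefinitionInstancePolicy selects how instances are reconciled.
type ResourceGraphDefinitionInstancePolicy string

const (
	// Periodic: instances are reconciled on a resync interval.
	InstancePolicyPeriodic ResourceGraphDefinitionInstancePolicy = "Periodic"
	// Reactive: instances are reconciled when watched child objects change.
	InstancePolicyReactive ResourceGraphDefinitionInstancePolicy = "Reactive"
)
```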
I think this shouldn't be exposed.
There are at least two cases which make the current behaviour unpredictable: when the KRO pod is restarted, and when the RGD is updated. Both cases trigger a reconciliation of each instance and hence override any changes on the children.
Making KRO always reactive brings predictability to the system, as any manual change on the children would be overridden almost immediately, as soon as the child is updated.
With this, I would expect the person doing the manual change to wonder why it was overridden, rather than assuming the manual change had taken effect.
idk about this. I think this policy is not about the instance itself but the resources it manages. Periodic behaves more like traditional GitOps systems such as Flux/Argo with a resync interval, whereas Reactive is based purely on dynamic watches.
Still, I agree that the default should probably be Reactive.
> Periodic behaves more like traditional GitOps systems such as Flux/Argo with a resync interval, whereas Reactive is based purely on dynamic watches.
I get this, but their context is different. GitOps tools handle external sources and hardly have a way to get notified when the source changes. That's the main reason why they have to go for resyncs.
I may be mistaken, but I don't think this is a problem we have with KRO. In the KRO case, all resources are k8s resources and should be watchable.
Great to see reconciliation based on child objects.
I think the code could be made easier to reason about, especially by removing many locks and hence the risk of deadlocks, but the feature is worth giving a try.
- Refactor `Register` method to use variadic parameters for `resourceGVRsToWatch`.
- Add `meta.RESTMapper` to `DynamicController` for precise `GroupVersionKind` resolution.
- Update integration and unit tests to incorporate `RESTMapper` usage.
- Introduce comprehensive tests for lazy informer and dynamic controller behaviors.
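For reference, resolving a precise GroupVersionKind from a GroupVersionResource via `meta.RESTMapper` looks roughly like this; `KindFor` is the standard apimachinery call, while the wrapper function itself is an illustrative assumption:

```go
package dynamiccontroller

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// gvkFor resolves the kind for a watched resource via the RESTMapper.
func gvkFor(mapper meta.RESTMapper, gvr schema.GroupVersionResource) (schema.GroupVersionKind, error) {
	gvk, err := mapper.KindFor(gvr)
	if err != nil {
		return schema.GroupVersionKind{}, fmt.Errorf("resolving kind for %s: %w", gvr, err)
	}
	return gvk, nil
}
```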
/unhold Probably the only thing left is whether we want the spec extension or not. Considering the feedback I received and our choice on applysets, I will remove the spec field again.
Watch child objects and trigger instance reconciliation on change.
Alternative implementation proposal draft, as discussed in the KRO community call (23 Jul).
Compares to #576 by:
- using the native factory (as of now) for handler registration tracking
- the shared informer factory does not allow for efficient dynamic shutdown and restart of informers
- based on reviews: