NETOBSERV-2225 - Deploy static plugin at operator startup #1345

Open
wants to merge 7 commits into base: main

Conversation

jpinsonneau
Contributor

@jpinsonneau jpinsonneau commented Apr 2, 2025

Description

Create the console plugin when FlowCollector doesn't exist, to expose the new forms.

Suggested alternatives: #1346 & #1374

See netobserv/network-observability-console-plugin#763 for the forms implementations

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

openshift-ci bot commented Apr 2, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jotak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines 81 to 100
	// force reconcile at startup
	go r.InitReconcile(ctx)

	return nil
}

func (r *FlowCollectorReconciler) InitReconcile(ctx context.Context) error {
	log := log.FromContext(ctx)
	log.Info("Initializing resources...")

	var err error
	for attempt := range initReconcileAttempts {
		// delay the reconcile calls to let some time to the cache to load
		time.Sleep(5 * time.Second)
		_, err = r.Reconcile(ctx, reconcile.Request{})
		if err != nil {
			log.Error(err, "Error while doing initial reconcile", "attempt", attempt)
		} else {
			break
		}
	}
	return err
}
Contributor Author

☝️ I wonder if there is an out-of-the-box mechanism to trigger the loop after the cache has loaded. That's why I'm using a sleep here, and this may not work in all situations.

https://redhat-internal.slack.com/archives/C02939DP5L5/p1743518264445429

Contributor

If I remember correctly, when a reconcile loop is failing, you can also return a time value to reschedule the reconciliation.

Contributor Author

Yeah, I tried that without success.

I'm refactoring the code again to move the static content to another controller, which will be cleaner I guess. I will give the reschedule time another try on the new controller 👍
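
On the question of triggering the initial reconcile once the cache has loaded, here is a minimal sketch of gating on a sync signal instead of sleeping. It assumes a cache that exposes a `WaitForCacheSync(ctx) bool` method, which is how controller-runtime's cache interface is shaped as far as I know. The `cacheSyncer` interface, `fakeCache`, `initAfterSync` and `runInit` are hypothetical names for illustration and are not taken from this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// cacheSyncer is a local stand-in (assumption) for the part of a cache that
// can block until its informers are synced.
type cacheSyncer interface {
	WaitForCacheSync(ctx context.Context) bool
}

// initAfterSync blocks until the cache reports it is synced, then runs the
// initial reconcile once. It returns an error if ctx ends before the sync.
func initAfterSync(ctx context.Context, c cacheSyncer, reconcile func(context.Context) error) error {
	if !c.WaitForCacheSync(ctx) {
		return errors.New("cache did not sync before the context ended")
	}
	return reconcile(ctx)
}

// fakeCache is a test double that becomes ready after a fixed delay.
type fakeCache struct{ readyAfter time.Duration }

func (f fakeCache) WaitForCacheSync(ctx context.Context) bool {
	select {
	case <-time.After(f.readyAfter):
		return true
	case <-ctx.Done():
		return false
	}
}

// runInit is a demo helper: it reports whether the reconcile callback ran.
func runInit(readyAfterMs, timeoutMs int) (ran bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	c := fakeCache{readyAfter: time.Duration(readyAfterMs) * time.Millisecond}
	err = initAfterSync(ctx, c, func(context.Context) error {
		ran = true
		return nil
	})
	return ran, err
}

func main() {
	fmt.Println(runInit(10, 1000)) // cache is ready well before the timeout
	fmt.Println(runInit(1000, 10)) // timeout hits first, reconcile never runs
}
```

The point of the design is that the wait is driven by the cache state rather than a fixed five-second guess; whether it fits the refactored static controller is left open here.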

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 14, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 14, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 14, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 14, 2025
@jpinsonneau jpinsonneau force-pushed the 1942 branch 2 times, most recently from c4c57e1 to dbc4cbf Compare April 15, 2025 10:40
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 17, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 30, 2025

New images:

  • quay.io/netobserv/network-observability-operator:40740e0
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-40740e0
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-40740e0

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:40740e0 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-40740e0

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-40740e0
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@memodi
Member

memodi commented May 5, 2025

/test e2e-operator

@memodi
Member

memodi commented May 5, 2025

@jpinsonneau there seems to be an issue where the plugin doesn't become ready, despite the pod being in Running state and no issues with Loki whatsoever:

time="2025-05-05T20:00:34Z" level=error msg="cannot unmarshal, response was: Ingester not ready: waiting for 15s after being ready\n" error="invalid character 'I' looking for beginning of value" module=handler
time="2025-05-05T20:00:34Z" level=error msg="cannot unmarshal, response was: Ingester not ready: waiting for 15s after being ready\n" error="invalid character 'I' looking for beginning of value" module=handler

the static plugin pod had this error:

$ oc -n openshift-netobserv-operator logs pod/netobserv-plugin-static-7c97f478bf-t7gj8
time="2025-05-05T20:10:16Z" level=info msg="Starting netobserv-console-plugin [build version: main-2c3b33b, build date: 2025-05-05 10:42] at log level info" module=main
time="2025-05-05T20:10:16Z" level=fatal msg="invalid config" error="neither Loki nor Prometheus is configured; at least one of them should have a URL defined" module=main

I had monolithic loki configured in my flowcollector - anything I am missing for static console plugin config?
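
For readers puzzled by the first log line: the `invalid character 'I' looking for beginning of value` text is what Go's `encoding/json` reports when it is handed a plain-text body instead of JSON. The sketch below reproduces it with the exact response string from the log; `decodeQueryResponse` is a hypothetical helper and not the plugin's actual handler code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeQueryResponse tries to parse body as a JSON object and returns the
// raw encoding/json error when the body is not JSON.
func decodeQueryResponse(body string) (map[string]any, error) {
	var out map[string]any
	if err := json.Unmarshal([]byte(body), &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Plain-text readiness message, copied from the log above.
	_, err := decodeQueryResponse("Ingester not ready: waiting for 15s after being ready\n")
	fmt.Println("plain text:", err)

	// A JSON body decodes without error.
	out, err := decodeQueryResponse(`{"status":"success"}`)
	fmt.Println("json:", out, err)
}
```

So the unmarshal error by itself mostly says that the backend answered with a readiness message rather than a query result.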

@jpinsonneau
Contributor Author

@jpinsonneau there seems to be an issue where the plugin doesn't become ready, despite the pod being in Running state and no issues with Loki whatsoever:

time="2025-05-05T20:00:34Z" level=error msg="cannot unmarshal, response was: Ingester not ready: waiting for 15s after being ready\n" error="invalid character 'I' looking for beginning of value" module=handler
time="2025-05-05T20:00:34Z" level=error msg="cannot unmarshal, response was: Ingester not ready: waiting for 15s after being ready\n" error="invalid character 'I' looking for beginning of value" module=handler

the static plugin pod had this error:

$ oc -n openshift-netobserv-operator logs pod/netobserv-plugin-static-7c97f478bf-t7gj8
time="2025-05-05T20:10:16Z" level=info msg="Starting netobserv-console-plugin [build version: main-2c3b33b, build date: 2025-05-05 10:42] at log level info" module=main
time="2025-05-05T20:10:16Z" level=fatal msg="invalid config" error="neither Loki nor Prometheus is configured; at least one of them should have a URL defined" module=main

I had monolithic loki configured in my flowcollector - anything I am missing for static console plugin config?

@memodi are you using the proper plugin image mentioned above?
netobserv/network-observability-console-plugin#763

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 6, 2025
@memodi
Member

memodi commented May 6, 2025

@memodi are you using the proper plugin image mentioned above?
netobserv/network-observability-console-plugin#763

I was not, I didn't realize that's a dependency. I'll try with it today. Thanks!

@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 6, 2025
@netobserv netobserv deleted a comment from github-actions bot May 6, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 6, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 6, 2025

github-actions bot commented May 6, 2025

New images:

  • quay.io/netobserv/network-observability-operator:1c54f0e
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-1c54f0e
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-1c54f0e

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:1c54f0e make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-1c54f0e

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-1c54f0e
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@jpinsonneau
Contributor Author

@memodi FYI I made some changes to address the #1345 (comment) comment.

Just tested and the behavior remains the same 😉
It avoids using an unnecessary service account and role.

@jpinsonneau jpinsonneau added the needs-review Tells that the PR needs a review label May 13, 2025
@memodi
Member

memodi commented May 13, 2025

/jira NETOBSERV-2225

	log := log.FromContext(ctx)
	log.Info("Initializing resources...")

	for attempt := range initReconcileAttempts {
Member

@jotak jotak May 14, 2025

btw, what happens if all 5 attempts fail? The static plugin wouldn't deploy, but the rest would work normally? Or does it make the controller go into CrashLoopBackOff (CLBO) or so? (my understanding is it's the former, but I just want to make sure)

Contributor Author

If the attempts fail, the static plugin will not be there at controller startup.
As soon as a reconcile loop is triggered, it will appear (e.g. when creating a FlowCollector).
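
The behavior described in this thread (a bounded number of attempts, then giving up without taking the process down) can be sketched as below. This is a stand-alone illustration, not the PR's code: `retryWithDelay` and `simulate` are hypothetical names, and unlike the `time.Sleep` version under review, the wait here also stops when the context is cancelled.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithDelay calls fn up to attempts times, waiting delay before each
// call. It returns nil on the first success, the last error if every attempt
// fails, or ctx.Err() if the context ends while waiting.
func retryWithDelay(ctx context.Context, attempts int, delay time.Duration, fn func(context.Context) error) error {
	var err error
	for attempt := 0; attempt < attempts; attempt++ {
		timer := time.NewTimer(delay)
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-timer.C:
		}
		if err = fn(ctx); err == nil {
			return nil
		}
	}
	return err
}

// simulate is a demo helper: fn fails `failures` times before succeeding.
func simulate(attempts, failures int) (calls int, err error) {
	err = retryWithDelay(context.Background(), attempts, time.Millisecond, func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("cache not ready")
		}
		return nil
	})
	return calls, err
}

func main() {
	fmt.Println(simulate(5, 2)) // succeeds on the third call
	fmt.Println(simulate(5, 9)) // gives up after five calls and reports the last error
}
```

When every attempt fails, the caller only gets an error value back; nothing panics, which matches the "rest would work normally" reading above.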


r.status.SetUnknown()
defer r.status.Commit(ctx, r.Client)

Member

After setting "Unknown", I think this controller should return if openshift isn't detected, right?
We could check it in Kind

Contributor Author

That's done in the static reconciler using the HasConsolePlugin function:

func (r *CPReconciler) reconcileStatic(ctx context.Context, desired *flowslatest.FlowCollector) error {
	l := log.FromContext(ctx).WithName("console-plugin")
	ctx = log.IntoContext(ctx, l)
	// Retrieve current owned objects
	err := r.Managed.FetchAll(ctx)
	if err != nil {
		return err
	}
	if r.ClusterInfo.HasConsolePlugin() {
		if err = r.checkAutoPatch(ctx, desired, constants.StaticPluginName); err != nil {
			return err
		}
	}
	if r.ClusterInfo.HasConsolePlugin() {
		// Create object builder
		builder := newBuilder(r.Instance, &desired.Spec, constants.StaticPluginName)
		if err = r.reconcilePlugin(ctx, &builder, &desired.Spec, constants.StaticPluginName, "NetObserv static plugin"); err != nil {
			return err
		}
		if err = r.reconcileDeployment(ctx, &builder, &desired.Spec, constants.StaticPluginName, ""); err != nil {
			return err
		}
		if err = r.reconcileServices(ctx, &builder, constants.StaticPluginName); err != nil {
			return err
		}
	} else {
		// delete any existing owned object
		r.Managed.TryDeleteAll(ctx)
	}
	return nil
}

If the console plugin is not available, nothing is deployed and the status goes to ready.
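
The gating pattern in the function above (run the reconcile steps only when the capability is present, otherwise clean up owned objects) can be shown in isolation. This is a simplified sketch with a single availability check; `reconcileOrCleanup`, `step` and `demo` are hypothetical names, and the step labels only echo the calls in the snippet.

```go
package main

import "fmt"

// step is one reconcile action (auto-patch, plugin, deployment, services...).
type step func() error

// reconcileOrCleanup runs every step in order when the capability is
// available, stopping at the first error; otherwise it runs cleanup to
// remove previously owned objects and reports success.
func reconcileOrCleanup(available bool, cleanup func(), steps ...step) error {
	if !available {
		cleanup()
		return nil
	}
	for _, s := range steps {
		if err := s(); err != nil {
			return err
		}
	}
	return nil
}

// demo records which actions ran for a given availability.
func demo(available bool) []string {
	var ran []string
	record := func(name string) step {
		return func() error { ran = append(ran, name); return nil }
	}
	_ = reconcileOrCleanup(available,
		func() { ran = append(ran, "cleanup") },
		record("autopatch"), record("plugin"), record("deployment"), record("services"))
	return ran
}

func main() {
	fmt.Println(demo(true))  // all four reconcile steps, no cleanup
	fmt.Println(demo(false)) // cleanup only
}
```

Keeping the check inside the reconciler, as the author argues, leaves the calling controller free to deploy other static resources regardless of console availability.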

Contributor Author

The static controller could deploy something other than the console plugin in the future, so I think it's better to keep things separated here. WDYT?

Member

ah ok, yes sounds good, thanks!

Member

@jotak jotak left a comment

Just one comment about the case when not running on OpenShift; other than that, lgtm

@jpinsonneau
Contributor Author

/retest

@jotak
Member

jotak commented May 14, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 14, 2025
@jotak jotak removed the needs-review Tells that the PR needs a review label May 14, 2025

openshift-ci bot commented Jun 5, 2025

@jpinsonneau: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                     Commit   Details  Required  Rerun command
ci/prow/e2e-operator          1080b69  link     false     /test e2e-operator
ci/prow/ci-bundle-noo-bundle  1080b69  link     true      /test ci-bundle-noo-bundle

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-robot
Collaborator

PR needs rebase.

Labels
lgtm needs-rebase ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.

5 participants