
MCO-1748: Proposed updates to MCN in 4.20 for Status Reporting #1809


Open. Wants to merge 2 commits into base: master.

Conversation

isabella-janssen
Member

@isabella-janssen isabella-janssen commented Jun 12, 2025

This outlines the proposed changes to the MachineConfigNode (MCN) resource in 4.20 to finalize "Status Reporting" in the MCO. The primary change for 4.20 is reporting the status through OnClusterLayering-enabled node updates.

Note that since the 4.19 updates to the MCN enhancement have not yet merged (PR #1765), only the most recent commit, dfe00ab, is relevant to 4.20.

@openshift-ci-robot

openshift-ci-robot commented Jun 12, 2025

@isabella-janssen: This pull request references MCO-1748 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

This outlines the proposed changes to the MachineConfigNode (MCN) resource in 4.20 to finalize "Status Reporting" in the MCO. The primary change for 4.20 is reporting the status through OnClusterLayering-enabled node updates.

Note that since the 4.19 updates to the MCN enhancement have not yet merged (PR #1765), only the most recent commit is relevant to 4.20.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 12, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2025
Contributor

openshift-ci bot commented Jun 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Jun 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jsafrane for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


The desired config found in the spec will get updated immediately when a new config is found on the node. However, the desired config found in the status will only get updated once the new config has been validated in the machine config daemon. In the current implementation, the desired config is populated in the status by checking whether the update successfully gets past the "UpdatePrepared" phase. If the "UpdatePrepared" phase succeeds, then the status can safely add the desired config.
<!-- TODO: check on the "the image has been successfully created" condition for the desired image being in the status -->
The desired config or image found in `Spec` will get updated immediately when a new config or image is found on the node. However, the desired config and image found in `Status` will only get updated once the new config has been validated in the MCD or the image has been successfully created. In the current implementation, the desired config is populated in the status by checking whether the update successfully gets past the "UpdatePrepared" phase. If the "UpdatePrepared" phase succeeds, then the status can safely add the desired config.
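
The gating described above can be sketched as follows. This is a minimal illustration, not MCO code: the `MachineConfigNodeSketch` class, its field names, and the `rendered-worker-*` values are hypothetical stand-ins, and the only behavior taken from the text is that `Spec` updates immediately while `Status` waits for `UpdatePrepared` to succeed.

```python
from dataclasses import dataclass, field


@dataclass
class MachineConfigNodeSketch:
    """Hypothetical, simplified stand-in for an MCN object (not the real API type)."""
    spec_desired: str = ""
    status_desired: str = ""
    conditions: dict = field(default_factory=dict)


def observe_new_desired(mcn: MachineConfigNodeSketch, desired: str) -> None:
    # Spec is updated immediately when a new desired config (or image) is seen.
    mcn.spec_desired = desired
    # A new update has started, so preparation has not been confirmed yet.
    mcn.conditions["UpdatePrepared"] = False


def sync_status(mcn: MachineConfigNodeSketch) -> None:
    # Status only follows Spec once the "UpdatePrepared" phase has succeeded.
    if mcn.conditions.get("UpdatePrepared", False):
        mcn.status_desired = mcn.spec_desired


mcn = MachineConfigNodeSketch(spec_desired="rendered-worker-a",
                              status_desired="rendered-worker-a")
observe_new_desired(mcn, "rendered-worker-b")
sync_status(mcn)  # Status still reports the old value here.
mcn.conditions["UpdatePrepared"] = True
sync_status(mcn)  # Status now matches Spec.
```

Whether an equivalent gate exists for the desired image under OCL is exactly the open question in the thread below, so the sketch treats config and image identically only for illustration.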
Member Author

Question to Reviewer

A standard update flow should progress as follows:

  1. Create rendered MC
  2. Update desired config value in MCN's Spec
  3. Validate new rendered MC
  4. Update desired config value in MCN's Status

My question is whether OCL has a similar "validation" step that runs after the machineconfiguration.openshift.io/desiredImage annotation is set on the node and before the update to that value can occur. If not, does it make sense to have a desired image set in Spec?

Contributor

To clarify the question, are you asking if we validate the current state of the node on the previous config? Or whether we validate the incoming desiredImage to see if it's valid or not?

Member Author

The latter: whether we validate the incoming desiredImage to see if it's valid or not. My intent behind this question is to understand whether any validation happens between setting the desired image annotation on the node and the point at which the update can actually proceed, similar to how MCN has the reconcile and other checks before UpdatePrepared is flipped to True.

There are three types of conditions in MCN:
- Parent
  - These conditions track the overall arc of an upgrade.
  - Includes `UpdatePrepared`, `UpdateExecuted`, `UpdatePostActionComplete`, `RebootedNode`, `Resumed`, `UpdateComplete`, and `Updated`.
- Child
  - These conditions are phases that occur within the overarching parent phases.
  - Includes `Drained`, `AppliedFilesAndOS`, `Cordoned`, and `Uncordoned`.
  - In 4.19, this includes `Drained`, `AppliedFilesAndOS`, `Cordoned`, and `Uncordoned`.
  - In 4.20, this additionally includes `ImagePulledFromRegistry`, `AppliedOSImage`, and `AppliedFiles`.
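
As a rough illustration of the grouping above, the condition names can be laid out as plain data. The constant and function names here are hypothetical and this is not how the MCO defines them; only the condition names and the 4.19/4.20 split come from the text. The third condition type is not shown in this excerpt, so the sketch returns "Other" for anything it does not recognize.

```python
# Condition names as listed in the proposal text.
PARENT_CONDITIONS = (
    "UpdatePrepared", "UpdateExecuted", "UpdatePostActionComplete",
    "RebootedNode", "Resumed", "UpdateComplete", "Updated",
)
CHILD_CONDITIONS_4_19 = ("Drained", "AppliedFilesAndOS", "Cordoned", "Uncordoned")
CHILD_CONDITIONS_ADDED_4_20 = ("ImagePulledFromRegistry", "AppliedOSImage", "AppliedFiles")


def child_conditions(version: str) -> tuple:
    """Return the child condition names for "4.19" or "4.20"."""
    if version == "4.19":
        return CHILD_CONDITIONS_4_19
    if version == "4.20":
        # 4.20 keeps the 4.19 children and adds the OCL-related ones.
        return CHILD_CONDITIONS_4_19 + CHILD_CONDITIONS_ADDED_4_20
    raise ValueError(f"unsupported version: {version}")


def condition_type(name: str, version: str = "4.20") -> str:
    """Classify a condition name as "Parent", "Child", or "Other"."""
    if name in PARENT_CONDITIONS:
        return "Parent"
    if name in child_conditions(version):
        return "Child"
    return "Other"
```

Note that under this additive reading `AppliedFilesAndOS` remains a child condition in 4.20 alongside `AppliedOSImage` and `AppliedFiles`, which matches the "add now, phase out later" approach discussed in the thread below.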
Member Author

Note to Reviewer

I'm not sure of the feasibility of "replacing" the currently existing AppliedFilesAndOS condition with new AppliedOSImage and AppliedFiles conditions. I think the best course forward with this idea is simply adding AppliedOSImage and AppliedFiles in 4.20 and later we can think about phasing out use of AppliedFilesAndOS if it makes sense.

Contributor

Ah, this answers my earlier question. I think it's fine since we should still be able to modify conditions, but I agree that this old one would be redundant.

Member Author

I could easily be convinced to just keep AppliedFilesAndOS as is instead of migrating to them being split. I have no strong opinions either way, but it was a recommendation presented by @cheesesashimi that I think is worth getting perspectives from the team on.

@isabella-janssen isabella-janssen force-pushed the ocl-status-reporting branch 4 times, most recently from 9608341 to fbbc56c Compare June 16, 2025 16:48
Comment on lines +114 to +115
Config Image:
Desired: image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:6600f777a1d8b3b5be31f483189b5dc813799fe45bb2ba18b5742b58e27e9387
Member Author

Question to reviewer

Do we only want to add OCL-specific values in the MCN once OCL is enabled? I'd think yes, at least for the ConfigImage fields, but open to other perspectives.

Contributor

I think I'd be +1 to having OCL fields be optional and only exist if the pool the node belongs to is doing OCL
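
One way to picture the "optional OCL fields" suggestion is a status builder that only emits the image block when the node's pool uses OCL. This is a sketch under assumptions: the function name, the `pool_uses_ocl` flag, and the dictionary keys (`configVersion`, `configImage`) are illustrative and are not confirmed field names from the proposal.

```python
from typing import Optional


def build_mcn_status(desired_config: str,
                     pool_uses_ocl: bool,
                     desired_image: Optional[str] = None) -> dict:
    """Build a simplified MCN status; OCL fields appear only for OCL pools."""
    status = {"configVersion": {"desired": desired_config}}
    # Hypothetical rule: omit the image block entirely unless the pool is doing OCL.
    if pool_uses_ocl and desired_image:
        status["configImage"] = {"desired": desired_image}
    return status


# Non-OCL pool: no image fields at all.
plain = build_mcn_status("rendered-worker-a", pool_uses_ocl=False)

# OCL pool: the image block is present (placeholder image reference).
layered = build_mcn_status("rendered-worker-a", pool_uses_ocl=True,
                           desired_image="registry.example/ocl-image:tag")
```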

classDef Phase font-weight:bold,fill:#bbbbbb,stroke:#000,color:#000
```

*Before an update is triggered, UPDATED will be True and all other statuses will be False.*
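
The idle state described above can be expressed as a small check. This is a hedged sketch: the helper name and the dictionary shape are hypothetical, and real MCN conditions carry more than a boolean.

```python
def is_idle(conditions: dict) -> bool:
    """True when Updated is True and every other condition is False."""
    if not conditions.get("Updated", False):
        return False
    return all(not value for name, value in conditions.items() if name != "Updated")


# Example: a node before any update is triggered (subset of condition names).
before_update = {
    "Updated": True,
    "UpdatePrepared": False,
    "UpdateExecuted": False,
    "Drained": False,
}

# Example: the same node once an update has been prepared.
mid_update = dict(before_update, Updated=False, UpdatePrepared=True)
```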
Member Author

Note to reviewer

The following console outputs are what I see as the possible MCN updates during an OCL update, but I would like someone with better knowledge of OCL to review the flow.

@isabella-janssen isabella-janssen marked this pull request as ready for review June 16, 2025 17:16
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2025
@openshift-ci openshift-ci bot requested review from jmguzik and jsafrane June 16, 2025 17:16
Contributor

@yuqi-zhang yuqi-zhang left a comment

Some general comments inline

Contributor

openshift-ci bot commented Jun 17, 2025

@isabella-janssen: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | dfe00ab | link | true | `/test markdownlint` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
