
Conversation

@2uasimojo
Member

@2uasimojo 2uasimojo commented Aug 5, 2025

Previously, any time the installer added a field to metadata.json, we would need to evaluate it and possibly add a bespoke field and code path to make sure it was supplied to the destroyer at deprovision time.

With this change, we're offloading metadata.json verbatim (except in some cases we have to scrub/replace credentials fields -- see HIVE-2804 / #2612) to a new Secret in the ClusterDeployment's namespace, referenced from a new field: ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef.

For legacy clusters -- those created before this change -- we attempt to retrofit the new Secret based on the legacy fields. This is best effort and may not always work.

This change then adds a new generic destroyer via the (existing) hiveutil deprovision command that consumes this metadata.json to deprovision the cluster.

This new behavior is the default, but we also include an escape hatch to run the platform-specific legacy destroyer by setting the following annotation on the ClusterDeployment:

hive.openshift.io/legacy-deprovision: "true"
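For illustration only (the Secret data key and the exact shape of the ref are assumptions, not taken from this diff), a consumer such as the deprovision code might resolve the metadata roughly like this:

```go
// Sketch only: resolve the installer metadata for a ClusterDeployment from the
// new Secret. The data key ("metadata.json") and the LocalObjectReference-style
// ref are assumptions for illustration; see the actual diff for the real code.
package example

import (
	"context"
	"encoding/json"
	"errors"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"

	hivev1 "github.com/openshift/hive/apis/hive/v1"
	installertypes "github.com/openshift/installer/pkg/types"
)

const legacyDeprovisionAnnotation = "hive.openshift.io/legacy-deprovision"

func loadClusterMetadata(ctx context.Context, c client.Client, cd *hivev1.ClusterDeployment) (*installertypes.ClusterMetadata, error) {
	// Escape hatch: the caller should fall back to the platform-specific legacy destroyer.
	if cd.Annotations[legacyDeprovisionAnnotation] == "true" {
		return nil, errors.New("legacy deprovision requested")
	}
	if cd.Spec.ClusterMetadata == nil {
		return nil, errors.New("no ClusterMetadata on the ClusterDeployment")
	}
	ref := cd.Spec.ClusterMetadata.MetadataJSONSecretRef
	secret := &corev1.Secret{}
	if err := c.Get(ctx, types.NamespacedName{Namespace: cd.Namespace, Name: ref.Name}, secret); err != nil {
		return nil, err
	}
	md := &installertypes.ClusterMetadata{}
	if err := json.Unmarshal(secret.Data["metadata.json"], md); err != nil {
		return nil, err
	}
	return md, nil
}
```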

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Aug 5, 2025
@openshift-ci-robot

openshift-ci-robot commented Aug 5, 2025

@2uasimojo: This pull request references HIVE-2302 which is a valid jira issue.

In response to this:

Well, mostly.

Previously, any time the installer added a field to metadata.json, we would need to evaluate it and possibly add a bespoke field and code path to make sure it was supplied to the destroyer at deprovision time.

With this change, we're instead offloading it verbatim to a new Secret in the ClusterDeployment's namespace, referenced from a new field: ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef.

Instead of building the installer's ClusterMetadata structure for the destroyer with individual fields from the CD's ClusterMetadata, we're unmarshaling it directly from the contents of that Secret.

(Except in some cases we have to scrub/replace credentials fields -- see HIVE-2804 / #2612)

For legacy clusters -- those created before this change -- we attempt to retrofit the new Secret based on the legacy fields. This is best effort and may not always work. If this results in a hanging deprovision due to a missing field, the workaround is to modify the contents of the Secret to add it in; then kill the deprovision pod and the next attempt should pick up the changes. (If the result is a "successful" deprovision with leaked resources, the only workaround is to clean up the infra manually. Sorry.)
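As a rough sketch of that workaround using controller-runtime (the Secret data key, field name, and pod name below are placeholders, not hive API):

```go
// Sketch of the manual workaround: add the missing field to the metadata Secret,
// then delete the hung deprovision pod so the next attempt picks up the change.
package example

import (
	"context"
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func patchMetadataAndRestart(ctx context.Context, c client.Client, ns, secretName, podName, field string, value interface{}) error {
	secret := &corev1.Secret{}
	if err := c.Get(ctx, types.NamespacedName{Namespace: ns, Name: secretName}, secret); err != nil {
		return err
	}
	md := map[string]interface{}{}
	if err := json.Unmarshal(secret.Data["metadata.json"], &md); err != nil { // data key assumed
		return err
	}
	md[field] = value // whatever the destroyer reported as missing
	raw, err := json.Marshal(md)
	if err != nil {
		return err
	}
	secret.Data["metadata.json"] = raw
	if err := c.Update(ctx, secret); err != nil {
		return err
	}
	// Kill the running deprovision pod; its controller will create a new attempt.
	pod := &corev1.Pod{}
	pod.Namespace, pod.Name = ns, podName
	return c.Delete(ctx, pod)
}
```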

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Aug 5, 2025
@openshift-ci
Contributor

openshift-ci bot commented Aug 5, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented Aug 5, 2025

@2uasimojo: This pull request references HIVE-2302 which is a valid jira issue.

In response to this:

Well, mostly.

Previously, any time the installer added a field to metadata.json, we would need to evaluate it and possibly add a bespoke field and code path to make sure it was supplied to the destroyer at deprovision time.

With this change, we're instead offloading it verbatim to a new Secret in the ClusterDeployment's namespace, referenced from a new field: ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef.

Instead of building the installer's ClusterMetadata structure for the destroyer with individual fields from the CD's ClusterMetadata, we're unmarshaling it directly from the contents of that Secret.

(Except in some cases we have to scrub/replace credentials fields -- see HIVE-2804 / #2612)

For legacy clusters -- those created before this change -- we attempt to retrofit the new Secret based on the legacy fields. This is best effort and may not always work. If this results in a hanging deprovision due to a missing field, the workaround is to modify the contents of the Secret to add it in; then kill the deprovision pod and the next attempt should pick up the changes. (If the result is a "successful" deprovision with leaked resources, the only workaround is to clean up the infra manually. Sorry.)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Aug 5, 2025
@2uasimojo
Member Author

TODO: Deprovisioner side

@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from 8707a79 to d09e43e on August 15, 2025 17:59
@openshift-ci-robot

openshift-ci-robot commented Aug 15, 2025

@2uasimojo: This pull request references HIVE-2302 which is a valid jira issue.

In response to this:

Well, mostly.

Previously, any time the installer added a field to metadata.json, we would need to evaluate it and possibly add a bespoke field and code path to make sure it was supplied to the destroyer at deprovision time.

With this change, we're offloading metadata.json verbatim (except in some cases we have to scrub/replace credentials fields -- see HIVE-2804 / #2612) to a new Secret in the ClusterDeployment's namespace, referenced from a new field: ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef.

For legacy clusters -- those created before this change -- we attempt to retrofit the new Secret based on the legacy fields. This is best effort and may not always work.

In the future (but not here!) instead of building the installer's ClusterMetadata structure for the destroyer with individual fields from the CD's ClusterMetadata, we'll unmarshal it directly from the contents of this Secret.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

1 similar comment

@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from d09e43e to 13ad458 on August 19, 2025 21:20
@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from 13ad458 to ace243e on September 9, 2025 20:36
@2uasimojo 2uasimojo changed the title from "HIVE-2302: Pass metadata.json through opaquely" to "HIVE-2302, HIVE-2644: Pass metadata.json through opaquely" on Sep 9, 2025
@openshift-ci-robot

openshift-ci-robot commented Sep 9, 2025

@2uasimojo: This pull request references HIVE-2302 which is a valid jira issue.

This pull request references HIVE-2644 which is a valid jira issue.

In response to this:

Well, mostly.

Previously, any time the installer added a field to metadata.json, we would need to evaluate it and possibly add a bespoke field and code path to make sure it was supplied to the destroyer at deprovision time.

With this change, we're offloading metadata.json verbatim (except in some cases we have to scrub/replace credentials fields -- see HIVE-2804 / #2612) to a new Secret in the ClusterDeployment's namespace, referenced from a new field: ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef.

For legacy clusters -- those created before this change -- we attempt to retrofit the new Secret based on the legacy fields. This is best effort and may not always work.

In the future (but not here!) instead of building the installer's ClusterMetadata structure for the destroyer with individual fields from the CD's ClusterMetadata, we'll unmarshal it directly from the contents of this Secret.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot

openshift-ci-robot commented Sep 9, 2025

@2uasimojo: This pull request references HIVE-2302 which is a valid jira issue.

This pull request references HIVE-2644 which is a valid jira issue.

In response to this:

Previously, any time the installer added a field to metadata.json, we would need to evaluate it and possibly add a bespoke field and code path to make sure it was supplied to the destroyer at deprovision time.

With this change, we're offloading metadata.json verbatim (except in some cases we have to scrub/replace credentials fields -- see HIVE-2804 / #2612) to a new Secret in the ClusterDeployment's namespace, referenced from a new field: ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef.

For legacy clusters -- those created before this change -- we attempt to retrofit the new Secret based on the legacy fields. This is best effort and may not always work.

This change then adds a new generic destroyer via the (existing) hiveutil deprovision command that consumes this metadata.json to deprovision the cluster.

This new behavior is the default, but we also include an escape hatch to run the platform-specific legacy destroyer by setting the following annotation on the ClusterDeployment:

hive.openshift.io/legacy-deprovision: "true"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from ace243e to cf8610a on September 9, 2025 21:50
@2uasimojo
Member Author

/test e2e e2e-azure e2e-gcp e2e-vsphere e2e-openstack

🤞

@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from cf8610a to 268d7cc on September 10, 2025 19:01
@2uasimojo
Member Author

/test e2e-azure e2e-vsphere

@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from 268d7cc to bce1629 on September 23, 2025 19:38
@2uasimojo
Member Author

  • This is now rebased on HIVE-2908: Remove OVirt (RHV) #2746
  • VSphere (hopefully) remedied: metadata.json can be in one of two different shapes depending on pre- or post-zonal. We should now be accounting for both.
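For context, tolerating both shapes might look roughly like this (field and type names are illustrative assumptions, not copied from the installer):

```go
// Sketch: accept both the pre-zonal (flat creds) and post-zonal (vcenters list)
// shapes of the vSphere metadata.json.
package example

type vcenterCreds struct {
	VCenter  string `json:"vCenter"`
	Username string `json:"username"`
	Password string `json:"password"`
}

type vsphereMetadata struct {
	vcenterCreds                                         // pre-zonal shape: creds at the top level
	VCenters     []vcenterCreds `json:"vcenters,omitempty"` // post-zonal shape: per-vCenter list
}

// creds returns the vCenter credentials regardless of which shape was present.
func (m vsphereMetadata) creds() []vcenterCreds {
	if len(m.VCenters) > 0 {
		return m.VCenters
	}
	return []vcenterCreds{m.vcenterCreds}
}
```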

/test e2e-vsphere

@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from bce1629 to cb1b5d5 on September 23, 2025 21:31
@2uasimojo 2uasimojo marked this pull request as ready for review September 23, 2025 21:32
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Sep 23, 2025
@2uasimojo
Member Author

/hold for moar testings

@openshift-ci openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Sep 23, 2025
@openshift-ci openshift-ci bot requested review from dlom and jstuever September 23, 2025 21:32
@codecov

codecov bot commented Sep 23, 2025

Codecov Report

❌ Patch coverage is 22.57384% with 367 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.19%. Comparing base (eb1dc4f) to head (42c37de).
⚠️ Report is 6 commits behind head on master.

Files with missing lines                                 Patch %   Lines
.../clusterdeployment/clusterdeployment_controller.go   29.74%    104 Missing and 7 partials ⚠️
pkg/install/generate.go                                  18.18%    95 Missing and 4 partials ⚠️
contrib/pkg/deprovision/deprovision.go                    0.00%    45 Missing ⚠️
pkg/clusterresource/builder.go                            0.00%    18 Missing and 3 partials ⚠️
contrib/pkg/utils/vsphere/vsphere.go                      0.00%    18 Missing ⚠️
contrib/pkg/createcluster/create.go                       0.00%    11 Missing ⚠️
pkg/installmanager/installmanager.go                      0.00%    8 Missing and 1 partial ⚠️
contrib/pkg/utils/nutanix/nutanix.go                      0.00%    8 Missing ⚠️
.../controller/clusterdeployment/clusterprovisions.go    78.57%    3 Missing and 3 partials ⚠️
pkg/installmanager/fake.go                                0.00%    5 Missing ⚠️
... and 14 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2729      +/-   ##
==========================================
- Coverage   50.45%   50.19%   -0.27%     
==========================================
  Files         284      284              
  Lines       33968    34267     +299     
==========================================
+ Hits        17140    17200      +60     
- Misses      15489    15713     +224     
- Partials     1339     1354      +15     
Files with missing lines                                 Coverage Δ
pkg/constants/constants.go                               100.00% <ø> (ø)
...lusterdeprovision/clusterdeprovision_controller.go    53.40% <100.00%> (+0.17%) ⬆️
pkg/controller/utils/logtagger.go                        100.00% <100.00%> (ø)
.../v1/clusterdeployment_validating_admission_hook.go    87.07% <100.00%> (+0.04%) ⬆️
...shift/hive/apis/hive/v1/clusterdeployment_types.go    0.00% <ø> (ø)
...hift/hive/apis/hive/v1/clusterdeprovision_types.go    0.00% <ø> (ø)
contrib/pkg/deprovision/awstagdeprovision.go             0.00% <0.00%> (ø)
contrib/pkg/utils/aws/aws.go                             0.00% <0.00%> (ø)
contrib/pkg/utils/azure/azure.go                         0.00% <0.00%> (ø)
contrib/pkg/utils/gcp/gcp.go                             0.00% <0.00%> (ø)
... and 20 more

@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from e2b93af to 553ba5b on October 31, 2025 14:45

Well, mostly.

Previously, any time the installer added a field to metadata.json, we would
need to evaluate it and possibly add a bespoke field and code path to make
sure it was supplied to the destroyer at deprovision time.

With this change, we're offloading metadata.json verbatim (except in
some cases we have to scrub/replace credentials fields -- see HIVE-2804
/ openshift#2612) to a new Secret in the ClusterDeployment's namespace,
referenced from a new field:
ClusterDeployment.Spec.ClusterMetadata.MetadataJSONSecretRef.

For legacy clusters -- those created before this change -- we attempt to
retrofit the new Secret based on the legacy fields. This is best effort
and may not always work.

In the future (but not here!) instead of building the installer's
ClusterMetadata structure for the destroyer with individual fields from
the CD's ClusterMetadata, we'll unmarshal it directly from the contents
of this Secret.

An earlier commit ensures that ClusterDeployments have an associated
Secret containing the metadata.json emitted by the installer.

This change adds a new generic destroyer via the (existing) `hiveutil
deprovision` command that consumes this metadata.json to deprovision the
cluster.

This new behavior is the default, but we also include an escape hatch to
run the platform-specific legacy destroyer by setting the following
annotation on the ClusterDeployment:

`hive.openshift.io/legacy-deprovision: "true"`

A couple of providers (nutanix, vsphere) need bespoke code to populate
credentials in the metadata.json object for destroying a cluster. In a
prior commit this was being done in the deprovisioner (the new one, that
uses metadata.json directly, per HIVE-2302) after ConfigureCreds.

Since ConfigureCreds is where we (stay with me) configure creds, and is
already platform-specific, it makes more sense to do this work there.
This commit refactors to do so.

Legacy code paths pass in a `nil` metadata object, which is coded to
result in no change from the previous functionality. (In particular,
ConfigureCreds is also used when provisioning, where no metadata object
is present/necessary.)
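A minimal sketch of that shape, with placeholder types rather than hive's actual API:

```go
// Sketch: a platform ConfigureCreds that, in addition to its existing job of
// exposing credentials to the process environment, now also fills them into
// the metadata object when one is supplied. Provisioning and legacy destroyer
// paths pass nil and see no behavior change. Placeholder types throughout.
package example

import "os"

type vsphereCredsMetadata struct {
	Username string `json:"username"`
	Password string `json:"password"`
}

func configureVSphereCreds(username, password string, metadata *vsphereCredsMetadata) {
	// Existing behavior: make the creds available to the installer/destroyer process.
	os.Setenv("GOVC_USERNAME", username)
	os.Setenv("GOVC_PASSWORD", password)

	// New behavior: scrub/replace the credential fields inside metadata.json.
	if metadata == nil {
		return // provision path or legacy destroyer: nothing to do
	}
	metadata.Username = username
	metadata.Password = password
}
```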
@2uasimojo 2uasimojo force-pushed the HIVE-2302/metadata.json-passthrough branch from 553ba5b to 42c37de on October 31, 2025 19:31
@2uasimojo
Member Author

/test verify

@dlom
Contributor

dlom commented Oct 31, 2025

/lgtm

@2uasimojo merge when you're ready

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Oct 31, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 31, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, dlom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@2uasimojo
Member Author

/test verify e2e-azure

@2uasimojo
Member Author

The verify fail smells like a testplatform issue. I'll try again on Monday.

@2uasimojo
Member Author

The verify fail smells like a testplatform issue. I'll try again on Monday.

Opened https://issues.redhat.com/browse/DPTP-4590

2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Oct 31, 2025
What started this was HIVE-2302 / openshift#2729 where I wanted to be able to use
the ConfigureCreds map from installmanager as well as deprovision, but
couldn't do it easily due to circular imports. I got that done, but also
knocked down some other tech debt at the same time. I moved things
closer to where they're used, in many cases eliminating the packages
they came from, addressed some deprecations, deleted some unused stuff,
shaved off a few lines of code in some places, got rid of some soft
linting squiggles, DRYed some duplicated symbols, that kind of thing.
@2uasimojo 2uasimojo mentioned this pull request Oct 31, 2025
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Oct 31, 2025
@2uasimojo
Member Author

/retest-required

2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Nov 3, 2025
@2uasimojo
Member Author

/hold cancel

QE looks good. There is one pending issue (start reading here) but it would need to be coordinated with the installer team, if we decide to address it at all.

Let's go!

@openshift-ci openshift-ci bot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Nov 4, 2025
@2uasimojo
Member Author

/test e2e-gcp

@2uasimojo
Member Author

/test verify

@openshift-ci
Contributor

openshift-ci bot commented Nov 4, 2025

@2uasimojo: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit cb63336 into openshift:master Nov 4, 2025
24 checks passed
@2uasimojo 2uasimojo deleted the HIVE-2302/metadata.json-passthrough branch November 4, 2025 20:42
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Nov 4, 2025
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Nov 6, 2025
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Nov 13, 2025
For purposes of this work, there are three kinds of clusters:
- IPI, where hive manages the provisioning via installer.
- UPI, where a ClusterInstall implementation provisions the cluster and
  hive just watches and copies over the status.
- Fake, used for testing purposes.

When we implemented HIVE-2302 / openshift#2729, we didn't account for
UPI/ClusterInstall, which we don't really have a good way to test, and
ended up injecting a bug:

Via that effort, we started populating a new Secret containing
metadata.json produced by the installer. For legacy clusters (those that
existed before upgrading to a version with this feature) we need to
retrofit that Secret based on the ClusterMetadata (among other things),
which was previously how we saved off the metadata.json. For IPI, that
ClusterMetadata always had a Platform section. The changes we had to
make for fake clusters saw us spoofing a very sparse metadata.json and
then populating it later. That process relied on the existence of the
CD.Spec.ClusterMetadata.Platform section, so we were creating it for the
sake of that fake cluster path. However, it turns out that
ClusterInstall subclasses don't (and don't need to) populate the
ClusterMetadata.Platform section. Since we blindly copy the
ClusterMetadata into the ClusterDeployment, we could end up with that
Platform section being absent when we come to retrofit the
metadata.json. We would then hit the path designed for fake clusters
where we would create that section (empty). No problem, right? Except
that we have a validating admission webhook that forbids making changes
to the ClusterMetadata section, and that new, empty Platform field was
flagged as such a change, and bounced by the webhook.

Phew.

So with this change, we condition populating that Platform section
explicitly on fake clusters only.
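Schematically, the fix amounts to a guard like the following (placeholder types; the annotation check is an assumption, and the real change lives in hive's retrofit code):

```go
// Sketch: only spoof an empty Platform section on the fake-cluster path.
// ClusterInstall (UPI) clusters legitimately lack Platform metadata, and
// adding an empty section during the retrofit was being rejected by the
// validating webhook as a forbidden ClusterMetadata change.
package example

type platformMetadata struct{}

type clusterMetadata struct {
	Platform *platformMetadata
}

type clusterDeployment struct {
	Annotations map[string]string
	Metadata    *clusterMetadata
}

func maybeSpoofPlatform(cd *clusterDeployment) {
	if cd.Annotations["hive.openshift.io/fake-cluster"] != "true" { // annotation name assumed
		return // real clusters: leave ClusterMetadata untouched
	}
	if cd.Metadata != nil && cd.Metadata.Platform == nil {
		cd.Metadata.Platform = &platformMetadata{}
	}
}
```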
2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Nov 18, 2025
In HIVE-2302 / openshift#2729 we rearranged the way installmanager figures out
which platform-specific ConfigureCreds() function to call, but forgot to
include IBMCloud in the lookup table. (BareMetal is there, which we
don't use, and which is probably why this was missed, as it made the
number of entries correct.)
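Schematically, the lookup table is a map from platform to its ConfigureCreds function, and the fix just adds the entry that was left out (placeholder names below, not the exact hive symbols):

```go
// Sketch: platform -> ConfigureCreds dispatch, with the previously missing
// ibmcloud entry added. The functions here stand in for the per-platform
// ConfigureCreds implementations under contrib/pkg/utils/<platform>.
package example

type configureCredsFunc func()

func awsConfigureCreds()      {}
func azureConfigureCreds()    {}
func gcpConfigureCreds()      {}
func ibmcloudConfigureCreds() {}
func nutanixConfigureCreds()  {}
func vsphereConfigureCreds()  {}

var configureCredsByPlatform = map[string]configureCredsFunc{
	"aws":      awsConfigureCreds,
	"azure":    azureConfigureCreds,
	"gcp":      gcpConfigureCreds,
	"ibmcloud": ibmcloudConfigureCreds, // previously missing from the table
	"nutanix":  nutanixConfigureCreds,
	"vsphere":  vsphereConfigureCreds,
}
```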