Skip to content

Conversation

aarnq
Copy link
Contributor

@aarnq aarnq commented Dec 18, 2024

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • kind/adr

What does this PR do / why do we need this PR?

This implements the idea of change for migrations that was proposed before to combat the divergence around releases, and to simplify contribution.
The idea is that the migration/main is the one under development, and during release it is promoted to the release series, example migration/v0.44.

Hopefully this change in process will also be helpful to realise #2287, with reusable snippets naturally as we go.

This is in a POC phase, as I've implemented the change in migration structure, but yet to implement the handling in the bin scripts, as I want it to be finalised first.
The process documents supporting migration contribution and release will also need more updates once completed.

Is implemented.

Information to reviewers

./migration/promote.sh v0.44
It is also re-runnable.

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change upgrades CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts are not affected)
    • The metrics names did change (Grafana dashboards and Prometheus alerts were fixed)
  • Logs checks:
    • The logs do not show any errors after the change
  • Pod Security Policy checks:
    • Any changed pod is covered by Pod Security Admission
    • Any changed pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any pods to be blocked by Pod Security Admission or Policies
  • Network Policy checks:
    • Any changed pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@aarnq aarnq added the kind/improvement Improvement of existing features, e.g. code cleanup or optimizations. label Dec 18, 2024
@aarnq aarnq self-assigned this Dec 18, 2024
@aarnq aarnq requested review from a team as code owners December 18, 2024 07:56
@aarnq
Copy link
Contributor Author

aarnq commented Dec 18, 2024

Note to self: Update public docs maintenance page.

Copy link
Contributor

@anders-elastisys anders-elastisys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, really good that this is getting finally getting addressed as I know that the old system could be problematic if migrations steps did not end up in the proper release folder before a release was in progress. Left some comments, but overall I think it looks really good, have not tested the scripts yet however

@aarnq aarnq force-pushed the aarnq/update-migrations branch from 35b5692 to 60f907f Compare December 20, 2024 08:16
@aarnq aarnq force-pushed the aarnq/update-migrations branch from 332d698 to a775055 Compare January 7, 2025 15:31
@aarnq
Copy link
Contributor Author

aarnq commented Jan 7, 2025

This is fully ready now.

@aarnq aarnq requested review from OlleLarsson and simonklb January 28, 2025 12:41
Copy link
Contributor

@OlleLarsson OlleLarsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I added some nits and a question.

Comment on lines 83 to 84
Once the new directory is created you must manually remove snippets and steps that was specific for the new release, be sure to remove any exclusions from them set in the `80-apply-sh` snippet based on the tip in the previous section.
General migration steps, such as snippets for upgrading OpenSearch, Prometheus, or Thanos, may be retained as long as they verify beforehand they run that they are needed, and fall back to a generic upgrade steps if not required.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here and I hope you can clarify it for me :) Once the new release migration directory is created, which snippets and steps should I remove? Everything in main/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, definitely not all.

The idea is to only remove those snippets that are specific to the version that will never be used again, so we can build up a library of reusable ones that we can iteratively improve, and potentially implement proper rollback steps for.

As an example, v0.44/apply/02-migrate-gpu-operator-ns.sh is only ever going to make sense once.

While v0.44/apply/10-kube-prometheus-stack.sh is something we've seen in a few versions, and would be better served with a persistent snippet that would be made a bit more generic.
In its current state it already looks good enough to be kept, as it doesn't even use hardcoded versions.

Perhaps the way to go would be to add some sort of tag to the ones we know should be deleted, so that part of the process is clearer. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think I understand now!

Maybe the snippets we want to keep could be moved to a library/ directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with a different option to add suffixes for those that should be deleted, as I think that would require special handling in the migration machinery to keep them in a separate directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea right now is that as long as a migration script that is added is version-generic it should be kept in main/ and would then be part of all future releases (until it is actively removed)?
My feeling right now is that it might end up being a bit messy if we have like 10-20 generic scripts in main/ that is added to every release, but only some of them are actually needed in any particular release. But maybe it will not grow to be that big.

I sort of like the idea Olle had that we can move the generic scripts to library/ after the release is done, then copy them back to main/ later when they are needed for a release again. One disadvantage I see with that is that we need to consider if unused scripts should be removed from the library at some point.

What are your thoughts on this? Should we reconsider using the library option?

Copy link
Contributor

@simonklb simonklb Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also stumped me a bit, I'm worried that having to remember naming things correctly and removing the correct files etc might be missed or done incorrectly.
Improving the machinery over trusting humans to do the manual process correctly is the way to go IMO.

How would we make a machinery to evaluate what humans script are required only for one version in specific? Or what is your idea?

Edit: Sorry for the late reply, no idea how this fell through the cracks in my inbox.

My idea is that you would keep the generic snippets that should always run in one place that is kept around for every migration and version specific snippets in a different location so that you don't have to think about the naming or removing some files and keeping some files around.

For example:
migration/include-always/01-opensearch.bash
migration/include-always/02-prometheus.bash
migration/include-always/03-apply.bash
migration/v0.45/01-specific-version-migration.bash

Then it's up to the person introducing the snippets to put them in their correct spot and not the one promoting to the next version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but you have to think of the naming, always, because that is how order is determined, and because of that I see that it is best to keep them in the same directory so it is more intuitive for contributors and reviewers.
Because then what they see is what happens, what is in main will be the next release.

If it is only to identify which snippets are version specific, I rather have a marker in its name because then we can easily identify and automatically prune it when we promote the next version.

# example
./migration/main/apply/60-prometheus.sh
./migration/main/apply/69-openserch-migrate-snapshots-only-once.sh
./migration/main/apply/70-openserch.sh
./migration/main/apply/80-apply.sh

And to add to this, if we must pre-create the migration directory for specific releases when developing specific migrations, then this update won't change anything for what prompted the change in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm fine with this if we enforce naming so that you have to make the active choice when you introduce a new snippet and then we have automation around what should be left and what should be removed after a promotion. When you promote you should not have to manually pick and choose, it should just be one command and preferably be done as part of cutting a release.

./migration/main/apply/60-prometheus-keep.sh
./migration/main/apply/69-openserch-migrate-snapshots-only-once.sh
./migration/main/apply/70-openserch-keep.sh
./migration/main/apply/80-apply-keep.sh

Not sure about the naming though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, then you can have a linter that enforces the naming scheme so that you are forced to make the choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enforced it with suffixes through pre-commit, chose "all-versions" and "one-version" as "only-once" might make the contributor believe that it will ever only be run once, which isn't the case with patches.

@aarnq aarnq force-pushed the aarnq/update-migrations branch from 7570441 to cb2b7c2 Compare February 17, 2025 07:07
Copy link
Contributor

@simonklb simonklb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing major but some things I'd like to be explained and/or fixed before approving the merge! Nice job!

@aarnq aarnq force-pushed the aarnq/update-migrations branch from cb2b7c2 to afc5ce2 Compare May 27, 2025 07:43
Copy link
Contributor

@simonklb simonklb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a tiny blocker left. Great work! This is going to improve the migration handling a lot! 🥳

Copy link
Contributor

@viktor-f viktor-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I think it is excellent that we have a clear way to upgrade to current main and I think the way to keep some migration scripts for future releases turned out really well.
Note that I have not checked the logic of all the scripts very closely.

Copy link
Contributor

@anders-elastisys anders-elastisys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work, and nicely documented 🙌
Did also not look too closely at the scripts.
One question, as we already have some scripts that I believe are reusable (e.g. kube-prometheus-stack) should these be added with the all-versions suffix already, or perhaps it should be in a separate PR?

@aarnq aarnq force-pushed the aarnq/update-migrations branch from a8ffd89 to 1a871c7 Compare June 25, 2025 14:04
@lucianvlad
Copy link
Contributor

Are you expecting to do more work on this, or this can be merged? @aarnq

@aarnq
Copy link
Contributor Author

aarnq commented Sep 15, 2025

Are you expecting to do more work on this, or this can be merged? @aarnq

I'll see if I can merge it after the next release so we can start with a clean migration for it.

@lucianvlad
Copy link
Contributor

Are you expecting to do more work on this, or this can be merged? @aarnq

I'll see if I can merge it after the next release so we can start with a clean migration for it.

Nice work!
Release for apps v0.49.0 was completed this week and is now available here
Feel free to fix the conflicts and merge this when you have the time for it .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/improvement Improvement of existing features, e.g. code cleanup or optimizations.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants