nexus-db-queries: verify blueprint test covers all tables (#8455) #8667

Open

sarataha wants to merge 2 commits into main

Conversation

sarataha

Add ensure_blueprint_fully_populated() to check that all bp_* tables get populated by the representative blueprint. This complements the existing ensure_blueprint_fully_deleted().

Add ClickHouse config to representative blueprint.

Fixes #8455

…ter#8455)

Add ensure_blueprint_fully_populated() to check all bp_* tables get populated.
Add ClickHouse config to representative blueprint.
Collaborator

@davepacheco left a comment

Thanks for taking a swing at this.

Comment on lines 4296 to 4329
macro_rules! query_count {
    ($table:ident, $blueprint_id_col:ident) => {{
        use nexus_db_schema::schema::$table::dsl;
        let result = dsl::$table
            .filter(
                dsl::$blueprint_id_col
                    .eq(to_db_typed_uuid(blueprint_id)),
            )
            .count()
            .get_result_async(&*conn)
            .await;
        (stringify!($table), result)
    }};
}

// Tables prefixed with `bp_` that are *not* specific to a single blueprint
// and therefore intentionally ignored.
let tables_ignored: BTreeSet<_> = ["bp_target"].into_iter().collect();

let mut tables_checked = BTreeSet::new();
for (table_name, result) in [
    query_count!(blueprint, id),
    query_count!(bp_sled_metadata, blueprint_id),
    query_count!(bp_omicron_dataset, blueprint_id),
    query_count!(bp_omicron_physical_disk, blueprint_id),
    query_count!(bp_omicron_zone, blueprint_id),
    query_count!(bp_omicron_zone_nic, blueprint_id),
    query_count!(bp_clickhouse_cluster_config, blueprint_id),
    query_count!(bp_clickhouse_keeper_zone_id_to_node_id, blueprint_id),
    query_count!(bp_clickhouse_server_zone_id_to_node_id, blueprint_id),
    query_count!(bp_oximeter_read_policy, blueprint_id),
    query_count!(bp_pending_mgs_update_sp, blueprint_id),
    query_count!(bp_pending_mgs_update_rot, blueprint_id),
    query_count!(bp_pending_mgs_update_rot_bootloader, blueprint_id),
Collaborator

I feel like we want to commonize most of this function with ensure_blueprint_fully_deleted(). Maybe we want a function that just counts the rows in each of these tables and returns whether they're all zero, all non-zero, or something in between? Then call that from here and from ensure_blueprint_fully_deleted()?

That alone doesn't give us flexibility when we really do have an exception, though. For example, some combinations might be mutually exclusive and we'd need to be able to say it's okay if this table or that table has no rows. The function could return a struct of counters and the caller could decide what it expects. It'd probably be useful to have helpers like fn all_empty(&self) -> bool, fn all_non_empty(&self) -> bool, fn empty_tables(&self) -> Vec<String> (the list of empty tables), and fn non_empty_tables(&self) -> Vec<String>, so that callers could implement exceptions without having to check all the non-exceptional cases.
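
A minimal sketch of the kind of struct being suggested here, assuming row counts keyed by table name (the second commit calls it BlueprintTableCounts; the field layout and helper bodies below are illustrative only, not the actual implementation):

    use std::collections::BTreeMap;

    /// Sketch: row counts for each blueprint-related table, for one blueprint.
    /// The counts themselves would be gathered by the same per-table count
    /// queries shown above.
    struct BlueprintTableCounts {
        counts: BTreeMap<&'static str, i64>,
    }

    impl BlueprintTableCounts {
        /// True if every table has zero rows (what the "fully deleted" check wants).
        fn all_empty(&self) -> bool {
            self.counts.values().all(|&n| n == 0)
        }

        /// True if every table has at least one row (what the "fully populated" check wants).
        fn all_non_empty(&self) -> bool {
            self.counts.values().all(|&n| n > 0)
        }

        /// Names of tables with no rows, so callers can allow specific exceptions.
        fn empty_tables(&self) -> Vec<String> {
            self.counts
                .iter()
                .filter(|&(_, &count)| count == 0)
                .map(|(&name, _)| name.to_string())
                .collect()
        }

        /// Names of tables with at least one row.
        fn non_empty_tables(&self) -> Vec<String> {
            self.counts
                .iter()
                .filter(|&(_, &count)| count > 0)
                .map(|(&name, _)| name.to_string())
                .collect()
        }
    }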

Author

Thank you David! This approach is more flexible and much cleaner indeed!

Comment on lines 4333 to 4343
// Pending MGS update tables are only populated when the blueprint includes
// firmware updates. The representative blueprint doesn't, so allow zero rows
// for these tables.
if matches!(
    table_name,
    "bp_pending_mgs_update_sp"
        | "bp_pending_mgs_update_rot"
        | "bp_pending_mgs_update_rot_bootloader"
) {
    tables_checked.insert(table_name);
    continue;
}
Collaborator

I thought the representative collection did have some kind of update. If not, shouldn't it? Maybe it could have one of each type (probably with different baseboard ids)? Our planner would never do that but I don't believe it's actually invalid.

Author

I just checked, and the representative blueprint currently doesn't include any firmware updates. It makes sense to add one of each type with different baseboard IDs. Thank you! 🙂

Comment on lines 2980 to 2983
// -----------------------------------------------------------------
// Populate minimal Clickhouse cluster configuration so the blueprint
// exercises the `bp_clickhouse_*` tables (issue #8455).
// -----------------------------------------------------------------
Collaborator

nit: we don't use these // -------- comments.

// Populate minimal Clickhouse cluster configuration so the blueprint
// exercises the `bp_clickhouse_*` tables (issue #8455).
// -----------------------------------------------------------------
if blueprint.clickhouse_cluster_config.is_none() {
Collaborator

I'm reluctant to change the representative blueprint to include multi-node ClickHouse because right now that's not the configuration we use outside of development, and I'd rather we be testing the one we use in production. This example is what led to my comment elsewhere that the caller might want to decide whether some tables should just be exceptions. That way, the main representative blueprint wouldn't include this, the test could verify that these are the only empty tables, and then, if we wanted, we could have a different blueprint that does have multi-node ClickHouse in it and make sure it has the values we expect in those tables (and that they get cleaned up when it gets deleted). Does that make sense?
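
A sketch of how the populated-blueprint check could encode that exception using the empty_tables() helper sketched earlier (the blueprint_table_counts() constructor name here is hypothetical):

    use std::collections::BTreeSet;

    // The representative blueprint carries no multi-node ClickHouse config, so
    // only the bp_clickhouse_* tables are allowed to be empty.
    let allowed_empty: BTreeSet<&str> = [
        "bp_clickhouse_cluster_config",
        "bp_clickhouse_keeper_zone_id_to_node_id",
        "bp_clickhouse_server_zone_id_to_node_id",
    ]
    .into_iter()
    .collect();
    let counts = blueprint_table_counts(&conn, blueprint_id).await;
    for table in counts.empty_tables() {
        assert!(
            allowed_empty.contains(table.as_str()),
            "expected table {table} to be populated by the representative blueprint"
        );
    }

A separate test blueprint that does include multi-node ClickHouse could then assert the opposite for those tables, and that their rows are cleaned up on deletion.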

Author

That makes sense, thank you! I'll remove the ClickHouse changes and rely on the table-checking exceptions.

Create BlueprintTableCounts struct and refactor both ensure functions to use it.
Remove ClickHouse config from representative blueprint.
Successfully merging this pull request may close these issues:
blueprint coverage test should verify that all blueprint tables get used