nexus-db-queries: verify blueprint test covers all tables (#8455) #8667

Open

sarataha wants to merge 2 commits into main

Conversation

sarataha

Add ensure_blueprint_fully_populated() to check that all bp_* tables get populated by the representative blueprint. This complements the existing ensure_blueprint_fully_deleted().

Add ClickHouse config to representative blueprint.

Fixes #8455

…ter#8455)

Add ensure_blueprint_fully_populated() to check all bp_* tables get populated.
Add ClickHouse config to representative blueprint.
Collaborator

@davepacheco left a comment

Thanks for taking a swing at this.

Comment on lines 4296 to 4329
macro_rules! query_count {
    ($table:ident, $blueprint_id_col:ident) => {{
        use nexus_db_schema::schema::$table::dsl;
        let result = dsl::$table
            .filter(
                dsl::$blueprint_id_col
                    .eq(to_db_typed_uuid(blueprint_id)),
            )
            .count()
            .get_result_async(&*conn)
            .await;
        (stringify!($table), result)
    }};
}

// Tables prefixed with `bp_` that are *not* specific to a single blueprint
// and therefore intentionally ignored.
let tables_ignored: BTreeSet<_> = ["bp_target"].into_iter().collect();

let mut tables_checked = BTreeSet::new();
for (table_name, result) in [
    query_count!(blueprint, id),
    query_count!(bp_sled_metadata, blueprint_id),
    query_count!(bp_omicron_dataset, blueprint_id),
    query_count!(bp_omicron_physical_disk, blueprint_id),
    query_count!(bp_omicron_zone, blueprint_id),
    query_count!(bp_omicron_zone_nic, blueprint_id),
    query_count!(bp_clickhouse_cluster_config, blueprint_id),
    query_count!(bp_clickhouse_keeper_zone_id_to_node_id, blueprint_id),
    query_count!(bp_clickhouse_server_zone_id_to_node_id, blueprint_id),
    query_count!(bp_oximeter_read_policy, blueprint_id),
    query_count!(bp_pending_mgs_update_sp, blueprint_id),
    query_count!(bp_pending_mgs_update_rot, blueprint_id),
    query_count!(bp_pending_mgs_update_rot_bootloader, blueprint_id),
Collaborator

I feel like we want to commonize most of this function with ensure_blueprint_fully_deleted(). Maybe we want a function that just counts the rows in each of these tables and returns whether they're all zero, all non-zero, or something in between? Then call that from here and from ensure_blueprint_fully_deleted()?

That alone doesn't give us flexibility when we really do have an exception, though. For example, some combinations might be mutually exclusive and we'd need to be able to say it's okay if this table or that table has no rows. The function could return a struct of counters and the caller could decide what it expects. It'd probably be useful to have helpers like fn all_empty(&self) -> bool, fn all_non_empty(&self) -> bool, fn empty_tables(&self) -> Vec<String> (the list of empty tables), and fn non_empty_tables(&self) -> Vec<String>, so that callers could implement exceptions without having to check all the non-exceptional cases.
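
A minimal sketch of the kind of struct being suggested here, assuming row counts keyed by table name (the second commit calls it BlueprintTableCounts; the field layout and helper bodies below are illustrative only, not the actual implementation):

    use std::collections::BTreeMap;

    /// Sketch: row counts for each blueprint-related table, for one blueprint.
    /// The counts themselves would be gathered by the same per-table count
    /// queries shown above.
    struct BlueprintTableCounts {
        counts: BTreeMap<&'static str, i64>,
    }

    impl BlueprintTableCounts {
        /// True if every table has zero rows (what the "fully deleted" check wants).
        fn all_empty(&self) -> bool {
            self.counts.values().all(|&n| n == 0)
        }

        /// True if every table has at least one row (what the "fully populated" check wants).
        fn all_non_empty(&self) -> bool {
            self.counts.values().all(|&n| n > 0)
        }

        /// Names of tables with no rows, so callers can allow specific exceptions.
        fn empty_tables(&self) -> Vec<String> {
            self.counts
                .iter()
                .filter(|&(_, &count)| count == 0)
                .map(|(&name, _)| name.to_string())
                .collect()
        }

        /// Names of tables with at least one row.
        fn non_empty_tables(&self) -> Vec<String> {
            self.counts
                .iter()
                .filter(|&(_, &count)| count > 0)
                .map(|(&name, _)| name.to_string())
                .collect()
        }
    }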

Author

Thank you David! This approach is more flexible and much cleaner indeed!

Comment on lines 4333 to 4343
// Pending MGS update tables are only populated when the blueprint includes
// firmware updates. The representative blueprint doesn't, so allow zero rows
// for these tables.
if matches!(
    table_name,
    "bp_pending_mgs_update_sp"
        | "bp_pending_mgs_update_rot"
        | "bp_pending_mgs_update_rot_bootloader"
) {
    tables_checked.insert(table_name);
    continue;
}
Collaborator

I thought the representative collection did have some kind of update. If not, shouldn't it? Maybe it could have one of each type (probably with different baseboard ids)? Our planner would never do that but I don't believe it's actually invalid.

Author

I just checked, and the representative blueprint currently doesn't include any firmware updates. It makes sense to add one of each type with different baseboard IDs. Thank you! 🙂

Comment on lines 2980 to 2983
// -----------------------------------------------------------------
// Populate minimal Clickhouse cluster configuration so the blueprint
// exercises the `bp_clickhouse_*` tables (issue #8455).
// -----------------------------------------------------------------
Collaborator

nit: we don't use these // -------- comments.

// Populate minimal Clickhouse cluster configuration so the blueprint
// exercises the `bp_clickhouse_*` tables (issue #8455).
// -----------------------------------------------------------------
if blueprint.clickhouse_cluster_config.is_none() {
Collaborator

I'm reluctant to change the representative blueprint to include multi-node ClickHouse because right now that's not the configuration we use outside of development, and I'd rather we be testing the one we use in production. This example is what led to my comment elsewhere that the caller might want to decide whether some tables should just be exceptions. That way, the main representative blueprint wouldn't include this, the test could verify that these are the only empty tables, and then, if we wanted, we could have a different blueprint that does have multi-node ClickHouse in it and make sure it has the values we expect in those tables (and that they get cleaned up when it gets deleted). Does that make sense?
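
A sketch of how the populated-blueprint check could encode that exception using the empty_tables() helper sketched earlier (the blueprint_table_counts() constructor name here is hypothetical):

    use std::collections::BTreeSet;

    // The representative blueprint carries no multi-node ClickHouse config, so
    // only the bp_clickhouse_* tables are allowed to be empty.
    let allowed_empty: BTreeSet<&str> = [
        "bp_clickhouse_cluster_config",
        "bp_clickhouse_keeper_zone_id_to_node_id",
        "bp_clickhouse_server_zone_id_to_node_id",
    ]
    .into_iter()
    .collect();
    let counts = blueprint_table_counts(&conn, blueprint_id).await;
    for table in counts.empty_tables() {
        assert!(
            allowed_empty.contains(table.as_str()),
            "expected table {table} to be populated by the representative blueprint"
        );
    }

A separate test blueprint that does include multi-node ClickHouse could then assert the opposite for those tables, and that their rows are cleaned up on deletion.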

Author

That makes sense, thank you! I'll remove the ClickHouse changes and rely on the table-checking exceptions.

Create BlueprintTableCounts struct and refactor both ensure functions to use it.
Remove ClickHouse config from representative blueprint.
Successfully merging this pull request may close these issues:
blueprint coverage test should verify that all blueprint tables get used