
make sleds report their CPU families to Nexus #8725


Open: wants to merge 16 commits into main

Conversation

iximeow (Member) commented Jul 29, 2025

RFD 505 proposes that instances should be able to set a "minimum hardware platform" or "minimum CPU platform" that allows users to constrain an instance to run on sleds that have a specific set of CPU features available. This allows a user to opt a VM into advanced hardware features (e.g. AVX-512 support) by constraining it to run only on sleds that support those features. For this to work, Nexus needs to understand what CPUs are present in which sleds.

Have sled-agent query CPUID to get CPU vendor and family information and report this to Nexus as part of the sled hardware manifest.


This is largely code Greg had put together; now that the control plane bits that build on it are about to be PR'd, I'd like to get it merged. Initially it was oriented around AmdFamilyXXh, but I've adjusted it towards specific family names, since there is precedent for one family (19h) spanning parts with very different features (Zen 4-based parts have AVX512, Zen 3-based parts do not). Additionally, client vs server parts are sometimes segmented along feature lines in annoying ways - not such that VMs would see a difference, but over in Intel land that does happen. If times were to so substantially change...

I mostly mention this because if that rationale is not clear from the code and comments, please say so!
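For a sense of what "query CPUID" amounts to here, this is a minimal sketch of decoding family and model from leaf 1 EAX (per AMD's encoding in APM volume 3; the helper name is illustrative, not code from this PR):

```rust
/// Extract (family, model) from CPUID leaf 1 EAX, per AMD's encoding:
/// the extended family/model fields only take effect when the base
/// family field is 0xF, as it is on all Zen parts.
fn family_model(eax: u32) -> (u32, u32) {
    let base_family = (eax >> 8) & 0xF;
    let base_model = (eax >> 4) & 0xF;
    if base_family == 0xF {
        let ext_family = (eax >> 20) & 0xFF;
        let ext_model = (eax >> 16) & 0xF;
        (base_family + ext_family, (ext_model << 4) | base_model)
    } else {
        (base_family, base_model)
    }
}
```

For example, an EAX of 0x00A00F11 decodes to family 0x19, model 0x01 (a Milan part).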

Comment on lines 773 to 776
// In lab and development environments in particular, the family reported here
// may differ from the real processor family. `sled-hardware::detect_cpu_family`
// tries to map various CPUs that we would not ship in a rack to their
// greatest-common-denominator family names here.
Member Author:

The docs on the external enum here are asymmetric with the docs on the internal enum. My thinking is that for API clients, usage will overwhelmingly be deployed racks with Gimlets and Cosmos and Whatever Comes After, where there really should never be a CPU that is not accurately identified by the families here.

The hedging about client parts and CPUs that Nexus could see but we don't ship would only be confusing here, but I think it's useful for internal APIs as the set of supported CPUs grows.

Comment on lines 265 to 268
// These are Turin Dense, but from a CPU feature perspective they're
// equivalently capable to Turin, so for our purposes they're the
// same.
CpuFamily::AmdTurin
Member Author:

This probably could/should be a TurinDense variant that happens to be compatible with the Turin "CPU platform" in the forthcoming instance CPU platforms PR (adding this to the Sled external view was a relatively late addition and I'm realizing that it would be pretty confusing for a rack with Turin Dense CPUs to be reported as just "Turin"..)

Member:

Yeah, agree --- I can imagine an operator wanting to know whether a particular rack consists of Turin or Turin Dense sleds (or if a rack contains a mix!), as long as we don't mind taking the extra logic to treat those as equivalent WRT virtual platform.

Member Author:

so, #8730 talks about it more and i've backed out the external API bits here but i'm leaving the TurinDense variant in - this is really pretty insufficient for operator questions ("Turin" might mean "Turin High Frequency Part" and "Turin Dense" doesn't tell you if it's "Turin With Twelve Core CCDs" or "Turin With Sixteen Core CCDs")

you might ask, "why keep the distinction if this is just about CPU features?" and my answer is mostly hedging against weirder futures. Turin and Turin Dense are each as able to run a "requires Turin" VM, based solely on feature sets, but I could imagine a scenario where one family has even more AVX512 "AI Acceleration" while its sibling family doesn't. Since Nexus is otherwise responsible for "can this VM run on that set of CPU features", sled-agent ought to say as much as is reasonable in the context of real Oxide hardware.

hawkw (Member) left a comment:

Overall, this looks good to me, I left a handful of little notes.

Comment on lines +173 to +174
// "AuthenticAMD"; see AMD APM volume 3 (March 2024) section E.3.1.
(0x68747541, 0x444D4163, 0x69746E65) => {}
Member:

i wanted to try and come up with a way to `to_le_bytes`/`from_le_bytes` this so you could match on `b"AuthenticAMD"` but it ends up looking kinda tortured and i don't actually care that much, carry on as you were --- the hex numbers are funnier anyway.
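For the curious, the `to_le_bytes` version might look roughly like this (a sketch, not code from this PR); the vendor string registers come back in EBX, EDX, ECX order:

```rust
/// Reassemble the 12-byte CPUID leaf 0 vendor string from its three
/// registers, in the architecturally defined EBX, EDX, ECX order.
fn vendor_string(ebx: u32, ecx: u32, edx: u32) -> [u8; 12] {
    let mut s = [0u8; 12];
    s[0..4].copy_from_slice(&ebx.to_le_bytes());
    s[4..8].copy_from_slice(&edx.to_le_bytes());
    s[8..12].copy_from_slice(&ecx.to_le_bytes());
    s
}
```

after which `match &vendor_string(ebx, ecx, edx) { b"AuthenticAMD" => ... }` works, tortured or not.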

Comment on lines 224 to 268
// TODO(?): Exhaustively check that client parts support all CPU features of
// the corresponding Oxide CPU platform before doing this "as-if" reporting.
// Lab systems built out of client parts may have hardware which support all
// features in the corresponding instance CPU platform, but have individual
// features disabled in the BIOS or by client part microcode. This can
// result in funky situations, like an Oxide CPU platform advertising CPU
// features that lab systems don't support. This is unlikely, but take
// AVX512 as an example: users can often disable AVX512 entirely on Zen 5
// BIOSes. In this case a VM on a 9000-series Ryzen will be told those
// instructions are available only for the guest to get #UD at runtime.
match family {
    0x19 if model <= 0x0F => {
        // This covers both Milan and Zen 3-based Threadrippers. I don't
        // have a 5000-series Threadripper on hand to test but I believe
        // they are feature-compatible.
        CpuFamily::AmdMilan
    }
    0x19 if model >= 0x10 && model <= 0x1F => {
        // This covers both Genoa and Zen 4-based Threadrippers. Again,
        // don't have a comparable Threadripper to test here.
        //
        // We intend to expose Turin and Milan as families a guest can
        // choose, skipping the Zen 4 EPYC parts. So, round this down to
        // Milan; if we're here it's a lab system and the alternative is
        // "unknown".
        CpuFamily::AmdMilan
    }
    0x19 if model >= 0x20 && model <= 0x2F => {
        // These are client Zen 3 parts aka Vermeer. Feature-wise, they are
        // missing INVLPGB from Milan, but are otherwise close, and we don't
        // expose INVLPGB to guests currently anyway.
        CpuFamily::AmdMilan
    }
    0x19 if model >= 0x60 && model <= 0x6F => {
        // These are client Zen 4 parts aka Raphael. Similar to the above
        // with Genoa and Vermeer, round these down to Milan in support of
        // lab clusters instead of calling them unknown.
        CpuFamily::AmdMilan
    }
    0x1A if model <= 0x0F => CpuFamily::AmdTurin,
    0x1A if model >= 0x10 && model <= 0x1F => {
        // These are Turin Dense, but from a CPU feature perspective they're
        // equivalently capable to Turin, so for our purposes they're the
        // same.
        CpuFamily::AmdTurin
Member:

Extensive documentation here is lovely!

Comment on lines 44 to 45
pub fn cpu_family(&self) -> CpuFamily {
unimplemented!("Accessing hardware unsupported on non-illumos");
Member:

hm technically we could do CPUID on any x86 system, but i agree that since we don't do anything else on non-illumos systems this is fine.

Comment on lines +1 to +2
ALTER TABLE omicron.public.sled ADD COLUMN IF NOT EXISTS
cpu_family omicron.public.sled_cpu_family NOT NULL DEFAULT 'unknown';
Member:

Hm, this means applying this migration will cause any currently-known sleds to be treated as "unknown" rather than "Milan" --- it occurs to me that we could perhaps make them Milan if baseboard is Gimlet? I dunno if this matters since this particular migration will run in a mupdate and the sleds will all report in again with the new sled-agent that has detected that they are Milan through the proper mechanisms...

Member Author:

and the sleds will all report in again with the new sled-agent that has detected that they are Milan

they better be detected as Milan, at least. the situation you describe is how i expect upgrade to work out, so this should be fine, though i want to give that a spin just to see it be fine.

then, this only really matters in the case that sled-agent isn't detecting the sled's CPU family correctly, where defaulting to Milan would have the database look like sleds were Milan until they suddenly transmuted into Unknowns. that'd be much more confusing than "was unknown until we reported it otherwise" IMO

@iximeow iximeow requested a review from hawkw July 31, 2025 19:53
the existing plumbing was sufficient for sled-agent to report the CPU
family at startup, but did not provide the CPU family when Nexus calls
later for inventory collections. when you've upgraded to this version,
the database migration sets the sled CPU family to `unknown` expecting
that the next inventory collection will figure things out. this doesn't
happen, and the initial check-in doesn't update the CPU type either
(presumably because the sled is already known and initialized from the
control plane's perspective?)

this does... most of the plumbing to report a sled's CPU family for
inventory collection, but it doesn't actually work. `SledCpuFamily`
being both in `omicron-common` and `nexus-client` is kind of unworkable.
probably need a `ConvertInto` or something to transform the shared into
the `nexus-client` when needed..? i've been trying to figure out what
exactly is necessary and what is just building a mess for myself for two
hours and this feels like it's going nowhere.
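For what it's worth, the kind of conversion the commit message gropes toward could be a plain `From` impl between the two enums; a sketch with hypothetical module names standing in for the real `omicron-common` and `nexus-client` types:

```rust
// Hypothetical stand-ins for the two crates' identical-looking enums;
// the real types live in separate crates in omicron.
mod shared {
    #[derive(Copy, Clone, Debug, PartialEq)]
    pub enum SledCpuFamily {
        Unknown,
        AmdMilan,
        AmdTurin,
        AmdTurinDense,
    }
}

mod client {
    #[derive(Copy, Clone, Debug, PartialEq)]
    pub enum SledCpuFamily {
        Unknown,
        AmdMilan,
        AmdTurin,
        AmdTurinDense,
    }
}

// An exhaustive match means adding a variant to one enum but not the
// other becomes a compile error here, rather than a runtime surprise.
impl From<shared::SledCpuFamily> for client::SledCpuFamily {
    fn from(f: shared::SledCpuFamily) -> Self {
        match f {
            shared::SledCpuFamily::Unknown => Self::Unknown,
            shared::SledCpuFamily::AmdMilan => Self::AmdMilan,
            shared::SledCpuFamily::AmdTurin => Self::AmdTurin,
            shared::SledCpuFamily::AmdTurinDense => Self::AmdTurinDense,
        }
    }
}
```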
iximeow (Member Author) commented Aug 2, 2025

i can't request changes of my own PR but this doesn't do the needful in the case that you upgrade into this version; see the commit message from the WIP 5ec45d3, quoted in the timeline above.

really didn't expect to spend today basically just chasing type errors but here we are i guess.

iximeow (Member Author) commented Aug 4, 2025

ah, i should note here:

Time: 3ms total (execution 3ms / network 1ms)

root@[fd00:1122:3344:101::3]:32221/omicron> select id, cpu_family from omicron.public.sled;
                   id                  | cpu_family
---------------------------------------+-------------
  2af2bf46-802e-4446-b162-565f3c21c439 | amd_milan
  8dcb5fa7-29fe-4e84-925b-37e141a1c786 | amd_milan
  cb9f8819-99ac-445d-97ec-fc33efc5e836 | amd_milan
  de03f1f1-3fbc-4d1f-8b84-88dfff0cdf17 | amd_milan
(4 rows)

this reported sleds as "unknown" before 5ec45d3 for reasons described in the commit (and copied above, here). now fresh installs and upgrades all report cpu_family correctly.

hawkw (Member) commented Aug 5, 2025

(quoting iximeow's comment above: the DB query now shows all four sleds reporting `amd_milan` on fresh installs and upgrades)

Not super important, but seeing the manual DB query here, it occurs to me that we should definitely add a CPU family column to `omdb db sleds` --- especially because that command's output doesn't currently have any way to distinguish between Gimlet and Cosmo sleds, which we'll definitely want to fix in the near future.

Fine to do this in a follow-up PR if that makes your life easier, but we definitely should do it.

hawkw (Member) left a comment:

This is starting to look good, modulo some smallish nits.

Comment on lines 1140 to 1143
SledCpuFamily::Unknown => write!(f, "unknown"),
SledCpuFamily::AmdMilan => write!(f, "milan"),
SledCpuFamily::AmdTurin => write!(f, "turin"),
SledCpuFamily::AmdTurinDense => write!(f, "turin_dense"),
Member:

Note (i'm not sure if you care) that this will format slightly differently from how serde will serialize the enum, since serde's output will be prefixed with `amd_` for AmdMilan, AmdTurin, and AmdTurinDense --- do we care about making these the same?

Also, a super unimportant nit: for enums like this, I often like to add a `pub fn as_str(&self) -> &'static str` and then just make `Display` use that, so there's a way to get the static string out without allocating/formatting, in the handful of cases where you might just need an `&str`. Not necessary here if you're not using that, but it's just a thing I like to do so it's there when you want it and people don't `.to_string()` the enum.

Member Author:

these should have the amd_ prefix, yeah. the codenames are seared into my brain as "AMD" at this point so i didn't even realize i'd omitted it
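For reference, a minimal sketch of the `as_str` plus `Display` pattern described above, using the `amd_`-prefixed strings agreed on here (assumed names, not the PR's actual code):

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum SledCpuFamily {
    Unknown,
    AmdMilan,
    AmdTurin,
    AmdTurinDense,
}

impl SledCpuFamily {
    /// Borrowed string form; no allocation when a caller just wants an `&str`.
    pub fn as_str(&self) -> &'static str {
        match self {
            Self::Unknown => "unknown",
            Self::AmdMilan => "amd_milan",
            Self::AmdTurin => "amd_turin",
            Self::AmdTurinDense => "amd_turin_dense",
        }
    }
}

// Display defers to as_str, so the two can never drift apart.
impl std::fmt::Display for SledCpuFamily {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(self.as_str())
    }
}
```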

Serialize, Deserialize, Copy, Clone, Debug, PartialEq, Eq, JsonSchema,
)]
#[serde(rename_all = "snake_case")]
pub enum SledCpuFamily {
Member:

Nit, take it or leave it: is omicron-common really the best place for this, or should it just go in nexus-sled-agent-shared? More crates depend on omicron-common, and AFAICT, this is only needed by Nexus and sled-agent. I don't think it's super important to keep it out of omicron-common for compile-time reasons (it's just one enum, how much could it cost?), but it kinda seems like a better habit to put stuff that's only shared between Nexus and sled-agent in the crate that's for that?

Member Author:

yeah i'd definitely had this in nexus-sled-agent-shared for a moment before getting very turned around by type errors downstream of modules re-exporting nexus-sled-agent-shared bits. i agree it'd make more sense there, so i'll give it another try. between then and now i deleted the copy of this internal to sled-agent which was another confounding element so it might be more straightforward now.

iximeow (Member Author) commented Aug 6, 2025:

okay, settled on this going (back!) to sled-hardware. the initial version here had a sled_hardware::SledCpuFamily, a copy with the same variants and more docs in nexus_types, and then the database enum in nexus_db_model. after all of this I've effectively just replaced the internal API type with a re-exported sled_hardware type.

I think the intention was that sled-agent's internal representation (for the fleeting time that we're in detect_cpu_family()) shouldn't be coupled to whatever we've got in the API, but I don't think we'll ever actually differ in those. we're dancing around the kernel's x86_processor_family_t here, where in the ideal world we're just scooping that out of libtopo and passing that to Nexus. so I don't feel too bad about having the same type all through here for internal purposes.

Comment on lines +225 to +228
"leaf1.eax" => format_args!("{:#08x}", leaf_1.eax),
"leaf1.ebx" => format_args!("{:#08x}", leaf_1.ebx),
"leaf1.ecx" => format_args!("{:#08x}", leaf_1.ecx),
"leaf1.edx" => format_args!("{:#08x}", leaf_1.edx),
Member:

lovely, thank you

Comment on lines 225 to 230
"leaf1.eax" => format_args!("{:#08x}", leaf_1.eax),
"leaf1.ebx" => format_args!("{:#08x}", leaf_1.ebx),
"leaf1.ecx" => format_args!("{:#08x}", leaf_1.ecx),
"leaf1.edx" => format_args!("{:#08x}", leaf_1.edx),
"parsed family" => format_args!("{family:#x}"),
"parsed model" => format_args!("{model:#x}"),
Member:

turbo nit (i'm sorry): usually we don't use spaces (or dots) in slog field names, since they get turned into JSON fields by the logger --- while those characters are permitted in JSON, i think they're a bit impolite (they might force you to quote the field in jq and similar tools when you don't have to otherwise). normally we use underscores for these. not a big deal.

Member Author:

oh yeah totally should be a hyphen, ty

Comment on lines 1128 to 1134
/// AMD Turin processors (or very close). Could be an actual Turin in a
/// Cosmo, or a close-to-Turin client Zen 5 part.
AmdTurin,

/// AMD Turin Dense processors. There are no "Turin Dense-like" CPUs unlike
/// other cases, so this means a bona fide Zen 5c Turin Dense part.
AmdTurinDense,
Member:

i presume we're gonna want an `is_turin()` type thing eventually for virtual platform compat... but not needed yet.

Member Author:

it turns into matches and arrays of enum variants (un?)fortunately, https://github.com/oxidecomputer/omicron/pull/8728/files#diff-45568580067c47bb34af16d07cb3c511f1c5252def068d02ffae6747a0050bc7R36-R42 (needs SledCpuFamily::TurinDense variants now but they just end up in the list)

this maybe gets less gross if we can do const appending of statics so there's a const TURIN_LIKES: [SledCpuFamily; 2] we can put into an array? but i haven't really looked yet.

Member:

this maybe gets less gross if we can do const appending of statics so there's a const TURIN_LIKES: [SledCpuFamily; 2] we can put into an array? but i haven't really looked yet.

for a lot of the DB enums, i've ended up doing stuff like that where there's a const array of variants that share a property, plus an is_whatever() method that checks if a variant is contained in one of those arrays, like this:

/// All VMM states.
pub const ALL_STATES: &'static [Self] =
    <Self as strum::VariantArray>::VARIANTS;

/// States in which it is safe to deallocate a VMM's sled resources and mark
/// it as deleted.
pub const DESTROYABLE_STATES: &'static [Self] =
    &[Self::Destroyed, Self::Failed, Self::SagaUnwound];

pub const TERMINAL_STATES: &'static [Self] =
    &[Self::Destroyed, Self::Failed];

/// States in which a VMM record is present in the database but is not
/// resident on a sled, either because it does not yet exist, was produced
/// by an unwound update saga and will never exist, or has already been
/// destroyed.
pub const NONEXISTENT_STATES: &'static [Self] =
    &[Self::Creating, Self::SagaUnwound, Self::Destroyed];

pub fn is_terminal(&self) -> bool {
    Self::TERMINAL_STATES.contains(self)
}

/// Returns `true` if the instance is in a state in which it exists on a
/// sled.
pub fn exists_on_sled(&self) -> bool {
    !Self::NONEXISTENT_STATES.contains(self)
}

That way, you can use the is_turin() check in Rust code, and write

.filter(dsl::cpu_family.eq_any(SledCpuFamily::TURIN_ESQUE))

or similar when writing Diesel queries.

I don't think that static arrays can be appended in const though, sadly.
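Putting the two comments together, a hedged sketch of what that pattern might look like for `SledCpuFamily` (names assumed from this PR's discussion, not merged code):

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum SledCpuFamily {
    Unknown,
    AmdMilan,
    AmdTurin,
    AmdTurinDense,
}

impl SledCpuFamily {
    /// Families that can satisfy a "requires Turin" virtual platform,
    /// usable directly in `eq_any`-style query filters.
    pub const TURIN_LIKES: &'static [Self] =
        &[Self::AmdTurin, Self::AmdTurinDense];

    /// Membership check for Rust-side code paths.
    pub fn is_turin(&self) -> bool {
        Self::TURIN_LIKES.contains(self)
    }
}
```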

hawkw (Member) left a comment:

great, sorry for picking so many nits
