-
Notifications
You must be signed in to change notification settings - Fork 50
make sleds report their CPU families to Nexus #8725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
RFD 505 proposes that instances should be able to set a "minimum hardware platform" or "minimum CPU platform" that allows users to constrain an instance to run on sleds that have a specific set of CPU features available. This allows a user to opt a VM into advanced hardware features (e.g. AVX-512 support) by constraining it to run only on sleds that support those features. For this to work, Nexus needs to understand what CPUs are present in which sleds. Have sled-agent query CPUID to get CPU vendor and family information and report this to Nexus as part of the sled hardware manifest.
// In lab and development environments in particular, the family reported here | ||
// may differ from the real processor family. `sled-hardware::detect_cpu_family` | ||
// tries to map various CPUs that we would not ship in a rack to their | ||
// greatest-common-denominator family names here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs on the external enum here is asymmetric with the docs on the internal enum. My thinking is that for API clients, usage will overwhelmingly be deployed racks with Gimlets and Cosmos and Whatever Comes After, where there really should never be a CPU that is not accurately identified by the families here.
The hedging about client parts and CPUs that Nexus could see but we don't ship would only be confusing here, but I think it's useful for internal APIs as the set of supported CPUs grows.
sled-hardware/src/lib.rs
Outdated
// These are Turin Dense, but from a CPU feature perspective they're | ||
// equivalently capable to Turin, so for our purposes they're the | ||
// same. | ||
CpuFamily::AmdTurin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably could/should be a TurinDense
variant that happens to be compatible with the Turin "CPU platform" in the forthcoming instance CPU platforms PR (adding this to the Sled external view was a relatively late addition and I'm realizing that it would be pretty confusing for a rack with Turin Dense CPUs to be reported as just "Turin"..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree --- I can imagine an operator wanting to know whether a particular rack consists of Turin or Turin Dense sleds (or if a rack contains a mix!), as long as we don't mind taking the extra logic to treat those as equivalent WRT virtual platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, #8730 talks about it more and i've backed out the external API bits here but i'm leaving the TurinDense
variant in - this is really pretty insufficient for operator questions ("Turin" might mean "Turin High Frequency Part" and "Turin Dense" doesn't tell you if it's "Turin With Twelve Core CCDs" or "Turin With Sixteen Core CCDs")
you might ask, "why keep the distinction if this is just about CPU feautres?" and my answer is mostly hedging against weirder futures. Turin and Turin Dense are each as able to run a "requires Turin" VM, based solely on feature sets, but I could imagine a scenario where one family has even more AVX512 "AI Acceleration" while its sibling family doesn't. Since Nexus is otherwise responsible for "can this VM run on that set of CPU features" it sled-agent
ought to say as much as is reasonable in the context of real Oxide hardware..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good to me, I left a handful of little notes.
// "AuthenticAMD"; see AMD APM volume 3 (March 2024) section E.3.1. | ||
(0x68747541, 0x444D4163, 0x69746E65) => {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wanted to try and come up with a way to to_le_bytes
/from_le_bytes
this so you could match on b"AuthenticAMD"
but it ends up looking kinda tortured and i don't actually care that much, carry on as you were --- the hex numbers are funnier anyway.
sled-hardware/src/lib.rs
Outdated
// TODO(?): Exhaustively check that client parts support all CPU features of | ||
// the corresponding Oxide CPU platform before doing this "as-if" reporting. | ||
// Lab systems built out of client parts may have hardware which support all | ||
// features in the corresponding instance CPU platform, but have individual | ||
// features disabled in the BIOS or by client part microcode. This can | ||
// result in funky situations, like an Oxide CPU platform advertising CPU | ||
// features that lab systems don't support. This is unlikely, but take | ||
// AVX512 as an example: users can often disable AVX512 entirely on Zen 5 | ||
// BIOSes. In this case a VM on a 9000-series Ryzen will be told those | ||
// instructions are available only for the guest to get #UD at runtime. | ||
match family { | ||
0x19 if model <= 0x0F => { | ||
// This covers both Milan and Zen 3-based Threadrippers. I don't | ||
// have a 5000-series Threadripper on hand to test but I believe | ||
// they are feature-compatible. | ||
CpuFamily::AmdMilan | ||
} | ||
0x19 if model >= 0x10 && model <= 0x1F => { | ||
// This covers both Genoa and Zen 4-based Threadrippers. Again, | ||
// don't have a comparable Threadripper to test here. | ||
// | ||
// We intend to expose Turin and Milan as families a guest can | ||
// choose, skipping the Zen 4 EPYC parts. So, round this down to | ||
// Milan; if we're here it's a lab system and the alternative is | ||
// "unknown". | ||
CpuFamily::AmdMilan | ||
} | ||
0x19 if model >= 0x20 && model <= 0x2F => { | ||
// These are client Zen 3 parts aka Vermeer. Feature-wise, they are | ||
// missing INVLPGB from Milan, but are otherwise close, and we don't | ||
// expose INVLPGB to guests currently anyway. | ||
CpuFamily::AmdMilan | ||
} | ||
0x19 if model >= 0x60 && model <= 0x6F => { | ||
// These are client Zen 4 parts aka Raphael. Similar to the above | ||
// with Genoa and Vermeer, round these down to Milan in support of | ||
// lab clusters instead of calling them unknown. | ||
CpuFamily::AmdMilan | ||
} | ||
0x1A if model <= 0x0F => CpuFamily::AmdTurin, | ||
0x1A if model >= 0x10 && model <= 0x1F => { | ||
// These are Turin Dense, but from a CPU feature perspective they're | ||
// equivalently capable to Turin, so for our purposes they're the | ||
// same. | ||
CpuFamily::AmdTurin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensive documentation here is lovely!
sled-hardware/src/lib.rs
Outdated
// These are Turin Dense, but from a CPU feature perspective they're | ||
// equivalently capable to Turin, so for our purposes they're the | ||
// same. | ||
CpuFamily::AmdTurin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree --- I can imagine an operator wanting to know whether a particular rack consists of Turin or Turin Dense sleds (or if a rack contains a mix!), as long as we don't mind taking the extra logic to treat those as equivalent WRT virtual platform.
sled-hardware/src/non_illumos/mod.rs
Outdated
pub fn cpu_family(&self) -> CpuFamily { | ||
unimplemented!("Accessing hardware unsupported on non-illumos"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm technically we could do CPUID on any x86 system, but i agree that since we don't do anything else on non-illumos systems this is fine.
ALTER TABLE omicron.public.sled ADD COLUMN IF NOT EXISTS | ||
cpu_family omicron.public.sled_cpu_family NOT NULL DEFAULT 'unknown'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this means applying this migration will cause any currently-known sleds to be treated as "unknown" rather than "Milan" --- it occurs to me that we could perhaps make them Milan if baseboard is Gimlet? I dunno if this matters since this particular migration will run in a mupdate and the sleds will all report in again with the new sled-agent that has detected that they are Milan through the proper mechanisms...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the sleds will all report in again with the new sled-agent that has detected that they are Milan
they better be detected as Milan, at least. the situation you describe is how i expect upgrade to work out, so this should be fine, though i want to give that a spin just to see it be fine.
then, this only really matters in the case that sled-agent isn't detecting the sled's CPU family correctly, where defaulting to Milan would have the database look like sleds were Milan until they suddenly transmuted into Unknowns. that'd be much more confusing than "was unknown until we reported it otherwise" IMO
the existing plumbing was sufficient for sled-agent to report the CPU family at startup, but did not provide the CPU family when Nexus calls later for inventory collections. when you've upgraded to this version, the database migration sets the sled CPU family to `unknown` expecting that the next inventory collection will figure things out. this doesn't happen, and the initial check-in doesn't update the CPU type either (presumably because the sled is already known and initialized from the control plane's perspective?) this does... most of the plumbing to report a sled's CPU family for inventory collection, but it doesn't actually work. `SledCpuFamily` being both in `omicron-common` and `nexus-client` is kind of unworkable. probably need a `ConvertInto` or something to transform the shared into the `nexus-client` when needed..? i've been trying to figure out what exactly is necessary and what is just building a mess for myself for two hours and this feels like it's going nowhere.
i can't
really didn't expect to spend today basically just chasing type errors but here we are i guess. |
ah, i should note here:
this reported sleds as "unknown" before 5ec45d3 for reasons described in the commit (and copied above, here). now fresh installs and upgrades all report |
Not super important, but, seeing the manual DB query here, it occurs to me that we should definitely add a CPU family column to Fine to do this in a follow-up PR if that makes your life easier, but we definitely should do it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is starting to look good, modulo some smallish nits.
common/src/api/internal/shared.rs
Outdated
SledCpuFamily::Unknown => write!(f, "unknown"), | ||
SledCpuFamily::AmdMilan => write!(f, "milan"), | ||
SledCpuFamily::AmdTurin => write!(f, "turin"), | ||
SledCpuFamily::AmdTurinDense => write!(f, "turin_dense"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note (i'm not sure if you care) that this will format slightly differently than what Serde will serialize the enum as, since serde's formatting will be prefixed with amd_
for AmdMilan
, AmdTurin
, and AmdTurinDense
--- do we care about making these the same?
Also, a super unimportant nit: for enums like this, I often like to add a pub fn as_str(&self) -> &'static str
and then just make Display
use that, so there's a way to get the static string out without allocating/formatting, in the handful of cases where you might just need an &str
. Not necessary here if you're not using that, but it's just a thing I like to do so it's there when you want it and people don't .to_string()
the enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should have the amd_
prefix, yeah. the codenames are seared into my brain as "AMD" at this point so i didn't even realize i'd omitted it
common/src/api/internal/shared.rs
Outdated
Serialize, Deserialize, Copy, Clone, Debug, PartialEq, Eq, JsonSchema, | ||
)] | ||
#[serde(rename_all = "snake_case")] | ||
pub enum SledCpuFamily { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, take it or leave it: is omicron-common
really the best place for this, or should it just go in nexus-sled-agent-shared
? More crates depend on omicron-common
, and AFAICT, this is only needed by Nexus and sled-agent. I don't think it's super important to keep it out of omicron-common
for compile-time reasons (it's just one enum, how much could it cost?), but it kinda seems like a better habit to put stuff that's only shared between Nexus and sled-agent in the crate that's for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i'd definitely had this in nexus-sled-agent-shared
for a moment before getting very turned around by type errors downstream of modules re-exporting nexus-sled-agent-shared
bits. i agree it'd make more sense there, so i'll give it another try. between then and now i deleted the copy of this internal to sled-agent
which was another confounding element so it might be more straightforward now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, settled on this going (back!) to sled-hardware
. the initial version here had a sled_hardware::SledCpuFamily
, a copy with the same variants and more docs in nexus_types
, and then the database enum in nexus_db_model
. after all of this I've effectively just replaced the internal API type with a re-exported sled_hardware
type.
I think the intention was that sled-agent's internal representation (for the fleeting time that we're in detect_cpu_family()
) shouldn't be coupled to whatever we've got in the API, but I don't think we'll ever actually differ in those. we're dancing around the kernel's x86_processor_family_t
here, where in the ideal world we're just scooping that out of libtopo
and passing that to Nexus. so I don't feel too bad about having the same type all through here for internal purposes.
"leaf1.eax" => format_args!("{:#08x}", leaf_1.eax), | ||
"leaf1.ebx" => format_args!("{:#08x}", leaf_1.ebx), | ||
"leaf1.ecx" => format_args!("{:#08x}", leaf_1.ecx), | ||
"leaf1.edx" => format_args!("{:#08x}", leaf_1.edx), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lovely, thank you
sled-hardware/src/lib.rs
Outdated
"leaf1.eax" => format_args!("{:#08x}", leaf_1.eax), | ||
"leaf1.ebx" => format_args!("{:#08x}", leaf_1.ebx), | ||
"leaf1.ecx" => format_args!("{:#08x}", leaf_1.ecx), | ||
"leaf1.edx" => format_args!("{:#08x}", leaf_1.edx), | ||
"parsed family" => format_args!("{family:#x}"), | ||
"parsed model" => format_args!("{model:#x}"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turbo nit (i'm sorry): usually we don't use spaces (or dots) in slog
field names, since they get turned into JSON fields by the logger --- while those characters are permitted in JSON, i think they're a bit impolite (they might force you to quote the field in jq
and similar tools when you don't have to otherwise). normally we use underscores for these. not a big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah totally should be a hyphen, ty
common/src/api/internal/shared.rs
Outdated
/// AMD Turin processors (or very close). Could be an actual Turin in a | ||
/// Cosmo, or a close-to-Turin client Zen 5 part. | ||
AmdTurin, | ||
|
||
/// AMD Turin Dense processors. There are no "Turin Dense-like" CPUs unlike | ||
/// other cases, so this means a bona fide Zen 5c Turin Dense part. | ||
AmdTurinDense, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i presume we're gonna want an is_turin()
type thing eventually for virtual paltform compat...but not needed yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it turns into matches and arrays of enum variants (un?)fortunately, https://github.com/oxidecomputer/omicron/pull/8728/files#diff-45568580067c47bb34af16d07cb3c511f1c5252def068d02ffae6747a0050bc7R36-R42 (needs SledCpuFamily::TurinDense
variants now but they just end up in the list)
this maybe gets less gross if we can do const
appending of statics so there's a const TURIN_LIKES: [SledCpuFamily; 2]
we can put into an array? but i haven't really looked yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this maybe gets less gross if we can do
const
appending of statics so there's aconst TURIN_LIKES: [SledCpuFamily; 2]
we can put into an array? but i haven't really looked yet.
for a lot of the DB enums, i've ended up doing stuff like that where there's a const array of variants that share a property, plus an is_whatever()
method that checks if a variant is contained in one of those arrays, like this:
omicron/nexus/db-model/src/vmm_state.rs
Lines 55 to 81 in fb4c1f7
/// All VMM states. | |
pub const ALL_STATES: &'static [Self] = | |
<Self as strum::VariantArray>::VARIANTS; | |
/// States in which it is safe to deallocate a VMM's sled resources and mark | |
/// it as deleted. | |
pub const DESTROYABLE_STATES: &'static [Self] = | |
&[Self::Destroyed, Self::Failed, Self::SagaUnwound]; | |
pub const TERMINAL_STATES: &'static [Self] = | |
&[Self::Destroyed, Self::Failed]; | |
/// States in which a VMM record is present in the database but is not | |
/// resident on a sled, either because it does not yet exist, was produced | |
/// by an unwound update saga and will never exist, or has already been | |
/// destroyed. | |
pub const NONEXISTENT_STATES: &'static [Self] = | |
&[Self::Creating, Self::SagaUnwound, Self::Destroyed]; | |
pub fn is_terminal(&self) -> bool { | |
Self::TERMINAL_STATES.contains(self) | |
} | |
/// Returns `true` if the instance is in a state in which it exists on a | |
/// sled. | |
pub fn exists_on_sled(&self) -> bool { | |
!Self::NONEXISTENT_STATES.contains(self) | |
} |
That way, you can use the is_turin()
check in Rust code, and write
.filter(dsl::cpu_family.eq_any(SledCpuFamily::TURIN_ESQUE))
or similar when writing Diesel queries.
I don't think that static arrays can be appended in const though, sadly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, sorry for picking so many nits
RFD 505 proposes that instances should be able to set a "minimum hardware platform" or "minimum CPU platform" that allows users to constrain an instance to run on sleds that have a specific set of CPU features available. This allows a user to opt a VM into advanced hardware features (e.g. AVX-512 support) by constraining it to run only on sleds that support those features. For this to work, Nexus needs to understand what CPUs are present in which sleds.
Have sled-agent query CPUID to get CPU vendor and family information and report this to Nexus as part of the sled hardware manifest.
This is largely code Greg had put together that, now that the control plane bits that build on it are imminently to be PR'd, I'd like to get merged. Initially it was oriented around
AmdFamilyXXh
, but I've adjusted towards specific family names since there is precedent for one family (19h) which has very different features (Zen 4-based parts have AVX512, Zen 3-based parts do not). Additionally, client vs server parts sometimes are segmented along feature lines in annoying ways - not such that VMs would see a difference, but over in Intel land that does happen. If times were to so substantially change...I mostly mention this because if that rationale is not clear from the code and comments, please say so!