GPU plugin: skip announcing parent GPU if MIG-enabled (fix #719) #776
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -144,21 +144,33 @@ func (l deviceLib) enumerateGpusAndMigDevices(config *Config) (AllocatableDevice` |
| | | `return fmt.Errorf("error getting info for GPU %d: %w", i, err)` |
| | | `}` |
| | | |
| | | - `deviceInfo := &AllocatableDevice{` |
| | | + `parentdev := &AllocatableDevice{` |
| | | `Gpu: gpuInfo,` |
| | | `}` |
| | | - `devices[gpuInfo.CanonicalName()] = deviceInfo` |
| | | |
| | | - `migs, err := l.discoverMigDevicesByGPU(gpuInfo)` |
| | | + `migdevs, err := l.discoverMigDevicesByGPU(gpuInfo)` |
| | | `if err != nil {` |
| | | `return fmt.Errorf("error discovering MIG devices for GPU %q: %w", gpuInfo.CanonicalName(), err)` |
| | | `}` |
| | | `if featuregates.Enabled(featuregates.PassthroughSupport) {` |
| | | `// If no MIG devices are found, allow VFIO devices.` |
| | | - `gpuInfo.vfioEnabled = len(migs) == 0` |
| | | + `gpuInfo.vfioEnabled = len(migdevs) == 0` |
| | | `}` |
| | | - `for _, migDeviceInfo := range migs {` |
| | | - `devices[migDeviceInfo.CanonicalName()] = migDeviceInfo` |
| | | |
| | | + `if !gpuInfo.migEnabled {` |
| | | + `klog.Infof("Adding device %s to allocatable devices", gpuInfo.CanonicalName())` |
| | | + `devices[gpuInfo.CanonicalName()] = parentdev` |
| | | + `return nil` |
| | | `}` |
| | | |
| | | + `// Likely unintentionally stranded capacity (misconfiguration).` |
| | | + `if len(migdevs) == 0 {` |
| | | + `klog.Warningf("Physical GPU %s has MIG mode enabled but no configured MIG devices", gpuInfo.CanonicalName())` |
| | | + `}` |
| | | |
| | | + `for _, mdev := range migdevs {` |
| | | + `klog.Infof("Adding MIG device %s to allocatable devices (parent: %s)", mdev.CanonicalName(), gpuInfo.CanonicalName())` |
| | | + `devices[mdev.CanonicalName()] = mdev` |
| | | + `}` |
| | | |
| | | `return nil` |

Comment on lines +160 to +164 (the `if !gpuInfo.migEnabled { ... }` block)

Member:
Why do we not check this BEFORE we start iterating mig devices?

Collaborator, Author:
You mean before ? Yes, that can make sense. (not required towards correct behavior, though, ack?) Again, not doing that here -- this section will change quite a bit again in upcoming patches.

We don't need this unless the GPU does not have MIG enabled. Does it make sense to only instantiate this in that return path? (not a blocker though).

Agree!
(not doing this here; this section will change quite a bit again in upcoming patches; in this PR it's OK to just do something that gets the test to pass)
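
For illustration only, here is a rough sketch of how the two deferred review suggestions might fit together: check MIG mode before touching any MIG devices, and create the parent GPU entry only on the non-MIG path. The `gpu` type and the `announce` helper are hypothetical stand-ins, not the plugin's actual types or functions. The sketch also leaves out the `PassthroughSupport` / `vfioEnabled` handling, which in the PR depends on the MIG discovery result.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical stand-in for the plugin's GPU info type; only the
// fields needed for this illustration are modeled.
type gpu struct {
	name       string   // canonical name of the physical GPU
	migEnabled bool     // whether MIG mode is enabled on the GPU
	migDevices []string // canonical names of configured MIG devices
}

// announce returns the sorted device names to advertise plus any warnings.
// It is a hypothetical helper, not code from the PR.
func announce(gpus []gpu) (devices []string, warnings []string) {
	for _, g := range gpus {
		// Check MIG mode first: a non-MIG GPU is announced as a whole,
		// and its entry is only created on this path.
		if !g.migEnabled {
			devices = append(devices, g.name)
			continue
		}
		// MIG enabled but nothing configured: likely stranded capacity.
		if len(g.migDevices) == 0 {
			warnings = append(warnings, fmt.Sprintf(
				"physical GPU %s has MIG mode enabled but no configured MIG devices", g.name))
		}
		// Announce only the MIG devices, never the MIG-enabled parent.
		devices = append(devices, g.migDevices...)
	}
	sort.Strings(devices)
	return devices, warnings
}

func main() {
	devs, warns := announce([]gpu{
		{name: "gpu-0"},
		{name: "gpu-1", migEnabled: true, migDevices: []string{"gpu-1-mig-0", "gpu-1-mig-1"}},
		{name: "gpu-2", migEnabled: true},
	})
	fmt.Println(devs)
	fmt.Println(warns)
}
```

In this sketch the MIG-enabled parents `gpu-1` and `gpu-2` never appear in the announced set, and `gpu-2` only produces a warning because it has MIG mode enabled without any configured MIG devices.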