-
Notifications
You must be signed in to change notification settings - Fork 3
fix(vd): prevent VirtualDisk from stuck in WaitForFirstConsumer phase when VM is attached #1516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR refines the handling of WaitForFirstConsumer (WFFC) storage classes by fetching the StorageClass in relevant controllers, checking the DataVolumeRunning condition before setting the VirtualDisk phase, and updating watchers—all to prevent VirtualDisks from appearing stuck once a VM is attached and provisioning has started. Sequence diagram for VirtualDisk phase transition with WFFC after VM attachmentsequenceDiagram
participant VD as VirtualDisk Controller
participant SC as StorageClass
participant DV as DataVolume
participant VM as VirtualMachine
VD->SC: Fetch StorageClass for VD
SC-->>VD: Return StorageClass with WFFC mode
VD->DV: Check DataVolumeRunning condition
DV-->>VD: Return DataVolumeRunning status
VD->VM: Detect VM attachment to VD
VD->VD: Set phase to WaitForFirstConsumer only if DVRunning is false and reason is empty
VD->VD: Transition out of WaitForFirstConsumer if provisioning started
Class diagram for updated VirtualDisk phase handling logicclassDiagram
class VirtualDisk {
+Status: Phase, StorageClassName, Conditions
}
class StorageClass {
+VolumeBindingMode
}
class DataVolume {
+Status: Phase, Conditions
}
class BlockDeviceHandler {
+checkVirtualDisksToBeWFFC()
}
class WaitForDVStep {
+setForFirstConsumerIsAwaited()
}
VirtualDisk --> StorageClass : fetches
VirtualDisk --> DataVolume : checks DataVolumeRunning
BlockDeviceHandler --> VirtualDisk : checks phase
WaitForDVStep --> VirtualDisk : sets phase
WaitForDVStep --> DataVolume : checks DVRunning condition
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Daniil Loktev <[email protected]>
7cb3b44
to
1909f13
Compare
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
Workflow has started. The target step completed with status: failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `images/virtualization-artifact/pkg/controller/vm/internal/block_device_condition.go:60-66` </location>
<code_context>
for _, vd := range vds {
- if vd.Status.Phase == v1alpha2.DiskWaitForFirstConsumer {
- return true, nil
+ scName := vd.Status.StorageClassName
+ sc, err := object.FetchObject(ctx, types.NamespacedName{Name: scName}, h.client, &storagev1.StorageClass{})
+ if err != nil {
+ return false, fmt.Errorf("fetch storage class %s: %w", scName, err)
+ }
+
+ if sc != nil && sc.VolumeBindingMode != nil && *sc.VolumeBindingMode == storagev1.VolumeBindingWaitForFirstConsumer {
+ readyCondition, _ := conditions.GetCondition(vdcondition.ReadyType, vd.Status.Conditions)
+ if readyCondition.Status != metav1.ConditionTrue {
</code_context>
<issue_to_address>
**suggestion:** Consider handling missing or empty StorageClassName more explicitly.
If vd.Status.StorageClassName is empty, FetchObject will try to fetch a storage class with an empty name, which could cause errors or unnecessary log entries. Consider adding a check to handle this case before calling FetchObject.
</issue_to_address>
### Comment 2
<location> `images/virtualization-artifact/pkg/controller/vm/internal/block_device_condition.go:68` </location>
<code_context>
+ readyCondition, _ := conditions.GetCondition(vdcondition.ReadyType, vd.Status.Conditions)
</code_context>
<issue_to_address>
**issue (bug_risk):** Check for nil readyCondition before accessing Status.
Add a nil check for readyCondition before accessing its Status to prevent a potential panic.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
scName := vd.Status.StorageClassName | ||
sc, err := object.FetchObject(ctx, types.NamespacedName{Name: scName}, h.client, &storagev1.StorageClass{}) | ||
if err != nil { | ||
return false, fmt.Errorf("fetch storage class %s: %w", scName, err) | ||
} | ||
|
||
if sc != nil && sc.VolumeBindingMode != nil && *sc.VolumeBindingMode == storagev1.VolumeBindingWaitForFirstConsumer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider handling missing or empty StorageClassName more explicitly.
If vd.Status.StorageClassName is empty, FetchObject will try to fetch a storage class with an empty name, which could cause errors or unnecessary log entries. Consider adding a check to handle this case before calling FetchObject.
|
||
if sc != nil && sc.VolumeBindingMode != nil && *sc.VolumeBindingMode == storagev1.VolumeBindingWaitForFirstConsumer { | ||
readyCondition, _ := conditions.GetCondition(vdcondition.ReadyType, vd.Status.Conditions) | ||
if readyCondition.Status != metav1.ConditionTrue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Check for nil readyCondition before accessing Status.
Add a nil check for readyCondition before accessing its Status to prevent a potential panic.
for _, vd := range vds { | ||
if vd.Status.Phase == v1alpha2.DiskWaitForFirstConsumer { | ||
return true, nil | ||
scName := vd.Status.StorageClassName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we check the SC type of the disk in VM? Why can't we trust the VD phase?
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
36aa92e
to
ed1e88c
Compare
This reverts commit 33dd5f1. Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
ed1e88c
to
99449cc
Compare
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
Workflow has started. The target step completed with status: success. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `images/virtualization-artifact/pkg/controller/vm/internal/block_device_condition.go:61` </location>
<code_context>
- if vd.Status.Phase == v1alpha2.DiskWaitForFirstConsumer {
- return true, nil
+ scName := vd.Status.StorageClassName
+ sc, err := object.FetchObject(ctx, types.NamespacedName{Name: scName}, h.client, &storagev1.StorageClass{})
+ if err != nil {
+ return false, fmt.Errorf("fetch storage class %s: %w", scName, err)
</code_context>
<issue_to_address>
**suggestion:** Consider handling the case where scName is empty before fetching the StorageClass.
A guard clause for empty scName would avoid unnecessary API calls and reduce confusion from attempting to fetch a StorageClass with no name.
</issue_to_address>
### Comment 2
<location> `images/virtualization-artifact/pkg/controller/vd/internal/source/step/wait_for_dv_step.go:138` </location>
<code_context>
return false, fmt.Errorf("get sc: %w", err)
}
+ dvRunningCond, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, s.dv.Status.Conditions)
isWFFC := sc != nil && sc.VolumeBindingMode != nil && *sc.VolumeBindingMode == storagev1.VolumeBindingWaitForFirstConsumer
- if isWFFC && (s.dv.Status.Phase == cdiv1.PendingPopulation || s.dv.Status.Phase == cdiv1.WaitForFirstConsumer) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider checking for nil dvRunningCond before accessing its fields.
Accessing dvRunningCond.Status or dvRunningCond.Reason without a nil check may cause a panic if GetDataVolumeCondition returns nil. Please add a nil check to prevent this.
</issue_to_address>
### Comment 3
<location> `images/virtualization-artifact/pkg/controller/vd/internal/source/sources.go:185` </location>
<code_context>
}
- if isStorageClassWFFC(sc) && (dv.Status.Phase == cdiv1.PendingPopulation || dv.Status.Phase == cdiv1.WaitForFirstConsumer) {
+
+ dvRunningCond, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, dv.Status.Conditions)
+ if isStorageClassWFFC(sc) && (dv.Status.Phase == cdiv1.PendingPopulation || dv.Status.Phase == cdiv1.WaitForFirstConsumer) && dvRunningCond.Status == corev1.ConditionFalse && dvRunningCond.Reason == "" {
vd.Status.Phase = v1alpha2.DiskWaitForFirstConsumer
</code_context>
<issue_to_address>
**issue (bug_risk):** Missing nil check for dvRunningCond could lead to panics.
Add a nil check for dvRunningCond before accessing its fields to prevent potential panics.
</issue_to_address>
### Comment 4
<location> `images/virtualization-artifact/pkg/controller/vd/internal/watcher/datavolume_watcher.go:76-79` </location>
<code_context>
return true
}
+ oldDVRunning, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, e.ObjectOld.Status.Conditions)
+ newDVRunning, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, e.ObjectNew.Status.Conditions)
+
+ if oldDVRunning.Reason != newDVRunning.Reason {
+ return true
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential nil dereference if oldDVRunning or newDVRunning is nil.
Add nil checks for oldDVRunning and newDVRunning before accessing their Reason fields to prevent panics.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
if vd.Status.Phase == v1alpha2.DiskWaitForFirstConsumer { | ||
return true, nil | ||
scName := vd.Status.StorageClassName | ||
sc, err := object.FetchObject(ctx, types.NamespacedName{Name: scName}, h.client, &storagev1.StorageClass{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider handling the case where scName is empty before fetching the StorageClass.
A guard clause for empty scName would avoid unnecessary API calls and reduce confusion from attempting to fetch a StorageClass with no name.
return false, fmt.Errorf("get sc: %w", err) | ||
} | ||
|
||
dvRunningCond, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, s.dv.Status.Conditions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Consider checking for nil dvRunningCond before accessing its fields.
Accessing dvRunningCond.Status or dvRunningCond.Reason without a nil check may cause a panic if GetDataVolumeCondition returns nil. Please add a nil check to prevent this.
} | ||
if isStorageClassWFFC(sc) && (dv.Status.Phase == cdiv1.PendingPopulation || dv.Status.Phase == cdiv1.WaitForFirstConsumer) { | ||
|
||
dvRunningCond, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, dv.Status.Conditions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Missing nil check for dvRunningCond could lead to panics.
Add a nil check for dvRunningCond before accessing its fields to prevent potential panics.
oldDVRunning, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, e.ObjectOld.Status.Conditions) | ||
newDVRunning, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, e.ObjectNew.Status.Conditions) | ||
|
||
if oldDVRunning.Reason != newDVRunning.Reason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Potential nil dereference if oldDVRunning or newDVRunning is nil.
Add nil checks for oldDVRunning and newDVRunning before accessing their Reason fields to prevent panics.
Description
Fix VirtualDisk remaining in
WaitForFirstConsumer
phase even after VM attachment and provisioning has started.Why do we need it, and what problem does it solve?
When using WFFC storage class with volume populators:
WaitForFirstConsumer
waiting for VMWaitForFirstConsumer
because DataVolume is inPendingPopulation
state, even though the "first consumer" (VM) already existsThis creates perception of "hanging" - users see VD stuck in WFFC for minutes while provisioning is actually running.
What is the expected result?
Checklist
Changelog entries