@loktev-d loktev-d commented Sep 30, 2025

Description

Fix VirtualDisk remaining in the WaitForFirstConsumer phase after a VM has been attached and provisioning has started.

Why do we need it, and what problem does it solve?

When using a WaitForFirstConsumer (WFFC) storage class with volume populators:

  1. The VD transitions to WaitForFirstConsumer and waits for a VM.
  2. A VM is created and attached to the VD.
  3. Volume provisioning starts (the importer pod is running).
  4. Issue: the VD controller keeps setting the phase to WaitForFirstConsumer because the DataVolume is in the PendingPopulation state, even though the "first consumer" (the VM) already exists.

This creates the impression of a hang: users see the VD stuck in WFFC for minutes while provisioning is actually running.

What is the expected result?

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: vd
type: fix
summary: VirtualDisk no longer stuck in WaitForFirstConsumer phase after VM attachment.

@loktev-d loktev-d added this to the v1.2.0 milestone Sep 30, 2025

sourcery-ai bot commented Sep 30, 2025

Reviewer's Guide

This PR refines the handling of WaitForFirstConsumer (WFFC) storage classes: it fetches the StorageClass in the relevant controllers, checks the DataVolumeRunning condition before setting the VirtualDisk phase, and updates the watchers, so that a VirtualDisk no longer appears stuck once a VM is attached and provisioning has started.
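The phase decision summarized above can be sketched as a small predicate. This is only an illustration under stated assumptions: `dataVolume` and `shouldReportWFFC` are hypothetical stand-ins for the controller's real types and helpers, the reason value `"ImporterStarting"` is invented for the example, and treating a missing Running condition as "not waiting" is a choice made for this sketch rather than something the PR states.

```go
package main

import "fmt"

// dataVolume is a stand-in for the CDI DataVolume status used by the
// controller; only the fields needed for the decision are modelled.
type dataVolume struct {
	Phase         string // e.g. "PendingPopulation" or "WaitForFirstConsumer"
	RunningFound  bool   // whether a Running condition is present at all
	RunningStatus string // "True", "False" or "Unknown"
	RunningReason string
}

// shouldReportWFFC reports whether the VirtualDisk phase should be set to
// WaitForFirstConsumer: the storage class must be WFFC, the DataVolume must
// still be pending, and the Running condition must be False with an empty
// reason (that is, provisioning has not started yet).
func shouldReportWFFC(scIsWFFC bool, dv dataVolume) bool {
	if !scIsWFFC {
		return false
	}
	if dv.Phase != "PendingPopulation" && dv.Phase != "WaitForFirstConsumer" {
		return false
	}
	// Assumption of this sketch: a missing Running condition means "not waiting".
	if !dv.RunningFound {
		return false
	}
	return dv.RunningStatus == "False" && dv.RunningReason == ""
}

func main() {
	waiting := dataVolume{Phase: "PendingPopulation", RunningFound: true, RunningStatus: "False"}
	// "ImporterStarting" is a hypothetical reason used only for illustration.
	started := dataVolume{Phase: "PendingPopulation", RunningFound: true, RunningStatus: "False", RunningReason: "ImporterStarting"}

	fmt.Println(shouldReportWFFC(true, waiting)) // true
	fmt.Println(shouldReportWFFC(true, started)) // false
}
```

Once the Running condition carries a reason, the predicate returns false, so the disk is free to move on from WaitForFirstConsumer even though the DataVolume phase is still PendingPopulation.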

Sequence diagram for VirtualDisk phase transition with WFFC after VM attachment

sequenceDiagram
    participant VD as VirtualDisk Controller
    participant SC as StorageClass
    participant DV as DataVolume
    participant VM as VirtualMachine
    VD->SC: Fetch StorageClass for VD
    SC-->>VD: Return StorageClass with WFFC mode
    VD->DV: Check DataVolumeRunning condition
    DV-->>VD: Return DataVolumeRunning status
    VD->VM: Detect VM attachment to VD
    VD->VD: Set phase to WaitForFirstConsumer only if DVRunning is false and reason is empty
    VD->VD: Transition out of WaitForFirstConsumer if provisioning started

Class diagram for updated VirtualDisk phase handling logic

classDiagram
    class VirtualDisk {
        +Status: Phase, StorageClassName, Conditions
    }
    class StorageClass {
        +VolumeBindingMode
    }
    class DataVolume {
        +Status: Phase, Conditions
    }
    class BlockDeviceHandler {
        +checkVirtualDisksToBeWFFC()
    }
    class WaitForDVStep {
        +setForFirstConsumerIsAwaited()
    }
    VirtualDisk --> StorageClass : fetches
    VirtualDisk --> DataVolume : checks DataVolumeRunning
    BlockDeviceHandler --> VirtualDisk : checks phase
    WaitForDVStep --> VirtualDisk : sets phase
    WaitForDVStep --> DataVolume : checks DVRunning condition

File-Level Changes

Change: Refine WFFC phase logic across controllers
Details:
  • Import storagev1 API and fetch StorageClass in block_device_condition handler
  • Skip phase checks when no StorageClassName is set
  • Enhance wait_for_dv_step and sources to verify DataVolumeRunningCondition status and reason before setting DiskWaitForFirstConsumer
  • Update datavolume_watcher to trigger on changes to the DVRunningCondition reason
Files:
  • internal/block_device_condition.go
  • internal/source/step/wait_for_dv_step.go
  • internal/watcher/datavolume_watcher.go
  • internal/source/sources.go

Change: Extend unit tests to cover WFFC with populators
Details:
  • Add storagev1 to scheme and a getWFFCStorageClass helper in block_devices_test.go
  • Set StorageClassName on test VirtualDisk fixtures
  • Inject DataVolumeRunning false condition and VolumeBindingWaitForFirstConsumer in object_ref_cvi_test.go and object_ref_vi_test.go
Files:
  • internal/block_devices_test.go
  • internal/source/object_ref_cvi_test.go
  • internal/source/object_ref_vi_test.go


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and they look great!



@loktev-d loktev-d marked this pull request as draft September 30, 2025 17:15
Signed-off-by: Daniil Loktev <[email protected]>
@loktev-d loktev-d force-pushed the fix/vd/wffc-stuck-in-waiting-phase branch from 7cb3b44 to 1909f13 Compare October 3, 2025 13:25
loktev-d and others added 4 commits October 3, 2025 16:28
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
@loktev-d loktev-d added the e2e/run Run e2e test on cluster of PR author label Oct 3, 2025

deckhouse-BOaTswain commented Oct 3, 2025

Workflow has started.
Follow the progress here: Workflow Run

The target step completed with status: failure.

@deckhouse-BOaTswain deckhouse-BOaTswain removed the e2e/run Run e2e test on cluster of PR author label Oct 3, 2025
@loktev-d loktev-d marked this pull request as ready for review October 6, 2025 08:23

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `images/virtualization-artifact/pkg/controller/vm/internal/block_device_condition.go:60-66` </location>
<code_context>
 	for _, vd := range vds {
-		if vd.Status.Phase == v1alpha2.DiskWaitForFirstConsumer {
-			return true, nil
+		scName := vd.Status.StorageClassName
+		sc, err := object.FetchObject(ctx, types.NamespacedName{Name: scName}, h.client, &storagev1.StorageClass{})
+		if err != nil {
+			return false, fmt.Errorf("fetch storage class %s: %w", scName, err)
+		}
+
+		if sc != nil && sc.VolumeBindingMode != nil && *sc.VolumeBindingMode == storagev1.VolumeBindingWaitForFirstConsumer {
+			readyCondition, _ := conditions.GetCondition(vdcondition.ReadyType, vd.Status.Conditions)
+			if readyCondition.Status != metav1.ConditionTrue {
</code_context>

<issue_to_address>
**suggestion:** Consider handling missing or empty StorageClassName more explicitly.

If vd.Status.StorageClassName is empty, FetchObject will try to fetch a storage class with an empty name, which could cause errors or unnecessary log entries. Consider adding a check to handle this case before calling FetchObject.
</issue_to_address>
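
A guard of the kind suggested here might look like the sketch below. It is not the PR's code: `virtualDisk`, `staticLookup` and `anyDiskUsesWFFC` are hypothetical stand-ins, the storage class names are invented, the lookup replaces the real API call with an in-memory map, and the Ready-condition check from the diff is left out to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// virtualDisk is a stand-in carrying only the fields the guard needs.
type virtualDisk struct {
	Name             string
	StorageClassName string
}

var errEmptyName = errors.New("storage class name is empty")

// staticLookup is a hypothetical replacement for the API call that fetches a
// StorageClass; it returns the volume binding mode for a known name.
func staticLookup(scName string) (string, error) {
	if scName == "" {
		return "", errEmptyName
	}
	modes := map[string]string{"local": "WaitForFirstConsumer", "replicated": "Immediate"}
	return modes[scName], nil
}

// anyDiskUsesWFFC skips disks whose storage class name is not set yet, so the
// lookup is never called with an empty name.
func anyDiskUsesWFFC(vds []virtualDisk, lookup func(string) (string, error)) (bool, error) {
	for _, vd := range vds {
		if vd.StorageClassName == "" {
			continue // nothing to fetch for this disk yet
		}
		mode, err := lookup(vd.StorageClassName)
		if err != nil {
			return false, fmt.Errorf("fetch storage class %s: %w", vd.StorageClassName, err)
		}
		if mode == "WaitForFirstConsumer" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	vds := []virtualDisk{{Name: "vd-a"}, {Name: "vd-b", StorageClassName: "local"}}
	found, err := anyDiskUsesWFFC(vds, staticLookup)
	fmt.Println(found, err) // true <nil>
}
```

Without the `continue`, the first disk in the example would reach the lookup with an empty name and the whole check would fail with an error.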

### Comment 2
<location> `images/virtualization-artifact/pkg/controller/vm/internal/block_device_condition.go:68` </location>
<code_context>
+		readyCondition, _ := conditions.GetCondition(vdcondition.ReadyType, vd.Status.Conditions)
</code_context>

<issue_to_address>
**issue (bug_risk):** Check for nil readyCondition before accessing Status.

Add a nil check for readyCondition before accessing its Status to prevent a potential panic.
</issue_to_address>
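
The diff discards the second value returned by the condition getter. Whether the real helper returns a pointer or a value is not visible in the quoted context, so the sketch below only illustrates the general pattern of honoring a "found" flag instead of reading fields unconditionally. `condition`, `findCondition` and `isReady` are hypothetical names for this example.

```go
package main

import "fmt"

// condition is a stand-in for a Kubernetes-style status condition.
type condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// findCondition returns the first condition of the given type and a flag
// telling the caller whether it was present.
func findCondition(condType string, conds []condition) (condition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return condition{}, false
}

// isReady treats a missing Ready condition as "not ready" explicitly rather
// than relying on the zero value of the returned struct.
func isReady(conds []condition) bool {
	ready, found := findCondition("Ready", conds)
	return found && ready.Status == "True"
}

func main() {
	fmt.Println(isReady(nil))                                          // false
	fmt.Println(isReady([]condition{{Type: "Ready", Status: "True"}})) // true
}
```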


@LopatinDmitr LopatinDmitr self-requested a review October 8, 2025 09:24
 	for _, vd := range vds {
-		if vd.Status.Phase == v1alpha2.DiskWaitForFirstConsumer {
-			return true, nil
+		scName := vd.Status.StorageClassName

Why do we check the SC type of the disk in VM? Why can't we trust the VD phase?

@Isteb4k Isteb4k modified the milestones: v1.2.0, v1.1.1 Oct 14, 2025
@loktev-d loktev-d marked this pull request as draft October 16, 2025 08:23
@nevermarine nevermarine modified the milestones: v1.1.1, v1.2.0 Oct 16, 2025
loktev-d and others added 3 commits October 16, 2025 17:08
@loktev-d loktev-d force-pushed the fix/vd/wffc-stuck-in-waiting-phase branch from 36aa92e to ed1e88c Compare October 16, 2025 17:32
This reverts commit 33dd5f1.

Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
@loktev-d loktev-d force-pushed the fix/vd/wffc-stuck-in-waiting-phase branch from ed1e88c to 99449cc Compare October 16, 2025 17:36
Signed-off-by: Daniil Loktev <[email protected]>
Signed-off-by: Daniil Loktev <[email protected]>
@loktev-d loktev-d added the e2e/run Run e2e test on cluster of PR author label Oct 16, 2025

deckhouse-BOaTswain commented Oct 16, 2025

Workflow has started.
Follow the progress here: Workflow Run

The target step completed with status: success.

@deckhouse-BOaTswain deckhouse-BOaTswain removed the e2e/run Run e2e test on cluster of PR author label Oct 16, 2025
@loktev-d loktev-d marked this pull request as ready for review October 17, 2025 07:25

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `images/virtualization-artifact/pkg/controller/vm/internal/block_device_condition.go:61` </location>
<code_context>
-		if vd.Status.Phase == v1alpha2.DiskWaitForFirstConsumer {
-			return true, nil
+		scName := vd.Status.StorageClassName
+		sc, err := object.FetchObject(ctx, types.NamespacedName{Name: scName}, h.client, &storagev1.StorageClass{})
+		if err != nil {
+			return false, fmt.Errorf("fetch storage class %s: %w", scName, err)
</code_context>

<issue_to_address>
**suggestion:** Consider handling the case where scName is empty before fetching the StorageClass.

A guard clause for empty scName would avoid unnecessary API calls and reduce confusion from attempting to fetch a StorageClass with no name.
</issue_to_address>

### Comment 2
<location> `images/virtualization-artifact/pkg/controller/vd/internal/source/step/wait_for_dv_step.go:138` </location>
<code_context>
 		return false, fmt.Errorf("get sc: %w", err)
 	}

+	dvRunningCond, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, s.dv.Status.Conditions)
 	isWFFC := sc != nil && sc.VolumeBindingMode != nil && *sc.VolumeBindingMode == storagev1.VolumeBindingWaitForFirstConsumer
-	if isWFFC && (s.dv.Status.Phase == cdiv1.PendingPopulation || s.dv.Status.Phase == cdiv1.WaitForFirstConsumer) {
</code_context>

<issue_to_address>
**issue (bug_risk):** Consider checking for nil dvRunningCond before accessing its fields.

Accessing dvRunningCond.Status or dvRunningCond.Reason without a nil check may cause a panic if GetDataVolumeCondition returns nil. Please add a nil check to prevent this.
</issue_to_address>

### Comment 3
<location> `images/virtualization-artifact/pkg/controller/vd/internal/source/sources.go:185` </location>
<code_context>
 		}
-		if isStorageClassWFFC(sc) && (dv.Status.Phase == cdiv1.PendingPopulation || dv.Status.Phase == cdiv1.WaitForFirstConsumer) {
+
+		dvRunningCond, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, dv.Status.Conditions)
+		if isStorageClassWFFC(sc) && (dv.Status.Phase == cdiv1.PendingPopulation || dv.Status.Phase == cdiv1.WaitForFirstConsumer) && dvRunningCond.Status == corev1.ConditionFalse && dvRunningCond.Reason == "" {
 			vd.Status.Phase = v1alpha2.DiskWaitForFirstConsumer
</code_context>

<issue_to_address>
**issue (bug_risk):** Missing nil check for dvRunningCond could lead to panics.

Add a nil check for dvRunningCond before accessing its fields to prevent potential panics.
</issue_to_address>

### Comment 4
<location> `images/virtualization-artifact/pkg/controller/vd/internal/watcher/datavolume_watcher.go:76-79` </location>
<code_context>
 						return true
 					}

+					oldDVRunning, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, e.ObjectOld.Status.Conditions)
+					newDVRunning, _ := conditions.GetDataVolumeCondition(conditions.DVRunningConditionType, e.ObjectNew.Status.Conditions)
+
+					if oldDVRunning.Reason != newDVRunning.Reason {
+						return true
+					}
</code_context>

<issue_to_address>
**issue (bug_risk):** Potential nil dereference if oldDVRunning or newDVRunning is nil.

Add nil checks for oldDVRunning and newDVRunning before accessing their Reason fields to prevent panics.
</issue_to_address>
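
One way to compare the reasons without touching a possibly absent condition is to route both lookups through a helper that returns an empty reason when the condition is missing. The sketch below assumes stand-in types; `dvCondition`, `reasonOf` and `runningReasonChanged` are hypothetical names, the reason value `"Pending"` is only an example, and the real watcher works with update events rather than plain slices.

```go
package main

import "fmt"

// dvCondition is a stand-in for a DataVolume status condition.
type dvCondition struct {
	Type   string
	Reason string
}

// reasonOf returns the reason of the first condition with the given type,
// or an empty string when no such condition exists.
func reasonOf(condType string, conds []dvCondition) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Reason
		}
	}
	return ""
}

// runningReasonChanged reports whether the Running condition's reason differs
// between the old and new object; a missing condition counts as an empty reason.
func runningReasonChanged(oldConds, newConds []dvCondition) bool {
	return reasonOf("Running", oldConds) != reasonOf("Running", newConds)
}

func main() {
	before := []dvCondition{{Type: "Running"}}
	after := []dvCondition{{Type: "Running", Reason: "Pending"}}

	fmt.Println(runningReasonChanged(before, after)) // true
	fmt.Println(runningReasonChanged(nil, before))   // false
}
```

A limitation of this shape is that a condition appearing with an empty reason is indistinguishable from no condition at all, which is acceptable only if the reason is the sole signal the reconciler needs.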


5 participants