Add DMC tests and extra Windows guest tool tests #348

dinhngtu · 2025-09-11T16:53:45Z

Add "Clean up cached VM even if specified from UUID", which changes how VMs are cleaned up if specified by UUID.

Aside from the new DMC tests, the Windows tests were also enhanced with some tests that were previously failure-prone (upgrades, suspend with emulated NVMe, device ID changes, VIF unplug)

Requires WinPV 9.0.9135 or later.

dinhngtu · 2025-09-18T11:31:30Z

Added a rework of test_checkpoint and associated functions to avoid repeated failures on Windows.

lib/vm.py

stormi · 2025-09-19T08:53:10Z

tests/misc/test_dmc.py

+    snapshot.revert()
+
+
+@pytest.mark.small_vm


Would it be useful to also test it with a variety of VMs?

Good point, I can mark it multi_vms. But I'm not sure if all of these VMs properly support ballooning.

RHEL 8 & 9 ones don't, if I remember correctly. Can we detect that and skip for uncooperative VMs?

Added check for other/feature-balloon. However, this check won't work correctly on Linux VMs and current XCP-ng WinPV VMs, since in both cases the guest agent insists on setting feature-balloon regardless of driver support. I've added a fix in the WinPV guest agent, but due to this issue, I'll leave it marked small_vm for now.

How often does the guest agent currently set feature-balloon despite drivers not being there? Is this a realistic scenario?

Isn't that the case with the "recent" RHEL guests mentioned above?

Yes, unfortunately this is a problem with all Linux guests. The Linux balloon driver doesn't set feature-balloon, so it's up to the guest agent to do that. I don't know if there's a way to check if the balloon driver is enabled, but at least the Rust agent doesn't do any such checks.

On this the Rust agent just mimicked what the XS one does. Worth a Plane (+gitlab?) ticket?

Plane card created. OTOH Gitlab ticket can't really be created (IMO) until the current refactor situation is sorted out.

tests/misc/test_dmc.py

tests/guest_tools/win/test_guest_tools_win.py

stormi · 2025-09-19T09:00:56Z

tests/guest_tools/win/test_guest_tools_win.py

+            vm.suspend()
+        wait_for_vm_running_and_ssh_up_without_tools(vm)
+
+    def test_toggle_device_id(self, running_unsealed_windows_vm: VM, guest_tools_iso: dict[str, Any]):


What's the objective of this test? I understand we want to make sure the VM still boots after changing the device ID, but why?

It's a test of our driver, which after the unplug rework must remain activated even if the device ID changes. It also serves as a proxy for device ID changes if the Windows Update option was toggled. It's not an exact reproduction of the situation, but since we don't yet support the C200 device, it's good enough.

Can you add a comment above the test function?

Done. Also moved the device ID assert up one line.

lib/vm.py

last-genius

Great work - both with the new tests and fixing up old tests to be more reliable. Looks good to me from the xapi point of view as a starting point for DMC testing.

tests/misc/test_dmc.py

last-genius · 2025-09-29T08:22:11Z

tests/misc/test_dmc.py

+    def test_dmc_suspend(self, vm_with_memory_limits: VM):
+        """Suspend a VM with DMC enabled."""
+        vm = vm_with_memory_limits
+        self.start_dmc_vm(vm)
+        vm.set_memory_target(MEMORY_TARGET_LOW)
+        wait_for_vm_balloon_finished(vm)
+        vm.suspend(verify=True)
+        vm.resume()
+        vm.wait_for_vm_running_and_ssh_up()


All of the tests here set dynamic-min and dynamic-max to be the same value (oscillating between LOW and HIGH), that's what set_memory_target does. Do we plan on having tests with dynamic-min set lower than dynamic-max (not in this PR, but in the future)? Would be great to test how squeezed redistributes memory between VMs dynamically/how VMs are ballooned down to dynamic-min on migrations (but not on "localhost migrations" anymore).

I don't know how such scenarios will behave (i.e. what should we test?) so I'll need your input on that.

last-genius · 2025-09-29T08:24:14Z

tests/misc/test_dmc.py

+    snapshot.revert()
+
+
+@pytest.mark.small_vm


How often does the guest agent currently set feature-balloon despite drivers not being there? Is this a realistic scenario?

dinhngtu · 2025-09-29T09:01:55Z

How often does the guest agent currently set feature-balloon despite drivers not being there? Is this a realistic scenario?

Yes, unfortunately this is a problem with all Linux guests. The Linux balloon driver doesn't set feature-balloon, so it's up to the guest agent to do that. I don't know if there's a way to check if the balloon driver is enabled, but at least the Rust agent doesn't do any such checks.

dinhngtu · 2025-10-02T08:39:40Z

Backed out the test_checkpoint rework since the SSH failures on Windows need more investigation. I'll merge later today provided there are no more comments.

lib/host.py

tests/misc/test_dmc.py

ydirson · 2025-10-02T08:59:08Z

tests/misc/test_dmc.py

+
+
+@pytest.fixture(scope="module")
+def imported_vm_and_snapshot(imported_vm: VM):


non-obvious fixture needs docstring

tests/misc/test_dmc.py

ydirson · 2025-10-02T09:04:28Z

tests/misc/test_dmc.py

+def wait_for_vm_balloon_finished(vm: VM):
+    memory_target = int(vm.param_get("memory-target"))


This seem to be possibly subject to a race condition: nothing seems to ensure that the param cannot get changed behind the test's back and that we indeed get the expected value here. Looks like that target should rather be passed as a parameter to the function.

Why would the parameter change behind the test's back?

Well, my comment was not 100% on the spot. But this parameter is RO, so likely based on the dynamic ranges, so the race is rather "how can we be sure that parameter has been set to the value we should be expecting?"
Intuitively, I would expect that the target would be set by squeezed to ensure the dynamic aspect of things - if that's right, it would even be expected that it changes behind our back.

But then, in the existence of vm-memory-target-set raises doubts on my interpretation above.

On a different note, vm-memory-target-wait, would look like a candidate for replacing DmcMemoryTracker?

I was not able to locate a dedicated doc for the DMC feature, so maybe we need that at some point, and in the meantime I guess more explanations about what we expect and test would help understanding this PR :)

Indeed that's why vm-memory-target-set was used, and why I wasn't sure of how to test the situation where dynamic-min and dynamic-max are different. (vm-memory-target-set, despite the name, sets both dynamic-min and dynamic-max)

vm-memory-target-wait looks interesting, but it doesn't have a way to bail out. I'm not sure how it reports failure either. Could you give me a quick explanation of how it works, @last-genius? I can either use it directly or replicate its logic here.

It waits for abs (memory_actual - memory_target) <= tolerance for up to 256 seconds, where tolerance=1MB. Sadly it doesn't have a way to provide the timeout or tolerance parameters, but I can add that if you want to.
The errors it reports are VM_MEMORY_TARGET_WAIT_TIMEOUT and TASK_CANCELLED (which is how you can cancel any task, with xe task-cancel, but it's pretty awkward with xe, much easier with the API directly).

I also wonder why vm-memory-target-wait is hidden from the CLI help (so it's not autocompleted 🤔 )

Thanks. I've opted to replicate the logic you described in DmcMemoryTracker.

tests/misc/test_dmc.py

ydirson · 2025-10-02T09:26:02Z

tests/guest_tools/win/test_guest_tools_win.py

-    def test_drivers_detected(self, vm_install_test_tools_per_test_class: VM):
+    def test_vif_replug(self, vm_install_test_tools_per_test_class: VM):
        vm = vm_install_test_tools_per_test_class
-        assert vm.are_windows_tools_working()


Wouldn't it make sense to have that assert systematically inside the vm_install_test_tools_per_test_class fixture? I think it would help by having the test in ERROR in case that happens, and not even starting, instead of going FAIL later when the problem is not really with what the tests checks.

Then maybe for test_drivers_detected an _unchecked version of the fixture would help so that one test does go FAIL.

It sounds like a very roundabout solution for little gain.

ydirson · 2025-10-02T09:27:32Z

tests/guest_tools/win/test_guest_tools_win.py

+        vifs = vm.vifs()
+        for vif in vifs:
+            vif.unplug()
+            # HACK: Allow some time for the unplug to settle. If not, Windows guests have a tendency to explode.


Do we have a ticket for that explosion?

No, there isn't one, only a problem revealed during debugging. It's already being tracked internally.

tests/guest_tools/win/test_guest_tools_win.py

If CACHE_IMPORTED_VM is specified, the source VM is unconditionally cloned, even if it was referred to by UUID. Clean that up during teardown. Signed-off-by: Tu Dinh <[email protected]>

Otherwise you can't pass a dict[str, str] to host.xe, as mypy complained here: lib/vm.py:875: error: Argument 2 to "xe" of "Host" has incompatible type "dict[str, str]"; expected "dict[str, str | bool]" [arg-type] lib/vm.py:875: note: "dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance lib/vm.py:875: note: Consider using "Mapping" instead, which is covariant in the value type Signed-off-by: Tu Dinh <[email protected]>

Signed-off-by: Tu Dinh <[email protected]>

These tests verify a VM's responsiveness to memory target changes, and checks for several suspend bugs when DMC is enabled. Signed-off-by: Tu Dinh <[email protected]>

Signed-off-by: Tu Dinh <[email protected]>

Remove duplicate test_tools_after_reboot which was no longer used. Reenable upgrade tests. Add suspend test with emulated NVMe. Add device ID toggle test. Add VIF replug test. Signed-off-by: Tu Dinh <[email protected]>

These methods help test VIF functionalities and the offboarding process. Signed-off-by: Tu Dinh <[email protected]>

In some edge cases, Xeniface may not have been initialized after installation, and so vm.reboot() will not work. Signed-off-by: Tu Dinh <[email protected]>

This is flaky and needs to be explicitly tested. Use DNS as a basic, inoffensive setting that won't interfere with VM operation. Signed-off-by: Tu Dinh <[email protected]>

RSS enablement is flaky and needs to be explicitly tested. Signed-off-by: Tu Dinh <[email protected]>

The default timeouts turned out to be insufficient for driver installs in some cases. Signed-off-by: Tu Dinh <[email protected]>

Xenvif offboard will reset the NIC, which will cause any running SSH commands to fail. Signed-off-by: Tu Dinh <[email protected]>

dinhngtu · 2025-10-22T08:55:33Z

Added various fixes for the tools-windows tests. These fixes are coupled with several other fixes in the installer, drivers and guest agent themselves.

dinhngtu force-pushed the dnt/windows-tests branch from 0cf55ea to 338b0b1 Compare September 11, 2025 16:57

dinhngtu marked this pull request as ready for review September 11, 2025 17:02

dinhngtu force-pushed the dnt/windows-tests branch from 295f52d to 629017e Compare September 18, 2025 11:31

dinhngtu requested review from stormi and ydirson September 18, 2025 11:31

stormi reviewed Sep 19, 2025

View reviewed changes

dinhngtu force-pushed the dnt/windows-tests branch 3 times, most recently from b1670f7 to e58b6c2 Compare September 19, 2025 13:55

dinhngtu requested a review from last-genius September 19, 2025 14:00

last-genius approved these changes Sep 29, 2025

View reviewed changes

dinhngtu force-pushed the dnt/windows-tests branch from e58b6c2 to 588f3ae Compare September 29, 2025 08:59

dinhngtu force-pushed the dnt/windows-tests branch from 588f3ae to 0195214 Compare October 2, 2025 08:38

ydirson requested changes Oct 2, 2025

View reviewed changes

dinhngtu force-pushed the dnt/windows-tests branch 4 times, most recently from 7271c97 to 8027905 Compare October 2, 2025 16:22

dinhngtu added 7 commits October 20, 2025 12:23

Clean up cached VM even if specified from UUID

68e0646

If CACHE_IMPORTED_VM is specified, the source VM is unconditionally cloned, even if it was referred to by UUID. Clean that up during teardown. Signed-off-by: Tu Dinh <[email protected]>

lib/vm: Add functions for setting memory targets

46414fc

Signed-off-by: Tu Dinh <[email protected]>

Add DMC tests

88d72b5

These tests verify a VM's responsiveness to memory target changes, and checks for several suspend bugs when DMC is enabled. Signed-off-by: Tu Dinh <[email protected]>

test_guest_tools_win: Add typing

2cc1730

Signed-off-by: Tu Dinh <[email protected]>

lib/vif: Add VIF plug/unplug methods

dd26f0f

Signed-off-by: Tu Dinh <[email protected]>

Update Windows guest tool tests

b478670

Remove duplicate test_tools_after_reboot which was no longer used. Reenable upgrade tests. Add suspend test with emulated NVMe. Add device ID toggle test. Add VIF replug test. Signed-off-by: Tu Dinh <[email protected]>

dinhngtu force-pushed the dnt/windows-tests branch from 8027905 to 1dc06f1 Compare October 21, 2025 20:32

dinhngtu added 6 commits October 22, 2025 10:48

windows: Add Windows VIF helper methods

e774e16

These methods help test VIF functionalities and the offboarding process. Signed-off-by: Tu Dinh <[email protected]>

windows: Use shutdown command in test_uninstall_tools

02529e4

In some edge cases, Xeniface may not have been initialized after installation, and so vm.reboot() will not work. Signed-off-by: Tu Dinh <[email protected]>

windows: Test NIC setting restoration on uninstall

ac1a1f8

This is flaky and needs to be explicitly tested. Use DNS as a basic, inoffensive setting that won't interfere with VM operation. Signed-off-by: Tu Dinh <[email protected]>

windows: Test RSS enablement

d42d006

RSS enablement is flaky and needs to be explicitly tested. Signed-off-by: Tu Dinh <[email protected]>

windows: Extend installation timeouts

85bc29d

The default timeouts turned out to be insufficient for driver installs in some cases. Signed-off-by: Tu Dinh <[email protected]>

windows: Wait for Xenvif offboard after uninstall

12fb0d2

Xenvif offboard will reset the NIC, which will cause any running SSH commands to fail. Signed-off-by: Tu Dinh <[email protected]>

dinhngtu force-pushed the dnt/windows-tests branch from 1dc06f1 to 12fb0d2 Compare October 22, 2025 08:50



		@pytest.fixture(scope="module")
		def imported_vm_and_snapshot(imported_vm: VM):

		def wait_for_vm_balloon_finished(vm: VM):
		memory_target = int(vm.param_get("memory-target"))

		snapshot.revert()


		@pytest.mark.small_vm

		snapshot.revert()


		@pytest.mark.small_vm

Add DMC tests and extra Windows guest tool tests #348

Are you sure you want to change the base?

Add DMC tests and extra Windows guest tool tests #348

Uh oh!

Conversation

dinhngtu commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dinhngtu commented Sep 18, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dinhngtu Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

last-genius left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dinhngtu commented Sep 29, 2025

Uh oh!

dinhngtu commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ydirson Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dinhngtu Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

dinhngtu commented Sep 11, 2025 •

edited

Loading

dinhngtu Oct 2, 2025 •

edited

Loading

dinhngtu commented Oct 2, 2025 •

edited

Loading

ydirson Oct 2, 2025 •

edited

Loading

dinhngtu Oct 2, 2025 •

edited

Loading

dinhngtu Oct 2, 2025 •

edited

Loading