From 9908720fc52881c8f98ef54592f343395300a0c4 Mon Sep 17 00:00:00 2001
From: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Date: Mon, 17 Nov 2025 12:25:21 -0800
Subject: [PATCH 1/7] Simplify cancellation
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
---
docs/cancellation.md | 439 +++++++++++
.../src/hypervisor/hyperv_linux.rs | 131 +---
.../src/hypervisor/hyperv_windows.rs | 198 +----
src/hyperlight_host/src/hypervisor/kvm.rs | 128 +---
src/hyperlight_host/src/hypervisor/mod.rs | 699 +++++++++---------
.../src/sandbox/initialized_multi_use.rs | 42 +-
src/hyperlight_host/tests/integration_test.rs | 2 +-
7 files changed, 835 insertions(+), 804 deletions(-)
create mode 100644 docs/cancellation.md
diff --git a/docs/cancellation.md b/docs/cancellation.md
new file mode 100644
index 000000000..a8b00aa99
--- /dev/null
+++ b/docs/cancellation.md
@@ -0,0 +1,439 @@
+# Cancellation in Hyperlight
+
+This document describes the cancellation mechanism and memory ordering guarantees for Hyperlight.
+
+## Overview (Linux)
+
+Hyperlight provides a mechanism to forcefully interrupt guest execution through the `InterruptHandle::kill()` method. This involves coordination between multiple threads using atomic operations and POSIX signals to ensure safe and reliable cancellation.
+
+## Key Components
+
+### LinuxInterruptHandle State
+
+The `LinuxInterruptHandle` uses a packed `AtomicU64` to track execution state:
+
+- **state (AtomicU64)**: Packs two bits:
+ - **Bit 1 (RUNNING_BIT)**: Set when vCPU is actively running in guest mode
+ - **Bit 0 (CANCEL_BIT)**: Set when cancellation has been requested via `kill()`
+- **tid (AtomicU64)**: Thread ID where the vCPU is running
+- **debug_interrupt (AtomicBool)**: Set when debugger interrupt is requested (gdb feature only)
+- **dropped (AtomicBool)**: Set when the corresponding VM has been dropped
+
+The packed state enables atomic reads of both RUNNING_BIT and CANCEL_BIT simultaneously via `get_running_and_cancel()`. Within a single `run()` call, the CANCEL_BIT remains set across vCPU exits and re-entries (such as when calling host functions), ensuring cancellation persists until the guest call completes. However, `clear_cancel()` resets the CANCEL_BIT at the beginning of each new `run()` call, preventing cancellation requests from affecting subsequent guest function calls.
+
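+A minimal sketch of the packed-state accessors (bit constants and method bodies are illustrative, following the layout described above):
+
+```rust
+use std::sync::atomic::{AtomicU64, Ordering};
+
+const CANCEL_BIT: u64 = 1 << 0; // cancellation requested via kill()
+const RUNNING_BIT: u64 = 1 << 1; // vCPU is running in guest mode
+
+struct PackedState {
+    state: AtomicU64,
+}
+
+impl PackedState {
+    /// A single Acquire load observes both bits at one consistent instant.
+    fn get_running_and_cancel(&self) -> (bool, bool) {
+        let s = self.state.load(Ordering::Acquire);
+        (s & RUNNING_BIT != 0, s & CANCEL_BIT != 0)
+    }
+
+    /// Called at the start of each run(): drops any stale cancellation.
+    fn clear_cancel(&self) {
+        self.state.fetch_and(!CANCEL_BIT, Ordering::Release);
+    }
+}
+```
+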
+### Signal Mechanism
+
+On Linux, Hyperlight uses `SIGRTMIN + offset` (the offset is configurable and is typically 0) to interrupt the vCPU thread. The signal handler is intentionally a no-op; the signal's only purpose is to cause a VM exit via `EINTR` from the `ioctl` call that runs the vCPU.
+
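+A sketch of what registering such a handler can look like with the `libc` crate (the function name and registration site are illustrative, not Hyperlight's actual code):
+
+```rust
+/// No-op handler: its only job is to make the blocking ioctl that runs the
+/// vCPU return EINTR when the signal is delivered to the vCPU thread.
+extern "C" fn noop_handler(_sig: libc::c_int) {}
+
+fn install_vcpu_signal_handler(offset: i32) {
+    unsafe {
+        let mut action: libc::sigaction = std::mem::zeroed();
+        let handler: extern "C" fn(libc::c_int) = noop_handler;
+        action.sa_sigaction = handler as usize;
+        // Deliberately no SA_RESTART: the interrupted ioctl must fail with
+        // EINTR instead of being transparently restarted by the kernel.
+        libc::sigaction(libc::SIGRTMIN() + offset, &action, std::ptr::null_mut());
+    }
+}
+```
+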
+## Run Loop Flow
+
+The main execution loop in `HyperlightVm::run()` coordinates vCPU execution with potential interrupts. Here's the detailed flow:
+
+```mermaid
+sequenceDiagram
+ participant VM as run() Loop
+ participant Guest as vCPU (Guest)
+ participant IH as InterruptHandle
+
+ Note over VM: === TIMING POINT 1 ===
+ VM->>IH: clear_cancel()
+ Note right of VM: Clear premature kill() requests
+
+ loop Run Loop
+ Note over VM: === TIMING POINT 2 ===
+ VM->>IH: set_tid()
+ Note right of VM: Store current thread ID
+ VM->>IH: set_running()
+ Note right of VM: Set running=true
+
+ VM->>IH: is_cancelled()
+
+ alt is_cancelled() == true
+ VM->>VM: return Cancelled()
+ else is_cancelled() == false
+ Note over VM: === TIMING POINT 3 ===
+ VM->>Guest: run_vcpu()
+ activate Guest
+ Note right of Guest: Guest code executes in vCPU
+
+ alt Guest completes normally
+ Guest-->>VM: VmExit::Halt()
+ else Guest performs I/O
+ Guest-->>VM: VmExit::IoOut()/MmioRead()
+ else Signal received
+ Guest-->>VM: VmExit::Cancelled()
+ end
+ deactivate Guest
+ end
+
+ Note over VM: === TIMING POINT 4 ===
+ VM->>IH: is_cancelled()
+ IH-->>VM: cancel_requested (bool)
+ Note right of VM: Capture for filtering stale signals later
+
+ Note over VM: === TIMING POINT 5 ===
+ VM->>IH: clear_running()
+ Note right of VM: Clear RUNNING_BIT
+
+ Note over VM: === TIMING POINT 6 ===
+
+ alt Exit reason is Halt
+ VM->>VM: break Ok(())
+ else Exit reason is Cancelled AND cancel_requested==true
+ VM->>VM: break Err(ExecutionCanceledByHost)
+ else Exit reason is Cancelled AND cancel_requested==false
+ Note right of VM: Stale signal, retry
+ VM->>VM: continue (retry iteration)
+ else Exit reason is I/O or host call
+ VM->>VM: Handle and continue loop
+ else Other exit reasons
+ VM->>VM: Handle appropriately
+ end
+ end
+```
+
+### Detailed Run Loop Steps
+
+1. **Timing Point 1** - Between guest function calls:
+ - `clear_cancel()` is called to clear any stale CANCEL_BIT
+ - If `kill()` completes before this point, it has NO effect on this call
+ - Ensures that `kill()` between different guest function calls doesn't affect the next call
+
+2. **Timing Point 2** - Before entering run loop iteration:
+ - `set_tid()` stores the current thread ID
+ - `set_running()` sets running to true
+   - If `kill()` completes before this point, an early `Cancelled()` is returned
+
+3. **Timing Point 3** - Before calling `run_vcpu()`:
+   - If `kill()` completes before this point, CANCEL_BIT is set, but too late to prevent entering the guest
+ - Signals will interrupt the guest (RUNNING_BIT=true), causing `VmExit::Cancelled()`
+ - If guest completes before signals arrive, `kill()` may have no effect on this iteration
+
+4. **Timing Point 4** - After vCPU exits, before capturing `cancel_requested`:
+ - CANCEL_BIT is captured for filtering stale signals
+ - If `kill()` completes before this, CANCEL_BIT persists for next iteration
+
+5. **Timing Point 5** - Before calling `clear_running()`:
+ - Same as point 4
+
+6. **Timing Point 6** - After calling `clear_running()`:
+ - RUNNING_BIT is now false, no new signals will be sent
+ - CANCEL_BIT may be set but won't affect this iteration
+ - Stale signals may arrive but are filtered by the `cancel_requested` check
+
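+The timing points above map onto the loop roughly as follows (a simplified sketch of `VirtualCPU::run()` with gdb, tracing, and error handling omitted; names follow the traits described in this document):
+
+```rust
+fn run(hv: &mut dyn Hypervisor, handle: Arc<dyn InterruptHandleImpl>) -> Result<()> {
+    handle.clear_cancel(); // Timing point 1: drop stale kill() requests
+
+    loop {
+        handle.set_tid();     // Timing point 2
+        handle.set_running();
+
+        let exit = if handle.is_cancelled() {
+            Ok(HyperlightExit::Cancelled()) // early exit, never enter the guest
+        } else {
+            hv.run() // Timing point 3: guest executes until the next VM exit
+        };
+
+        let cancel_requested = handle.is_cancelled(); // Timing point 4
+        handle.clear_running(); // Timing points 5 and 6
+
+        match exit {
+            Ok(HyperlightExit::Halt()) => break Ok(()),
+            // Stale signal from a previous call: retry this iteration.
+            Ok(HyperlightExit::Cancelled()) if !cancel_requested => continue,
+            Ok(HyperlightExit::Cancelled()) => {
+                break Err(HyperlightError::ExecutionCanceledByHost());
+            }
+            Ok(_io_or_host_call) => { /* handle it, then continue the loop */ }
+            Err(e) => break Err(e),
+        }
+    }
+}
+```
+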
+## Kill Operation Flow
+
+The `kill()` operation involves setting the CANCEL_BIT and sending signals to interrupt the vCPU:
+
+```mermaid
+sequenceDiagram
+ participant Caller as Caller Thread
+ participant IH as InterruptHandle
+ participant Signal as Signal Delivery
+ participant VM as vCPU Thread
+
+ Caller->>IH: kill()
+ activate IH
+
+ IH->>IH: fetch_or(CANCEL_BIT, Release)
+ Note right of IH: Atomically set cancel=true with Release ordering
+
+ IH->>IH: send_signal()
+ activate IH
+
+ loop Retry Loop
+ IH->>IH: get_running_and_cancel()
+ Note right of IH: Load with Acquire ordering
+
+ alt Not running OR not cancelled
+ IH-->>IH: break (sent_signal=false/true)
+ else Running AND cancelled
+ IH->>IH: tid.load(Acquire)
+ IH->>Signal: pthread_kill(tid, SIGRTMIN+offset)
+ activate Signal
+ Note right of Signal: Send signal to vCPU thread
+ Signal->>VM: SIGRTMIN+offset delivered
+            Note right of VM: Signal handler is a no-op; its purpose is to cause EINTR
+ deactivate Signal
+
+ alt Signal arrives during ioctl
+ VM->>VM: ioctl returns EINTR
+ VM->>VM: return VmExit::Cancelled()
+ else Signal arrives between ioctls
+ Note right of VM: Signal is harmless
+ end
+
+ IH->>IH: sleep(retry_delay)
+ Note right of IH: Default 500μs between retries
+ end
+ end
+
+ deactivate IH
+ IH-->>Caller: sent_signal
+ deactivate IH
+```
+
+### Kill Operation Steps
+
+1. **Set Cancel Flag**: Atomically set the CANCEL_BIT using `fetch_or(CANCEL_BIT)` with `Release` ordering
+ - Ensures all writes before `kill()` are visible when vCPU thread checks `is_cancelled()` with `Acquire`
+
+2. **Send Signals**: Enter retry loop via `send_signal()`
+ - Atomically load both running and cancel flags via `get_running_and_cancel()` with `Acquire` ordering
+ - Continue if `running=true AND cancel=true` (or `running=true AND debug_interrupt=true` with gdb)
+ - Exit loop immediately if `running=false OR cancel=false`
+
+3. **Signal Delivery**: Send `SIGRTMIN+offset` via `pthread_kill`
+ - Signal interrupts the `ioctl` that runs the vCPU, causing `EINTR`
+ - Signal handler is intentionally a no-op
+ - Returns `VmExit::Cancelled()` when `EINTR` is received
+
+4. **Loop Termination**: The signal loop terminates when:
+ - vCPU is no longer running (`running=false`), OR
+ - Cancellation is no longer requested (`cancel=false`)
+   - See the loop termination proof in the source code for a rigorous correctness analysis
+
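+In code, the Linux `kill()` path is roughly the following (a sketch; `retry_delay`, `sig_rt_min_offset`, and `tid` are the fields described earlier, with illustrative types):
+
+```rust
+impl LinuxInterruptHandle {
+    fn kill(&self) -> bool {
+        // Step 1: publish the cancellation request with Release ordering.
+        self.state.fetch_or(CANCEL_BIT, Ordering::Release);
+        self.send_signal()
+    }
+
+    fn send_signal(&self) -> bool {
+        let mut sent_signal = false;
+        loop {
+            // Step 2: one Acquire load observes both bits consistently.
+            let (running, cancelled) = self.get_running_and_cancel();
+            if !running || !cancelled {
+                break; // Step 4: vCPU exited, or the cancellation was cleared
+            }
+            // Step 3: kick the vCPU thread out of its blocking ioctl.
+            let tid = self.tid.load(Ordering::Acquire) as libc::pthread_t;
+            unsafe {
+                libc::pthread_kill(tid, libc::SIGRTMIN() + self.sig_rt_min_offset as i32);
+            }
+            sent_signal = true;
+            std::thread::sleep(self.retry_delay); // default 500μs between retries
+        }
+        sent_signal
+    }
+}
+```
+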
+## Memory Ordering Guarantees
+
+### Release-Acquire Semantics Overview
+
+A **synchronizes-with** relationship is established when:
+1. Thread A performs an atomic operation with `Release` ordering that writes a value
+2. Thread B performs an atomic operation with `Acquire` ordering on the same atomic variable
+3. Thread B's `Acquire` load reads the exact value that Thread A's `Release` operation wrote
+
+When this occurs, all memory operations that happened-before the `Release` in Thread A become visible to Thread B after the `Acquire`. This creates a **happens-before** relationship that ensures memory consistency across threads.
+
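+This guarantee is easy to demonstrate in isolation (a self-contained toy example, unrelated to Hyperlight's types; the `ready` flag plays the role of the packed state word):
+
+```rust
+use std::sync::Arc;
+use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
+use std::thread;
+
+fn main() {
+    let data = Arc::new(AtomicU64::new(0));
+    let ready = Arc::new(AtomicBool::new(false));
+
+    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
+    let producer = thread::spawn(move || {
+        d.store(42, Ordering::Relaxed);   // plain write...
+        r.store(true, Ordering::Release); // ...published by the Release store
+    });
+
+    // Once this Acquire load reads `true`, it synchronizes-with the Release
+    // store above, so the write of 42 is guaranteed to be visible.
+    while !ready.load(Ordering::Acquire) {
+        std::hint::spin_loop();
+    }
+    assert_eq!(data.load(Ordering::Relaxed), 42);
+
+    producer.join().unwrap();
+}
+```
+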
+### Synchronization in Hyperlight
+
+Hyperlight uses careful memory ordering to ensure correctness across threads:
+
+```mermaid
+graph TB
+ subgraph "vCPU Thread"
+        A[set_tid: store tid with Release]
+        B[set_running: set RUNNING_BIT with Release]
+        C[is_cancelled: load with Acquire]
+        D[clear_running: fetch_and with Release]
+ end
+
+ subgraph "Interrupt Thread"
+        E[kill: fetch_or CANCEL_BIT with Release]
+        F[send_signal: load running with Acquire]
+        G[load tid with Acquire]
+        H[pthread_kill]
+ end
+
+ B -->|Synchronizes-with| F
+ A -->|Happens-before via B→F| G
+ E -->|Synchronizes-with| C
+ D -->|Synchronizes-with| F
+```
+
+### Ordering Rules
+
+1. **tid Store → running Load Synchronization**:
+ - `set_tid()`: Stores `tid` with `Release` ordering
+ - `set_running()`: Sets RUNNING_BIT with `Release` ordering (via `fetch_or`)
+ - `send_signal()`: Loads `state` with `Acquire` ordering via `get_running_and_cancel()`
+   - **Guarantee**: When the interrupt thread observes RUNNING_BIT=true, it sees the correct `tid` value
+
+2. **CANCEL_BIT Synchronization**:
+ - `kill()`: Sets CANCEL_BIT with `Release` ordering (via `fetch_or`)
+ - `is_cancelled()`: Loads `state` with `Acquire` ordering
+   - **Guarantee**: When the vCPU thread observes CANCEL_BIT=true, it sees all writes made before `kill()`
+
+3. **clear_running Synchronization**:
+ - `clear_running()`: Clears RUNNING_BIT with `Release` ordering (via `fetch_and`)
+ - `send_signal()`: Loads `state` with `Acquire` ordering via `get_running_and_cancel()`
+   - **Guarantee**: When the interrupt thread observes RUNNING_BIT=false, all vCPU operations are complete
+
+4. **clear_cancel Synchronization**:
+ - `clear_cancel()`: Clears CANCEL_BIT with `Release` ordering (via `fetch_and`)
+   - **Rationale**: Release is used because the VM can move between threads across guest calls; it ensures operations from a previous `run()` are visible to whichever thread runs the next call
+
+5. **dropped flag**:
+ - `set_dropped()`: Uses `Release` ordering
+ - `dropped()`: Uses `Acquire` ordering
+ - **Guarantee**: All VM cleanup operations are visible when `dropped()` returns true
+
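+The vCPU-thread side of rules 1-3 can be sketched as follows (field names follow the Linux handle described earlier; a minimal illustration rather than the exact implementation):
+
+```rust
+impl LinuxInterruptHandle {
+    fn set_tid(&self) {
+        // Rule 1: this Release store pairs with the Acquire load in send_signal().
+        self.tid.store(unsafe { libc::pthread_self() } as u64, Ordering::Release);
+    }
+
+    fn set_running(&self) {
+        // Publishing RUNNING_BIT with Release also publishes the tid store above.
+        self.state.fetch_or(RUNNING_BIT, Ordering::Release);
+    }
+
+    fn is_cancelled(&self) -> bool {
+        // Rule 2: pairs with the Release fetch_or in kill().
+        self.state.load(Ordering::Acquire) & CANCEL_BIT != 0
+    }
+
+    fn clear_running(&self) {
+        // Rule 3: once the interrupt thread observes RUNNING_BIT=false with
+        // Acquire, everything the vCPU thread did before this line is visible.
+        self.state.fetch_and(!RUNNING_BIT, Ordering::Release);
+    }
+}
+```
+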
+### Happens-Before Relationships
+
+```mermaid
+sequenceDiagram
+ participant VT as vCPU Thread
+ participant IT as Interrupt Thread
+
+ VT->>VT: set_tid() (Release)
+ Note right of VT: Store thread ID
+
+ VT->>VT: set_running() (Release)
+ Note right of VT: Set running=true
+
+ Note over VT,IT: Sync #1: set_running(Release) → send_signal(Acquire)
+ VT-->>IT: synchronizes-with
+ Note over IT: send_signal()
+ IT->>IT: get_running_and_cancel() (Acquire)
+    Note right of IT: Atomically load both bits; observes running=true
+
+ IT->>IT: Load tid (Acquire)
+ Note right of IT: Sees correct tid value (from set_tid Release)
+
+ VT->>VT: run_vcpu()
+ Note right of VT: Guest executes
+
+ IT->>IT: pthread_kill()
+ Note right of IT: Send signal to tid
+
+ par Concurrent Operations
+ Note over IT: kill()
+ IT->>IT: fetch_or(CANCEL_BIT, Release)
+ Note right of IT: Atomically set CANCEL_BIT
+ and
+ Note over VT: Guest interrupted
+ VT->>VT: is_cancelled() (Acquire)
+        Note right of VT: Observes cancel=true; sees all writes before kill()
+ end
+
+ Note over IT,VT: Sync #2: kill(Release) → is_cancelled(Acquire)
+ IT-->>VT: synchronizes-with
+
+ VT->>VT: clear_running() (Release)
+    Note right of VT: fetch_and to clear RUNNING_BIT; all vCPU ops complete
+
+ Note over VT,IT: Sync #3: clear_running(Release) → send_signal(Acquire)
+ VT-->>IT: synchronizes-with
+
+ IT->>IT: send_signal() observes
+    Note right of IT: running=false; stop sending signals
+```
+
+## Interaction with Host Function Calls
+
+When a guest performs a host function call, the vCPU exits and the host function executes with `RUNNING_BIT=false`, preventing signal delivery during host execution. The `CANCEL_BIT` persists across this exit and re-entry, so if `kill()` was called, cancellation will be detected when the guest attempts to resume execution. This ensures cancellation takes effect even if it occurs during a host call, while avoiding signals during non-guest code execution.
+
+## Signal Behavior Across Loop Iterations
+
+When the run loop iterates (e.g., for host calls or I/O operations):
+
+1. Before host call: `clear_running()` sets `running=false`
+2. The `send_signal()` loop checks `running && cancel` and exits immediately when `running=false`
+3. After host call: `set_running()` sets `running=true` again
+4. `is_cancelled()` check detects persistent `cancel` flag and returns early
+
+**Key insight**: The `running && cancel` check is sufficient. When `running` becomes false (host call starts), the signal loop exits immediately. When the vCPU would resume, the early `is_cancelled()` check catches the persistent `cancel` flag before entering the guest.
+
+**Signal Chaining Note**: Hyperlight does not provide signal chaining for `SIGRTMIN+offset`. Since Hyperlight may issue signals back-to-back during the cancellation retry loop, embedders are unlikely to want to handle these signals themselves.
+
+## Race Conditions and Edge Cases
+
+### Race 1: kill() called between guest function calls
+
+```
+Timeline:
+t1: Guest function #1 completes, run() returns
+t2: kill() is called (sets CANCEL_BIT)
+t3: Guest function #2 starts, run() is called
+t4: clear_cancel() clears CANCEL_BIT
+
+Result: Guest function #2 executes normally (not cancelled)
+```
+
+**This is by design** - cancellation is scoped to a single guest function call.
+
+### Race 2: kill() called just before run_vcpu()
+
+```
+Timeline:
+t1: set_running() sets RUNNING_BIT
+t2: kill() sets CANCEL_BIT and sends signals
+t3: run_vcpu() enters guest
+
+Result: Signals interrupt the guest, causing VmExit::Cancelled()
+```
+
+**Handled correctly** - signals cause VM exit.
+
+### Race 3: Guest completes before signal arrives
+
+```
+Timeline:
+t1: kill() sets CANCEL_BIT and sends signal
+t2: Guest completes naturally
+t3: clear_running() clears RUNNING_BIT
+t4: Signal arrives (too late)
+
+Result: If guest completes normally (Halt), returns Ok()
+ If guest exits for I/O, next iteration will be cancelled
+```
+
+**Acceptable behavior** - cancellation is best-effort.
+
+### Race 4: Stale signals from previous guest function call
+
+```
+Timeline:
+t1: Guest function #1: kill() sends signals, CANCEL_BIT=true
+t2: Guest function #1: VM exits with Halt, clear_running() clears RUNNING_BIT
+t3: Guest function #2: run() called, clear_cancel() clears CANCEL_BIT
+t4: Guest function #2: set_running() sets RUNNING_BIT
+t5: Stale signal from guest #1 arrives, causes VM to exit with Cancelled
+t6: cancel_requested=false (CANCEL_BIT was cleared at step 3)
+t7: Cancelled exit is filtered as stale, iteration continues
+
+Result: The signal was sent for guest function #1, but arrives during guest function #2.
+ Since cancel_requested is false, we know this cancellation wasn't intended
+ for the current guest call, so we continue the loop (retry).
+```
+
+**Handled correctly** - The `cancel_requested` flag (captured at timing point 4) distinguishes between:
+- Signals intended for the current guest call (`cancel_requested=true`) → return error
+- Stale signals from a previous guest call (`cancel_requested=false`) → filter and retry
+
+### Race 5: ABA Problem
+
+The ABA problem (where a new guest call starts during the InterruptHandle's `send_signal()` loop, potentially causing that loop to send signals intended for an earlier guest call) is prevented by clearing CANCEL_BIT at the start of each `run()` call, so every guest call begins with a clean cancellation state. Any `send_signal()` loop still running from a previous call observes the cleared CANCEL_BIT and exits.
+
+## Windows Platform Differences
+
+While the core cancellation mechanism follows the same conceptual model on Windows, there are several platform-specific differences in implementation:
+
+### WindowsInterruptHandle Structure
+
+The `WindowsInterruptHandle` is simpler than its Linux counterpart:
+
+- **state (AtomicU64)**: Packs the same two bits (RUNNING_BIT and CANCEL_BIT)
+- **debug_interrupt (AtomicBool)**: Set when debugger interrupt is requested (gdb feature only)
+- **partition_handle**: Windows Hyper-V partition handle for the VM
+- **dropped (AtomicBool)**: Set when the corresponding VM has been dropped
+
+**Key difference**: No `tid` field is needed because Windows doesn't use thread-targeted signals, and no `retry_delay` or `sig_rt_min_offset` fields are needed either.
+
+### Kill Operation Differences
+
+On Windows, the `kill()` method uses the Windows Hypervisor Platform (WHP) API `WHvCancelRunVirtualProcessor` instead of POSIX signals to interrupt the vCPU:
+
+**Key difference**: there is no signal loop. Windows calls `WHvCancelRunVirtualProcessor()` at most once per `kill()`, with no need for retries.
+
+### Why Linux Needs a Retry Loop but Windows Doesn't
+
+The fundamental difference between the platforms lies in how cancellation interacts with the hypervisor:
+
+**Linux (KVM/mshv3)**: POSIX signals can only interrupt the vCPU thread while it is executing kernel code (specifically, during the `ioctl` syscall that runs the vCPU). There is a narrow timing window between when a signal is sent and when the vCPU enters guest mode: a signal that arrives before guest entry is delivered but does not interrupt the subsequent guest execution. Hyperlight therefore sends signals repeatedly, with a delay between attempts, until either:
+- The vCPU exits (and consequently RUNNING_BIT becomes false), or
+- The cancellation is cleared (CANCEL_BIT becomes false)
+
+**Windows (WHP)**: The `WHvCancelRunVirtualProcessor()` API sets an internal `CancelPending` flag in the Windows Hypervisor Platform. This flag is:
+- Set immediately by the API call
+- Checked at the start of each VM run loop iteration (before entering guest mode)
+- Automatically cleared when it causes a `WHvRunVpExitReasonCanceled` exit
+
+This means if `WHvCancelRunVirtualProcessor()` is called:
+- **While the vCPU is running**: The API signals the hypervisor to exit with `WHvRunVpExitReasonCanceled`
+- **Before VM runs**: The `CancelPending` flag persists and causes an immediate cancellation on the next VM run attempt
+
+Therefore, we only call `WHvCancelRunVirtualProcessor()` after checking that `RUNNING_BIT` is set. This is important because:
+1. If called while not running, the API would still succeed and would unconditionally cancel the next run attempt. That would be wrong: `kill()` must have no effect when the vCPU is not running
+2. This makes the InterruptHandle's `CANCEL_BIT` (which is cleared at the start of each guest function call) the source of truth for whether cancellation is intended for the current call
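+
+A sketch of the Windows path (the WHP call matches the API named above; field and method names are illustrative):
+
+```rust
+impl WindowsInterruptHandle {
+    fn kill(&self) -> bool {
+        // Publish the cancellation request, exactly as on Linux.
+        self.state.fetch_or(CANCEL_BIT, Ordering::Release);
+        let (running, cancelled) = self.get_running_and_cancel();
+        // Only cancel while the vCPU is actually running; a cancel issued
+        // while idle would spuriously abort the *next* run attempt instead.
+        running
+            && cancelled
+            && unsafe { WHvCancelRunVirtualProcessor(self.partition_handle, 0, 0).is_ok() }
+    }
+}
+```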
+
diff --git a/src/hyperlight_host/src/hypervisor/hyperv_linux.rs b/src/hyperlight_host/src/hypervisor/hyperv_linux.rs
index 64074b5f9..0c7dbe5fe 100644
--- a/src/hyperlight_host/src/hypervisor/hyperv_linux.rs
+++ b/src/hyperlight_host/src/hypervisor/hyperv_linux.rs
@@ -18,7 +18,7 @@ extern crate mshv_bindings;
extern crate mshv_ioctls;
use std::fmt::{Debug, Formatter};
-use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
+use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::{Arc, Mutex};
use log::{LevelFilter, error};
@@ -51,8 +51,8 @@ use super::gdb::{
use super::{HyperlightExit, Hypervisor, LinuxInterruptHandle, VirtualCPU};
#[cfg(gdb)]
use crate::HyperlightError;
-use crate::hypervisor::get_memory_access_violation;
use crate::hypervisor::regs::CommonFpu;
+use crate::hypervisor::{InterruptHandle, InterruptHandleImpl, get_memory_access_violation};
use crate::mem::memory_region::{MemoryRegion, MemoryRegionFlags};
use crate::mem::mgr::SandboxMemoryManager;
use crate::mem::ptr::{GuestPtr, RawPtr};
@@ -273,7 +273,7 @@ pub(crate) struct HypervLinuxDriver {
vcpu_fd: VcpuFd,
orig_rsp: GuestPtr,
entrypoint: u64,
-    interrupt_handle: Arc<LinuxInterruptHandle>,
+    interrupt_handle: Arc<dyn InterruptHandleImpl>,
mem_mgr: Option>,
host_funcs: Option>>,
@@ -374,10 +374,8 @@ impl HypervLinuxDriver {
vm_fd.map_user_memory(mshv_region)
})?;
- let interrupt_handle = Arc::new(LinuxInterruptHandle {
- running: AtomicU64::new(0),
- cancel_requested: AtomicU64::new(0),
- call_active: AtomicBool::new(false),
+        let interrupt_handle: Arc<dyn InterruptHandleImpl> = Arc::new(LinuxInterruptHandle {
+ state: AtomicU64::new(0),
#[cfg(gdb)]
debug_interrupt: AtomicBool::new(false),
#[cfg(all(
@@ -500,8 +498,11 @@ impl Hypervisor for HypervLinuxDriver {
};
self.vcpu_fd.set_regs(®s)?;
+ let interrupt_handle = self.interrupt_handle.clone();
+
VirtualCPU::run(
self.as_mut_hypervisor(),
+ interrupt_handle,
#[cfg(gdb)]
dbg_mem_access_fn,
)
@@ -560,14 +561,15 @@ impl Hypervisor for HypervLinuxDriver {
// reset fpu state
self.set_fpu(&CommonFpu::default())?;
+ let interrupt_handle = self.interrupt_handle.clone();
+
// run
VirtualCPU::run(
self.as_mut_hypervisor(),
+ interrupt_handle,
#[cfg(gdb)]
dbg_mem_access_fn,
- )?;
-
- Ok(())
+ )
}
#[instrument(err(Debug), skip_all, parent = Span::current(), level = "Trace")]
@@ -643,76 +645,11 @@ impl Hypervisor for HypervLinuxDriver {
#[cfg(gdb)]
const EXCEPTION_INTERCEPT: hv_message_type = hv_message_type_HVMSG_X64_EXCEPTION_INTERCEPT;
- self.interrupt_handle
- .tid
- .store(unsafe { libc::pthread_self() as u64 }, Ordering::Release);
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // Cast to internal trait for access to internal methods
- let interrupt_handle_internal =
- self.interrupt_handle.as_ref() as &dyn super::InterruptHandleInternal;
-
- // (after set_running_bit but before checking cancel_requested):
- // - kill() will stamp cancel_requested with the current generation
- // - We will check cancel_requested below and skip the VcpuFd::run() call
- // - This is the desired behavior - the kill takes effect immediately
- let generation = interrupt_handle_internal.set_running_bit();
-
- #[cfg(not(gdb))]
- let debug_interrupt = false;
- #[cfg(gdb)]
- let debug_interrupt = self
- .interrupt_handle
- .debug_interrupt
- .load(Ordering::Relaxed);
-
- // Don't run the vcpu if `cancel_requested` is set for our generation
- //
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // (after checking cancel_requested but before vcpu.run()):
- // - kill() will stamp cancel_requested with the current generation
- // - We will proceed with vcpu.run(), but signals will be sent to interrupt it
- // - The vcpu will be interrupted and return EINTR (handled below)
- let exit_reason = if interrupt_handle_internal
- .is_cancel_requested_for_generation(generation)
- || debug_interrupt
- {
- Err(mshv_ioctls::MshvError::from(libc::EINTR))
- } else {
- #[cfg(feature = "trace_guest")]
- tc.setup_guest_trace(Span::current().context());
-
- // Note: if a `InterruptHandle::kill()` called while this thread is **here**
- // Then the vcpu will run, but we will keep sending signals to this thread
- // to interrupt it until `running` is set to false. The `vcpu_fd::run()` call will
- // return either normally with an exit reason, or from being "kicked" by out signal handler, with an EINTR error,
- // both of which are fine.
- self.vcpu_fd.run()
- };
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // (after vcpu.run() returns but before clear_running_bit):
- // - kill() continues sending signals to this thread (running bit is still set)
- // - The signals are harmless (no-op handler), we just need to check cancel_requested
- // - We load cancel_requested below to determine if this run was cancelled
- let cancel_requested =
- interrupt_handle_internal.is_cancel_requested_for_generation(generation);
- #[cfg(gdb)]
- let debug_interrupt = self
- .interrupt_handle
- .debug_interrupt
- .load(Ordering::Relaxed);
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // (after loading cancel_requested but before clear_running_bit):
- // - kill() stamps cancel_requested with the CURRENT generation (not the one we just loaded)
- // - kill() continues sending signals until running bit is cleared
- // - The newly stamped cancel_requested will affect the NEXT vcpu.run() call
- // - Signals sent now are harmless (no-op handler)
- interrupt_handle_internal.clear_running_bit();
- // At this point, running bit is clear so kill() will stop sending signals.
- // However, we may still receive delayed signals that were sent before clear_running_bit.
- // These stale signals are harmless because:
- // - The signal handler is a no-op
- // - We check generation matching in cancel_requested before treating EINTR as cancellation
- // - If generation doesn't match, we return Retry instead of Cancelled
+ #[cfg(feature = "trace_guest")]
+ tc.setup_guest_trace(Span::current().context());
+
+ let exit_reason = self.vcpu_fd.run();
+
let result = match exit_reason {
Ok(m) => match m.header.message_type {
HALT_MESSAGE => {
@@ -793,35 +730,7 @@ impl Hypervisor for HypervLinuxDriver {
},
Err(e) => match e.errno() {
// We send a signal (SIGRTMIN+offset) to interrupt the vcpu, which causes EINTR
- libc::EINTR => {
- // Check if cancellation was requested for THIS specific generation.
- // If not, the EINTR came from:
- // - A debug interrupt (if GDB is enabled)
- // - A stale signal from a previous guest call (generation mismatch)
- // - A signal meant for a different sandbox on the same thread
- // In these cases, we return Retry to continue execution.
- if cancel_requested {
- interrupt_handle_internal.clear_cancel_requested();
- HyperlightExit::Cancelled()
- } else {
- #[cfg(gdb)]
- if debug_interrupt {
- self.interrupt_handle
- .debug_interrupt
- .store(false, Ordering::Relaxed);
-
- // If the vCPU was stopped because of an interrupt, we need to
- // return a special exit reason so that the gdb thread can handle it
- // and resume execution
- HyperlightExit::Debug(VcpuStopReason::Interrupt)
- } else {
- HyperlightExit::Retry()
- }
-
- #[cfg(not(gdb))]
- HyperlightExit::Retry()
- }
- }
+ libc::EINTR => HyperlightExit::Cancelled(),
libc::EAGAIN => HyperlightExit::Retry(),
_ => {
crate::debug!("mshv Error - Details: Error: {} \n {:#?}", e, &self);
@@ -870,7 +779,7 @@ impl Hypervisor for HypervLinuxDriver {
self as &mut dyn Hypervisor
}
-    fn interrupt_handle(&self) -> Arc<dyn InterruptHandleInternal> {
+    fn interrupt_handle(&self) -> Arc<dyn InterruptHandleImpl> {
self.interrupt_handle.clone()
}
@@ -1102,7 +1011,7 @@ impl Hypervisor for HypervLinuxDriver {
impl Drop for HypervLinuxDriver {
#[instrument(skip_all, parent = Span::current(), level = "Trace")]
fn drop(&mut self) {
- self.interrupt_handle.dropped.store(true, Ordering::Relaxed);
+ self.interrupt_handle.set_dropped();
for region in self.sandbox_regions.iter().chain(self.mmap_regions.iter()) {
let mshv_region: mshv_user_mem_region = region.to_owned().into();
match self.vm_fd.unmap_user_memory(mshv_region) {
diff --git a/src/hyperlight_host/src/hypervisor/hyperv_windows.rs b/src/hyperlight_host/src/hypervisor/hyperv_windows.rs
index ca1f75997..6c484b5d3 100644
--- a/src/hyperlight_host/src/hypervisor/hyperv_windows.rs
+++ b/src/hyperlight_host/src/hypervisor/hyperv_windows.rs
@@ -17,17 +17,14 @@ limitations under the License.
use std::fmt;
use std::fmt::{Debug, Formatter};
use std::string::String;
-use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
+use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::{Arc, Mutex};
use log::LevelFilter;
use tracing::{Span, instrument};
#[cfg(feature = "trace_guest")]
use tracing_opentelemetry::OpenTelemetrySpanExt;
-use windows::Win32::System::Hypervisor::{
- WHV_MEMORY_ACCESS_TYPE, WHV_PARTITION_HANDLE, WHV_RUN_VP_EXIT_CONTEXT, WHV_RUN_VP_EXIT_REASON,
- WHvCancelRunVirtualProcessor,
-};
+use windows::Win32::System::Hypervisor::{WHV_MEMORY_ACCESS_TYPE, WHV_RUN_VP_EXIT_REASON};
#[cfg(crashdump)]
use {super::crashdump, std::path::Path};
#[cfg(gdb)]
@@ -46,7 +43,7 @@ use super::windows_hypervisor_platform::{VMPartition, VMProcessor};
use super::wrappers::HandleWrapper;
use super::{HyperlightExit, Hypervisor, InterruptHandle, VirtualCPU};
use crate::hypervisor::regs::{CommonFpu, CommonRegisters};
-use crate::hypervisor::{InterruptHandleInternal, get_memory_access_violation};
+use crate::hypervisor::{InterruptHandleImpl, WindowsInterruptHandle, get_memory_access_violation};
use crate::mem::memory_region::{MemoryRegion, MemoryRegionFlags};
use crate::mem::mgr::SandboxMemoryManager;
use crate::mem::ptr::{GuestPtr, RawPtr};
@@ -263,7 +260,7 @@ pub(crate) struct HypervWindowsDriver {
_surrogate_process: SurrogateProcess, // we need to keep a reference to the SurrogateProcess for the duration of the driver since otherwise it will dropped and the memory mapping will be unmapped and the surrogate process will be returned to the pool
entrypoint: u64,
orig_rsp: GuestPtr,
-    interrupt_handle: Arc<WindowsInterruptHandle>,
+    interrupt_handle: Arc<dyn InterruptHandleImpl>,
mem_mgr: Option>,
host_funcs: Option>>,
@@ -327,11 +324,9 @@ impl HypervWindowsDriver {
};
let interrupt_handle = Arc::new(WindowsInterruptHandle {
- running: AtomicU64::new(0),
- cancel_requested: AtomicU64::new(0),
+ state: AtomicU64::new(0),
#[cfg(gdb)]
debug_interrupt: AtomicBool::new(false),
- call_active: AtomicBool::new(false),
partition_handle,
dropped: AtomicBool::new(false),
});
@@ -443,8 +438,10 @@ impl Hypervisor for HypervWindowsDriver {
};
self.set_regs(®s)?;
+ let interrupt_handle = self.interrupt_handle.clone();
VirtualCPU::run(
self.as_mut_hypervisor(),
+ interrupt_handle,
#[cfg(gdb)]
dbg_mem_access_hdl,
)
@@ -482,13 +479,13 @@ impl Hypervisor for HypervWindowsDriver {
// reset fpu state
self.processor.set_fpu(&CommonFpu::default())?;
+ let interrupt_handle = self.interrupt_handle.clone();
VirtualCPU::run(
self.as_mut_hypervisor(),
+ interrupt_handle,
#[cfg(gdb)]
dbg_mem_access_hdl,
- )?;
-
- Ok(())
+ )
}
#[instrument(err(Debug), skip_all, parent = Span::current(), level = "Trace")]
@@ -550,58 +547,10 @@ impl Hypervisor for HypervWindowsDriver {
&mut self,
#[cfg(feature = "trace_guest")] tc: &mut crate::sandbox::trace::TraceContext,
     ) -> Result<HyperlightExit> {
- // Cast to internal trait for access to internal methods
- let interrupt_handle_internal =
- self.interrupt_handle.as_ref() as &dyn super::InterruptHandleInternal;
-
- // Get current generation and set running bit
- let generation = interrupt_handle_internal.set_running_bit();
-
- #[cfg(not(gdb))]
- let debug_interrupt = false;
- #[cfg(gdb)]
- let debug_interrupt = self
- .interrupt_handle
- .debug_interrupt
- .load(Ordering::Relaxed);
-
- // Check if cancellation was requested for THIS generation
- let exit_context = if interrupt_handle_internal
- .is_cancel_requested_for_generation(generation)
- || debug_interrupt
- {
- WHV_RUN_VP_EXIT_CONTEXT {
- ExitReason: WHV_RUN_VP_EXIT_REASON(8193i32), // WHvRunVpExitReasonCanceled
- VpContext: Default::default(),
- Anonymous: Default::default(),
- Reserved: Default::default(),
- }
- } else {
- #[cfg(feature = "trace_guest")]
- tc.setup_guest_trace(Span::current().context());
-
- self.processor.run()?
- };
-
- // Clear running bit
- interrupt_handle_internal.clear_running_bit();
-
- let is_canceled = exit_context.ExitReason == WHV_RUN_VP_EXIT_REASON(8193i32); // WHvRunVpExitReasonCanceled
+ #[cfg(feature = "trace_guest")]
+ tc.setup_guest_trace(Span::current().context());
- // Check if this was a manual cancellation (vs internal Windows cancellation)
- let cancel_was_requested_manually =
- interrupt_handle_internal.is_cancel_requested_for_generation(generation);
-
- // Only clear cancel_requested if we're actually processing a cancellation for this generation
- if is_canceled && cancel_was_requested_manually {
- interrupt_handle_internal.clear_cancel_requested();
- }
-
- #[cfg(gdb)]
- let debug_interrupt = self
- .interrupt_handle
- .debug_interrupt
- .load(Ordering::Relaxed);
+ let exit_context = self.processor.run()?;
let result = match exit_context.ExitReason {
// WHvRunVpExitReasonX64IoPortAccess
@@ -658,45 +607,10 @@ impl Hypervisor for HypervWindowsDriver {
}
// WHvRunVpExitReasonCanceled
// Execution was cancelled by the host.
- // This will happen when guest code runs for too long
WHV_RUN_VP_EXIT_REASON(8193i32) => {
debug!("HyperV Cancelled Details :\n {:#?}", &self);
- #[cfg(gdb)]
- if debug_interrupt {
- self.interrupt_handle
- .debug_interrupt
- .store(false, Ordering::Relaxed);
-
- // If the vCPU was stopped because of an interrupt, we need to
- // return a special exit reason so that the gdb thread can handle it
- // and resume execution
- HyperlightExit::Debug(VcpuStopReason::Interrupt)
- } else if !cancel_was_requested_manually {
- // This was an internal cancellation
- // The virtualization stack can use this function to return the control
- // of a virtual processor back to the virtualization stack in case it
- // needs to change the state of a VM or to inject an event into the processor
- // see https://learn.microsoft.com/en-us/virtualization/api/hypervisor-platform/funcs/whvcancelrunvirtualprocessor#remarks
- debug!("Internal cancellation detected, returning Retry error");
- HyperlightExit::Retry()
- } else {
- HyperlightExit::Cancelled()
- }
- #[cfg(not(gdb))]
- {
- if !cancel_was_requested_manually {
- // This was an internal cancellation
- // The virtualization stack can use this function to return the control
- // of a virtual processor back to the virtualization stack in case it
- // needs to change the state of a VM or to inject an event into the processor
- // see https://learn.microsoft.com/en-us/virtualization/api/hypervisor-platform/funcs/whvcancelrunvirtualprocessor#remarks
- debug!("Internal cancellation detected, returning Retry error");
- HyperlightExit::Retry()
- } else {
- HyperlightExit::Cancelled()
- }
- }
+ HyperlightExit::Cancelled()
}
#[cfg(gdb)]
WHV_RUN_VP_EXIT_REASON(4098i32) => {
@@ -754,7 +668,7 @@ impl Hypervisor for HypervWindowsDriver {
self.processor.set_sregs(sregs)
}
- fn interrupt_handle(&self) -> Arc {
+ fn interrupt_handle(&self) -> Arc {
self.interrupt_handle.clone()
}
@@ -990,86 +904,6 @@ impl Hypervisor for HypervWindowsDriver {
impl Drop for HypervWindowsDriver {
fn drop(&mut self) {
- self.interrupt_handle.dropped.store(true, Ordering::Relaxed);
- }
-}
-
-#[derive(Debug)]
-pub struct WindowsInterruptHandle {
- /// Combined running flag (bit 63) and generation counter (bits 0-62).
- ///
- /// The generation increments with each guest function call to prevent
- /// stale cancellations from affecting new calls (ABA problem).
- ///
- /// Layout: `[running:1 bit][generation:63 bits]`
- running: AtomicU64,
-
- /// Combined cancel_requested flag (bit 63) and generation counter (bits 0-62).
- ///
- /// When kill() is called, this stores the current generation along with
- /// the cancellation flag. The VCPU only honors the cancellation if the
- /// generation matches its current generation.
- ///
- /// Layout: `[cancel_requested:1 bit][generation:63 bits]`
- cancel_requested: AtomicU64,
-
- // This is used to signal the GDB thread to stop the vCPU
- #[cfg(gdb)]
- debug_interrupt: AtomicBool,
- /// Flag indicating whether a guest function call is currently in progress.
- ///
- /// **true**: A guest function call is active (between call start and completion)
- /// **false**: No guest function call is active
- ///
- /// # Purpose
- ///
- /// This flag prevents kill() from having any effect when called outside of a
- /// guest function call. This solves the "kill-in-advance" problem where kill()
- /// could be called before a guest function starts and would incorrectly cancel it.
- call_active: AtomicBool,
- partition_handle: WHV_PARTITION_HANDLE,
- dropped: AtomicBool,
-}
-
-impl InterruptHandle for WindowsInterruptHandle {
- fn kill(&self) -> bool {
- // Check if a call is actually active first
- if !self.call_active.load(Ordering::Acquire) {
- return false;
- }
-
- // Get the current running state and generation
- let (running, generation) = self.get_running_and_generation();
-
- // Set cancel_requested with the current generation
- self.set_cancel_requested(generation);
-
- // Only call WHvCancelRunVirtualProcessor if VCPU is actually running in guest mode
- running && unsafe { WHvCancelRunVirtualProcessor(self.partition_handle, 0, 0).is_ok() }
- }
-
- #[cfg(gdb)]
- fn kill_from_debugger(&self) -> bool {
- self.debug_interrupt.store(true, Ordering::Relaxed);
- let (running, _) = self.get_running_and_generation();
- running && unsafe { WHvCancelRunVirtualProcessor(self.partition_handle, 0, 0).is_ok() }
- }
-
- fn dropped(&self) -> bool {
- self.dropped.load(Ordering::Relaxed)
- }
-}
-
-impl InterruptHandleInternal for WindowsInterruptHandle {
- fn get_call_active(&self) -> &AtomicBool {
- &self.call_active
- }
-
- fn get_running(&self) -> &AtomicU64 {
- &self.running
- }
-
- fn get_cancel_requested(&self) -> &AtomicU64 {
- &self.cancel_requested
+ self.interrupt_handle.set_dropped();
}
}
diff --git a/src/hyperlight_host/src/hypervisor/kvm.rs b/src/hyperlight_host/src/hypervisor/kvm.rs
index 04b8ed60f..94b2187e3 100644
--- a/src/hyperlight_host/src/hypervisor/kvm.rs
+++ b/src/hyperlight_host/src/hypervisor/kvm.rs
@@ -15,7 +15,7 @@ limitations under the License.
*/
use std::fmt::Debug;
-use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
+use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::{Arc, Mutex};
use kvm_bindings::{kvm_fpu, kvm_regs, kvm_sregs, kvm_userspace_memory_region};
@@ -36,8 +36,8 @@ use super::gdb::{
use super::{HyperlightExit, Hypervisor, LinuxInterruptHandle, VirtualCPU};
#[cfg(gdb)]
use crate::HyperlightError;
-use crate::hypervisor::get_memory_access_violation;
use crate::hypervisor::regs::{CommonFpu, CommonRegisters};
+use crate::hypervisor::{InterruptHandle, InterruptHandleImpl, get_memory_access_violation};
use crate::mem::memory_region::{MemoryRegion, MemoryRegionFlags};
use crate::mem::mgr::SandboxMemoryManager;
use crate::mem::ptr::{GuestPtr, RawPtr};
@@ -271,7 +271,7 @@ pub(crate) struct KVMDriver {
vcpu_fd: VcpuFd,
entrypoint: u64,
orig_rsp: GuestPtr,
-    interrupt_handle: Arc<LinuxInterruptHandle>,
+    interrupt_handle: Arc<dyn InterruptHandleImpl>,
mem_mgr: Option>,
host_funcs: Option>>,
@@ -332,10 +332,8 @@ impl KVMDriver {
let rsp_gp = GuestPtr::try_from(RawPtr::from(rsp))?;
- let interrupt_handle = Arc::new(LinuxInterruptHandle {
- running: AtomicU64::new(0),
- cancel_requested: AtomicU64::new(0),
- call_active: AtomicBool::new(false),
+        let interrupt_handle: Arc<dyn InterruptHandleImpl> = Arc::new(LinuxInterruptHandle {
+ state: AtomicU64::new(0),
#[cfg(gdb)]
debug_interrupt: AtomicBool::new(false),
#[cfg(all(
@@ -353,8 +351,8 @@ impl KVMDriver {
)))]
tid: AtomicU64::new(unsafe { libc::pthread_self() }),
retry_delay: config.get_interrupt_retry_delay(),
- dropped: AtomicBool::new(false),
sig_rt_min_offset: config.get_interrupt_vcpu_sigrtmin_offset(),
+ dropped: AtomicBool::new(false),
});
let mut kvm = Self {
@@ -460,8 +458,11 @@ impl Hypervisor for KVMDriver {
};
self.set_regs(®s)?;
+ let interrupt_handle = self.interrupt_handle.clone();
+
VirtualCPU::run(
self.as_mut_hypervisor(),
+ interrupt_handle,
#[cfg(gdb)]
dbg_mem_access_fn,
)
@@ -543,9 +544,12 @@ impl Hypervisor for KVMDriver {
// reset fpu state
self.set_fpu(&CommonFpu::default())?;
+ let interrupt_handle = self.interrupt_handle.clone();
+
// run
VirtualCPU::run(
self.as_mut_hypervisor(),
+ interrupt_handle,
#[cfg(gdb)]
dbg_mem_access_fn,
)?;
@@ -619,76 +623,10 @@ impl Hypervisor for KVMDriver {
&mut self,
#[cfg(feature = "trace_guest")] tc: &mut crate::sandbox::trace::TraceContext,
     ) -> Result<HyperlightExit> {
- self.interrupt_handle
- .tid
- .store(unsafe { libc::pthread_self() as u64 }, Ordering::Release);
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // Cast to internal trait for access to internal methods
- let interrupt_handle_internal =
- self.interrupt_handle.as_ref() as &dyn super::InterruptHandleInternal;
-
- // (after set_running_bit but before checking cancel_requested):
- // - kill() will stamp cancel_requested with the current generation
- // - We will check cancel_requested below and skip the VcpuFd::run() call
- // - This is the desired behavior - the kill takes effect immediately
- let generation = interrupt_handle_internal.set_running_bit();
-
- #[cfg(not(gdb))]
- let debug_interrupt = false;
- #[cfg(gdb)]
- let debug_interrupt = self
- .interrupt_handle
- .debug_interrupt
- .load(Ordering::Relaxed);
- // Don't run the vcpu if `cancel_requested` is set for our generation
- //
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // (after checking cancel_requested but before vcpu.run()):
- // - kill() will stamp cancel_requested with the current generation
- // - We will proceed with vcpu.run(), but signals will be sent to interrupt it
- // - The vcpu will be interrupted and return EINTR (handled below)
- let exit_reason = if interrupt_handle_internal
- .is_cancel_requested_for_generation(generation)
- || debug_interrupt
- {
- Err(kvm_ioctls::Error::new(libc::EINTR))
- } else {
- #[cfg(feature = "trace_guest")]
- tc.setup_guest_trace(Span::current().context());
-
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // (during vcpu.run() execution):
- // - kill() stamps cancel_requested with the current generation
- // - kill() sends signals (SIGRTMIN+offset) to this thread repeatedly
- // - The signal handler is a no-op, but it causes vcpu.run() to return EINTR
- // - We check cancel_requested below and return Cancelled if generation matches
- self.vcpu_fd.run()
- };
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // (after vcpu.run() returns but before clear_running_bit):
- // - kill() continues sending signals to this thread (running bit is still set)
- // - The signals are harmless (no-op handler), we just need to check cancel_requested
- // - We load cancel_requested below to determine if this run was cancelled
- let cancel_requested =
- interrupt_handle_internal.is_cancel_requested_for_generation(generation);
- #[cfg(gdb)]
- let debug_interrupt = self
- .interrupt_handle
- .debug_interrupt
- .load(Ordering::Relaxed);
- // Note: if `InterruptHandle::kill()` is called while this thread is **here**
- // (after loading cancel_requested but before clear_running_bit):
- // - kill() stamps cancel_requested with the CURRENT generation (not the one we just loaded)
- // - kill() continues sending signals until running bit is cleared
- // - The newly stamped cancel_requested will affect the NEXT vcpu.run() call
- // - Signals sent now are harmless (no-op handler)
- interrupt_handle_internal.clear_running_bit();
- // At this point, running bit is clear so kill() will stop sending signals.
- // However, we may still receive delayed signals that were sent before clear_running_bit.
- // These stale signals are harmless because:
- // - The signal handler is a no-op
- // - We check generation matching in cancel_requested before treating EINTR as cancellation
- // - If generation doesn't match, we return Retry instead of Cancelled
+ #[cfg(feature = "trace_guest")]
+ tc.setup_guest_trace(Span::current().context());
+
+ let exit_reason = self.vcpu_fd.run();
let result = match exit_reason {
Ok(VcpuExit::Hlt) => {
crate::debug!("KVM - Halt Details : {:#?}", &self);
@@ -738,35 +676,7 @@ impl Hypervisor for KVMDriver {
},
Err(e) => match e.errno() {
// We send a signal (SIGRTMIN+offset) to interrupt the vcpu, which causes EINTR
- libc::EINTR => {
- // Check if cancellation was requested for THIS specific generation.
- // If not, the EINTR came from:
- // - A debug interrupt (if GDB is enabled)
- // - A stale signal from a previous guest call (generation mismatch)
- // - A signal meant for a different sandbox on the same thread
- // In these cases, we return Retry to continue execution.
- if cancel_requested {
- interrupt_handle_internal.clear_cancel_requested();
- HyperlightExit::Cancelled()
- } else {
- #[cfg(gdb)]
- if debug_interrupt {
- self.interrupt_handle
- .debug_interrupt
- .store(false, Ordering::Relaxed);
-
- // If the vCPU was stopped because of an interrupt, we need to
- // return a special exit reason so that the gdb thread can handle it
- // and resume execution
- HyperlightExit::Debug(VcpuStopReason::Interrupt)
- } else {
- HyperlightExit::Retry()
- }
-
- #[cfg(not(gdb))]
- HyperlightExit::Retry()
- }
- }
+ libc::EINTR => HyperlightExit::Cancelled(),
libc::EAGAIN => HyperlightExit::Retry(),
_ => {
crate::debug!("KVM Error -Details: Address: {} \n {:#?}", e, &self);
@@ -820,7 +730,7 @@ impl Hypervisor for KVMDriver {
self as &mut dyn Hypervisor
}
-    fn interrupt_handle(&self) -> Arc<dyn InterruptHandleInternal> {
+    fn interrupt_handle(&self) -> Arc<dyn InterruptHandleImpl> {
self.interrupt_handle.clone()
}
@@ -1057,6 +967,6 @@ impl Hypervisor for KVMDriver {
impl Drop for KVMDriver {
fn drop(&mut self) {
- self.interrupt_handle.dropped.store(true, Ordering::Relaxed);
+ self.interrupt_handle.set_dropped();
}
}
diff --git a/src/hyperlight_host/src/hypervisor/mod.rs b/src/hyperlight_host/src/hypervisor/mod.rs
index f5575b989..74cec5c6a 100644
--- a/src/hyperlight_host/src/hypervisor/mod.rs
+++ b/src/hyperlight_host/src/hypervisor/mod.rs
@@ -178,7 +178,7 @@ pub(crate) trait Hypervisor: Debug + Send {
     ) -> Result<HyperlightExit>;
/// Get InterruptHandle to underlying VM (returns internal trait)
-    fn interrupt_handle(&self) -> Arc<dyn InterruptHandleInternal>;
+    fn interrupt_handle(&self) -> Arc<dyn InterruptHandleImpl>;
/// Get regs
#[allow(dead_code)]
@@ -356,33 +356,78 @@ impl VirtualCPU {
#[instrument(err(Debug), skip_all, parent = Span::current(), level = "Trace")]
pub(crate) fn run(
hv: &mut dyn Hypervisor,
+        interrupt_handle: Arc<dyn InterruptHandleImpl>,
#[cfg(gdb)] dbg_mem_access_fn: Arc>>,
) -> Result<()> {
+ // ===== KILL() TIMING POINT 1: Between guest function calls =====
+ // Clear any stale cancellation from a previous guest function call or if kill() was called too early.
+ // This ensures that kill() called BETWEEN different guest function calls doesn't affect the next call.
+ //
+ // If kill() was called and ran to completion BEFORE this line executes:
+ // - kill() has NO effect on this guest function call because CANCEL_BIT is cleared here.
+ // - NOTE: stale signals can still be delivered, but they will be ignored.
+ interrupt_handle.clear_cancel();
+
// Keeps the trace context and open spans
#[cfg(feature = "trace_guest")]
let mut tc = crate::sandbox::trace::TraceContext::new();
loop {
- #[cfg(feature = "trace_guest")]
- let result = {
- let result = hv.run(&mut tc);
- // End current host trace by closing the current span that captures traces
- // happening when a guest exits and re-enters.
- tc.end_host_trace();
-
- // Handle the guest trace data if any
- if let Err(e) = hv.handle_trace(&mut tc) {
- // If no trace data is available, we just log a message and continue
- // Is this the right thing to do?
- log::debug!("Error handling guest trace: {:?}", e);
- }
+ // ===== KILL() TIMING POINT 2: Before set_tid() =====
+ // If kill() is called and ran to completion BEFORE this line executes:
+ // - CANCEL_BIT will be set and we will return an early VmExit::Cancelled()
+ interrupt_handle.set_tid();
+ interrupt_handle.set_running();
+
+ let exit_reason = {
+ if interrupt_handle.is_cancelled() || interrupt_handle.is_debug_interrupted() {
+ Ok(HyperlightExit::Cancelled())
+ } else {
+ #[cfg(feature = "trace_guest")]
+ let result = hv.run(&mut tc);
+ #[cfg(not(feature = "trace_guest"))]
+ let result = hv.run();
+
+ // End current host trace by closing the current span that captures traces
+ // happening when a guest exits and re-enters.
+ #[cfg(feature = "trace_guest")]
+ tc.end_host_trace();
+
+ // Handle the guest trace data if any
+ #[cfg(feature = "trace_guest")]
+ if let Err(e) = hv.handle_trace(&mut tc) {
+ // If no trace data is available, we just log a message and continue
+ // Is this the right thing to do?
+ log::debug!("Error handling guest trace: {:?}", e);
+ }
- result
+ result
+ }
};
- #[cfg(not(feature = "trace_guest"))]
- let result = hv.run();
- match result {
+ // ===== KILL() TIMING POINT 4: Before capturing cancel_requested =====
+ // If kill() is called and ran to completion BEFORE this line executes:
+ // - CANCEL_BIT will be set
+ // - Signals may still be sent (RUNNING_BIT=true) but are harmless no-ops
+ // - kill() will have no effect on this iteration, but CANCEL_BIT will persist
+ // - If the loop continues (e.g., for a host call), the next iteration will be cancelled
+ // - Stale signals from before clear_running() may arrive and kick future iterations,
+ // but will be filtered out by the cancel_requested check below (and retried).
+ let cancel_requested = interrupt_handle.is_cancelled();
+ let debug_interrupted = interrupt_handle.is_debug_interrupted();
+
+ // ===== KILL() TIMING POINT 5: Before calling clear_running() =====
+ // Same as point 4.
+ interrupt_handle.clear_running();
+
+ // ===== KILL() TIMING POINT 6: After calling clear_running() =====
+ // If kill() is called and ran to completion BEFORE this line executes:
+ // - CANCEL_BIT will be set but won't affect this iteration, it is never read below this comment
+ // and cleared at next run() start
+ // - RUNNING_BIT=false, so no new signals will be sent
+ // - Stale signals from before clear_running() may arrive and kick future iterations,
+ // but will be filtered out by the cancel_requested check below (and retried).
+ match exit_reason {
#[cfg(gdb)]
Ok(HyperlightExit::Debug(stop_reason)) => {
if let Err(e) = hv.handle_debug(dbg_mem_access_fn.clone(), stop_reason) {
@@ -425,6 +470,24 @@ impl VirtualCPU {
));
}
Ok(HyperlightExit::Cancelled()) => {
+ // If cancellation was not requested for this specific guest function call,
+ // the vcpu was interrupted by a stale cancellation from a previous call
+ if !cancel_requested && !debug_interrupted {
+ // treat this the same as a HyperlightExit::Retry, the cancel was not meant for this call
+ continue;
+ }
+
+ // If the vcpu was interrupted by a debugger, we need to handle it
+ #[cfg(gdb)]
+ {
+ interrupt_handle.clear_debug_interrupt();
+ if let Err(e) =
+ hv.handle_debug(dbg_mem_access_fn.clone(), VcpuStopReason::Interrupt)
+ {
+ log_then_return!(e);
+ }
+ }
+
// Shutdown is returned when the host has cancelled execution
// After termination, the main thread will re-initialize the VM
metrics::counter!(METRIC_GUEST_CANCELLATION).increment(1);
@@ -461,54 +524,43 @@ impl VirtualCPU {
}
}
-/// A trait for handling interrupts to a sandbox's vcpu (public API)
-pub trait InterruptHandle: Debug + Send + Sync {
+/// A trait for platform-specific interrupt handle implementation details
+pub(crate) trait InterruptHandleImpl: InterruptHandle {
+ /// Set the thread ID for the vcpu thread (no-op on Windows)
+ fn set_tid(&self);
+
+ /// Set the running state
+ fn set_running(&self);
+
+ /// Clear the running state
+ fn clear_running(&self);
+
+ /// Mark the handle as dropped
+ fn set_dropped(&self);
+
+ /// Check if cancellation was requested
+ fn is_cancelled(&self) -> bool;
+
+ /// Clear the cancellation request flag
+ fn clear_cancel(&self);
+
+ /// Check if debug interrupt was requested (always returns false when gdb feature is disabled)
+ fn is_debug_interrupted(&self) -> bool;
+
+    /// Clear the debug interrupt request flag
+ #[cfg(gdb)]
+ fn clear_debug_interrupt(&self);
+}
+
+/// A trait for handling interrupts to a sandbox's vcpu
+pub trait InterruptHandle: Send + Sync + Debug {
/// Interrupt the corresponding sandbox from running.
///
- /// This method attempts to cancel a currently executing guest function call by sending
- /// a signal to the VCPU thread. It uses generation tracking and call_active flag to
- /// ensure the interruption is safe and precise.
- ///
- /// # Behavior
- ///
- /// - **Guest function running**: If called while a guest function is executing (VCPU running
- /// or in a host function call), this stamps the current generation into cancel_requested
- /// and sends a signal to interrupt the VCPU. Returns `true`.
- ///
- /// - **No active call**: If called when no guest function call is in progress (call_active=false),
- /// this has no effect and returns `false`. This prevents "kill-in-advance" where kill()
- /// is called before a guest function starts.
- ///
- /// - **During host function**: If the guest call is currently executing a host function
- /// (VCPU not running but call_active=true), this stamps cancel_requested. When the
- /// host function returns and attempts to re-enter the guest, the cancellation will
- /// be detected and the call will abort. Returns `true`.
- ///
- /// # Generation Tracking
- ///
- /// The method stamps the current generation number along with the cancellation request.
- /// This ensures that:
- /// - Stale signals from previous calls are ignored (generation mismatch)
- /// - Only the intended guest function call is affected
- /// - Multiple rapid kill() calls on the same generation are idempotent
- ///
- /// # Blocking Behavior
- ///
- /// This function will block while attempting to deliver the signal to the VCPU thread,
- /// retrying until either:
- /// - The signal is successfully delivered (VCPU transitions from running to not running)
- /// - The VCPU stops running for another reason (e.g., call completes normally)
- ///
- /// # Returns
- ///
- /// - `true`: Cancellation request was stamped (kill will take effect)
- /// - `false`: No active call, cancellation request was not stamped (no effect)
+    /// - If this is called while the sandbox is currently executing a guest function call, it will interrupt the sandbox and return `true`.
+ /// - If this is called while the sandbox is not running (for example before or after calling a guest function), it will do nothing and return `false`.
///
/// # Note
- ///
- /// To reliably interrupt a guest call, ensure `kill()` is called while the guest
- /// function is actually executing. Calling kill() before call_guest_function() will
- /// have no effect.
+    /// This function blocks until the vCPU thread has been interrupted.
fn kill(&self) -> bool;
/// Used by a debugger to interrupt the corresponding sandbox from running.
@@ -523,322 +575,84 @@ pub trait InterruptHandle: Debug + Send + Sync {
#[cfg(gdb)]
fn kill_from_debugger(&self) -> bool;
- /// Check if the corresponding VM has been dropped.
+ /// Returns true if the corresponding sandbox has been dropped
fn dropped(&self) -> bool;
}
-/// Internal trait for interrupt handle implementation details (private, cross-platform).
-///
-/// This trait contains all the internal atomics access methods and helper functions
-/// that are shared between Linux and Windows implementations. It extends InterruptHandle
-/// to inherit the public API.
-///
-/// This trait should NOT be used outside of hypervisor implementations.
-pub(crate) trait InterruptHandleInternal: InterruptHandle {
- /// Returns the call_active atomic bool reference for internal implementations.
- fn get_call_active(&self) -> &AtomicBool;
-
- /// Returns the running atomic u64 reference for internal implementations.
- fn get_running(&self) -> &AtomicU64;
-
- /// Returns the cancel_requested atomic u64 reference for internal implementations.
- fn get_cancel_requested(&self) -> &AtomicU64;
-
- /// Set call_active - increments generation and sets flag.
- ///
- /// Increments the generation counter and sets the call_active flag to true,
- /// indicating that a guest function call is now in progress. This allows
- /// kill() to stamp cancel_requested with the correct generation.
- ///
- /// Must be called at the start of call_guest_function_by_name_no_reset(),
- /// before any VCPU execution begins.
- ///
- /// Returns true if call_active was already set (indicating a guard already exists),
- /// false otherwise.
- fn set_call_active(&self) -> bool {
- self.increment_generation();
- self.get_call_active().swap(true, Ordering::AcqRel)
- }
-
- /// Clear call_active - clears the call_active flag.
- ///
- /// Clears the call_active flag, indicating that no guest function call is
- /// in progress. After this, kill() will have no effect and will return false.
- ///
- /// Must be called at the end of call_guest_function_by_name_no_reset(),
- /// after the guest call has fully completed (whether successfully or with error).
- fn clear_call_active(&self) {
- self.get_call_active().store(false, Ordering::Release)
- }
-
- /// Set cancel_requested to true with the given generation.
- ///
- /// This stamps the cancellation request with the current generation number,
- /// ensuring that only the VCPU running with this exact generation will honor
- /// the cancellation.
- fn set_cancel_requested(&self, generation: u64) {
- const CANCEL_REQUESTED_BIT: u64 = 1 << 63;
- const MAX_GENERATION: u64 = CANCEL_REQUESTED_BIT - 1;
- let value = CANCEL_REQUESTED_BIT | (generation & MAX_GENERATION);
- self.get_cancel_requested().store(value, Ordering::Release);
- }
-
- /// Clear cancel_requested (reset to no cancellation).
- ///
- /// This is called after a cancellation has been processed to reset the
- /// cancellation flag for the next guest call.
- fn clear_cancel_requested(&self) {
- self.get_cancel_requested().store(0, Ordering::Release);
- }
-
- /// Check if cancel_requested is set for the given generation.
- ///
- /// Returns true only if BOTH:
- /// - The cancellation flag is set
- /// - The stored generation matches the provided generation
- ///
- /// This prevents stale cancellations from affecting new guest calls.
- fn is_cancel_requested_for_generation(&self, generation: u64) -> bool {
- const CANCEL_REQUESTED_BIT: u64 = 1 << 63;
- const MAX_GENERATION: u64 = CANCEL_REQUESTED_BIT - 1;
- let raw = self.get_cancel_requested().load(Ordering::Acquire);
- let is_set = raw & CANCEL_REQUESTED_BIT != 0;
- let stored_generation = raw & MAX_GENERATION;
- is_set && stored_generation == generation
- }
-
- /// Set running bit to true, return current generation.
- ///
- /// This is called when the VCPU is about to enter guest mode. It atomically
- /// sets the running flag while preserving the generation counter.
- fn set_running_bit(&self) -> u64 {
- const RUNNING_BIT: u64 = 1 << 63;
- self.get_running()
- .fetch_update(Ordering::Release, Ordering::Acquire, |raw| {
- Some(raw | RUNNING_BIT)
- })
- .map(|raw| raw & !RUNNING_BIT) // Return the current generation
- .unwrap_or(0)
- }
-
- /// Increment the generation for a new guest function call.
- ///
- /// The generation counter wraps around at MAX_GENERATION (2^63 - 1).
- /// This is called at the start of each new guest function call to provide
- /// a unique identifier that prevents ABA problems with stale cancellations.
- ///
- /// Returns the NEW generation number (after incrementing).
- fn increment_generation(&self) -> u64 {
- const RUNNING_BIT: u64 = 1 << 63;
- const MAX_GENERATION: u64 = RUNNING_BIT - 1;
- self.get_running()
- .fetch_update(Ordering::Release, Ordering::Acquire, |raw| {
- let current_generation = raw & !RUNNING_BIT;
- let running_bit = raw & RUNNING_BIT;
- if current_generation == MAX_GENERATION {
- // Restart generation from 0
- return Some(running_bit);
- }
- Some((current_generation + 1) | running_bit)
- })
- .map(|raw| (raw & !RUNNING_BIT) + 1) // Return the NEW generation
- .unwrap_or(1) // If wrapped, return 1
- }
-
- /// Get the current running state and generation counter.
- ///
- /// Returns a tuple of (running, generation) where:
- /// - running: true if VCPU is currently in guest mode
- /// - generation: current generation counter value
- fn get_running_and_generation(&self) -> (bool, u64) {
- const RUNNING_BIT: u64 = 1 << 63;
- let raw = self.get_running().load(Ordering::Acquire);
- let running = raw & RUNNING_BIT != 0;
- let generation = raw & !RUNNING_BIT;
- (running, generation)
- }
-
- /// Clear the running bit and return the old value.
- ///
- /// This is called when the VCPU exits from guest mode back to host mode.
- /// The return value (which includes the generation and the old running bit)
- /// is currently unused by all callers.
- fn clear_running_bit(&self) -> u64 {
- const RUNNING_BIT: u64 = 1 << 63;
- self.get_running()
- .fetch_and(!RUNNING_BIT, Ordering::Release)
- }
-}
-
#[cfg(any(kvm, mshv3))]
#[derive(Debug)]
pub(super) struct LinuxInterruptHandle {
- /// Atomic flag combining running state and generation counter.
+ /// Atomic value packing vcpu execution state.
///
- /// **Bit 63**: VCPU running state (1 = running, 0 = not running)
- /// **Bits 0-62**: Generation counter (incremented once per guest function call)
+ /// Bit layout:
+ /// - Bit 1: RUNNING_BIT - set when vcpu is actively running
+ /// - Bit 0: CANCEL_BIT - set when cancellation has been requested
///
- /// # Generation Tracking
- ///
- /// The generation counter is incremented once at the start of each guest function call
- /// and remains constant throughout that call, even if the VCPU is run multiple times
- /// (due to host function calls, retries, etc.). This design solves the race condition
- /// where a kill() from a previous call could spuriously cancel a new call.
- ///
- /// ## Why Generations Are Needed
- ///
- /// Consider this scenario WITHOUT generation tracking:
- /// 1. Thread A starts guest call 1, VCPU runs
- /// 2. Thread B calls kill(), sends signal to Thread A
- /// 3. Guest call 1 completes before signal arrives
- /// 4. Thread A starts guest call 2, VCPU runs again
- /// 5. Stale signal from step 2 arrives and incorrectly cancels call 2
- ///
- /// WITH generation tracking:
- /// 1. Thread A starts guest call 1 (generation N), VCPU runs
- /// 2. Thread B calls kill(), stamps cancel_requested with generation N
- /// 3. Guest call 1 completes, signal may or may not have arrived yet
- /// 4. Thread A starts guest call 2 (generation N+1), VCPU runs again
- /// 5. If stale signal arrives, signal handler checks: cancel_requested.generation (N) != current generation (N+1)
- /// 6. Stale signal is ignored, call 2 continues normally
- ///
- /// ## Per-Call vs Per-Run Generation
- ///
- /// It's critical that generation is incremented per GUEST FUNCTION CALL, not per vcpu.run():
- /// - A single guest function call may invoke vcpu.run() multiple times (host calls, retries)
- /// - All run() calls within the same guest call must share the same generation
- /// - This ensures kill() affects the entire guest function call atomically
- ///
- /// # Invariants
- ///
- /// - If VCPU is running: bit 63 is set (neither converse nor inverse holds)
- /// - If VCPU is running: bits 0-62 match the current guest call's generation
- running: AtomicU64,
+ /// CANCEL_BIT persists across vcpu exits/re-entries within a single `HyperlightVm::run()` call
+ /// (e.g., during host function calls), but is cleared at the start of each new `HyperlightVm::run()` call.
+ state: AtomicU64,
- /// Thread ID where the VCPU is currently running.
- ///
- /// # Invariants
+ /// Thread ID where the vcpu is running.
///
- /// - If VCPU is running: tid contains the thread ID of the executing thread
- /// - Multiple VMs may share the same tid, but at most one will have running=true
+ /// Note: Multiple VMs may have the same `tid` (same thread runs multiple sandboxes sequentially),
+ /// but at most one VM will have RUNNING_BIT set at any given time.
tid: AtomicU64,
- /// Generation-aware cancellation request flag.
- ///
- /// **Bit 63**: Cancellation requested flag (1 = kill requested, 0 = no kill)
- /// **Bits 0-62**: Generation number when cancellation was requested
- ///
- /// # Purpose
- ///
- /// This flag serves three critical functions:
- ///
- /// 1. **Prevent stale signals**: A VCPU may only be interrupted if cancel_requested
- /// is set AND the generation matches the current call's generation
- ///
- /// 2. **Handle host function calls**: If kill() is called while a host function is
- /// executing (VCPU not running but call is active), cancel_requested is stamped
- /// with the current generation. When the host function returns and the VCPU
- /// attempts to re-enter the guest, it will see the cancellation and abort.
- ///
- /// 3. **Detect stale kills**: If cancel_requested.generation doesn't match the
- /// current generation, it's from a previous call and should be ignored
- ///
- /// # States and Transitions
- ///
- /// - **No cancellation**: cancel_requested = 0 (bit 63 clear)
- /// - **Cancellation for generation N**: cancel_requested = (1 << 63) | N
- /// - Signal handler checks: (cancel_requested & 0x7FFFFFFFFFFFFFFF) == current_generation
- cancel_requested: AtomicU64,
-
- /// Flag indicating whether a guest function call is currently in progress.
- ///
- /// **true**: A guest function call is active (between call start and completion)
- /// **false**: No guest function call is active
- ///
- /// # Purpose
- ///
- /// This flag prevents kill() from having any effect when called outside of a
- /// guest function call. This solves the "kill-in-advance" problem where kill()
- /// could be called before a guest function starts and would incorrectly cancel it.
- ///
- /// # Behavior
- ///
- /// - Set to true at the start of call_guest_function_by_name_no_reset()
- /// - Cleared at the end of call_guest_function_by_name_no_reset()
- /// - kill() only stamps cancel_requested if call_active is true
- /// - If kill() is called when call_active=false, it returns false and has no effect
- ///
- /// # Why AtomicBool is Safe
- ///
- /// Although there's a theoretical race where:
- /// 1. Thread A checks call_active (false)
- /// 2. Thread B sets call_active (true) and starts guest call
- /// 3. Thread A's kill() returns false (no effect)
- ///
- /// This is acceptable because the generation tracking provides an additional
- /// safety layer. Even if a stale kill somehow stamped cancel_requested, the
- /// generation mismatch would cause it to be ignored.
- call_active: AtomicBool,
-
- /// Debugger interrupt request flag (GDB only).
- ///
- /// Set when kill_from_debugger() is called, cleared when VCPU stops running.
- /// Used to distinguish debugger interrupts from normal kill() interrupts.
+ /// Debugger interrupt flag (gdb feature only).
+ /// Set when `kill_from_debugger()` is called, cleared when vcpu stops running.
#[cfg(gdb)]
debug_interrupt: AtomicBool,
/// Whether the corresponding VM has been dropped.
dropped: AtomicBool,
- /// Delay between retry attempts when sending signals to the VCPU thread.
+ /// Delay between retry attempts when sending signals to interrupt the vcpu.
retry_delay: Duration,
- /// Offset from SIGRTMIN for the signal used to interrupt the VCPU thread.
+ /// Offset from SIGRTMIN for the signal used to interrupt the vcpu thread.
sig_rt_min_offset: u8,
}
#[cfg(any(kvm, mshv3))]
impl LinuxInterruptHandle {
- fn send_signal(&self, stamp_generation: bool) -> bool {
+ const RUNNING_BIT: u64 = 1 << 1;
+ const CANCEL_BIT: u64 = 1 << 0;
+
+ /// Get the running and cancel flags atomically.
+ ///
+ /// # Memory Ordering
+ /// Uses `Acquire` ordering to synchronize with the `Release` in `set_running()` and `kill()`.
+ /// This ensures that when we observe running=true, we also see the correct `tid` value.
+ fn get_running_and_cancel(&self) -> (bool, bool) {
+ let state = self.state.load(Ordering::Acquire);
+ let running = state & Self::RUNNING_BIT != 0;
+ let cancel = state & Self::CANCEL_BIT != 0;
+ (running, cancel)
+ }
+
+ fn send_signal(&self) -> bool {
let signal_number = libc::SIGRTMIN() + self.sig_rt_min_offset as libc::c_int;
let mut sent_signal = false;
- let mut target_generation: Option<u64> = None;
loop {
- if !self.call_active.load(Ordering::Acquire) {
- // No active call, so no need to send signal
- break;
- }
+ let (running, cancel) = self.get_running_and_cancel();
- let (running, generation) = self.get_running_and_generation();
-
- // Stamp generation into cancel_requested if requested and this is the first iteration
- // We stamp even when running=false to support killing during host function calls
- // The generation tracking will prevent stale kills from affecting new calls
- // Only stamp if a call is actually active (call_active=true)
- if stamp_generation
- && target_generation.is_none()
- && self.call_active.load(Ordering::Acquire)
- {
- self.set_cancel_requested(generation);
- target_generation = Some(generation);
- }
+ // Check if we should continue sending signals
+ // Exit if not running OR if neither cancel nor debug_interrupt is set
+ #[cfg(gdb)]
+ let should_continue =
+ running && (cancel || self.debug_interrupt.load(Ordering::Relaxed));
+ #[cfg(not(gdb))]
+ let should_continue = running && cancel;
- // If not running, we've stamped the generation (if requested), so we're done
- // This handles the host function call scenario
- if !running {
+ if !should_continue {
break;
}
- match target_generation {
- None => target_generation = Some(generation),
- // prevent ABA problem
- Some(expected) if expected != generation => break,
- _ => {}
- }
-
log::info!("Sending signal to kill vcpu thread...");
sent_signal = true;
+ // Acquire ordering to synchronize with the Release store in set_tid()
+ // This ensures we see the correct tid value for the currently running vcpu
unsafe {
libc::pthread_kill(self.tid.load(Ordering::Acquire) as _, signal_number);
}
@@ -849,42 +663,207 @@ impl LinuxInterruptHandle {
}
}
+#[cfg(any(kvm, mshv3))]
+impl InterruptHandleImpl for LinuxInterruptHandle {
+ fn set_tid(&self) {
+ // Release ordering to synchronize with the Acquire load of `running` in send_signal()
+ // This ensures that when send_signal() observes RUNNING_BIT=true (via Acquire),
+ // it also sees the correct tid value stored here
+ self.tid
+ .store(unsafe { libc::pthread_self() as u64 }, Ordering::Release);
+ }
+
+ fn set_running(&self) {
+ // Release ordering to ensure that the tid store (which uses Release)
+ // is visible to any thread that observes running=true via Acquire ordering.
+ // This prevents the interrupt thread from reading a stale tid value.
+ self.state.fetch_or(Self::RUNNING_BIT, Ordering::Release);
+ }
+
+ fn is_cancelled(&self) -> bool {
+ // Acquire ordering to synchronize with the Release in kill()
+ // This ensures we see the cancel flag set by the interrupt thread
+ self.state.load(Ordering::Acquire) & Self::CANCEL_BIT != 0
+ }
+
+ fn clear_cancel(&self) {
+ // Release ordering to ensure that any operations from the previous run()
+ // are visible to other threads. While this is typically called by the vcpu thread
+ // at the start of run(), the VM itself can move between threads across guest calls.
+ self.state.fetch_and(!Self::CANCEL_BIT, Ordering::Release);
+ }
+
+ fn clear_running(&self) {
+ // Release ordering to ensure all vcpu operations are visible before clearing running
+ self.state.fetch_and(!Self::RUNNING_BIT, Ordering::Release);
+ }
+
+ fn is_debug_interrupted(&self) -> bool {
+ #[cfg(gdb)]
+ {
+ self.debug_interrupt.load(Ordering::Relaxed)
+ }
+ #[cfg(not(gdb))]
+ {
+ false
+ }
+ }
+
+ #[cfg(gdb)]
+ fn clear_debug_interrupt(&self) {
+ self.debug_interrupt.store(false, Ordering::Relaxed);
+ }
+
+ fn set_dropped(&self) {
+ // Release ordering to ensure all VM cleanup operations are visible
+ // to any thread that checks dropped() via Acquire
+ self.dropped.store(true, Ordering::Release);
+ }
+}
+
#[cfg(any(kvm, mshv3))]
impl InterruptHandle for LinuxInterruptHandle {
fn kill(&self) -> bool {
- if !(self.call_active.load(Ordering::Acquire)) {
- // No active call, so no effect
- return false;
- }
+ // Release ordering ensures that any writes before kill() are visible to the vcpu thread
+ // when it checks is_cancelled() with Acquire ordering
+ self.state.fetch_or(Self::CANCEL_BIT, Ordering::Release);
- // send_signal will stamp the generation into cancel_requested
- // right before sending each signal, ensuring they're always in sync
- self.send_signal(true)
+ // Send signals to interrupt the vcpu if it's currently running
+ self.send_signal()
}
#[cfg(gdb)]
fn kill_from_debugger(&self) -> bool {
self.debug_interrupt.store(true, Ordering::Relaxed);
- self.send_signal(false)
+ self.send_signal()
}
-
fn dropped(&self) -> bool {
- self.dropped.load(Ordering::Relaxed)
+ // Acquire ordering to synchronize with the Release in set_dropped()
+ // This ensures we see all VM cleanup operations that happened before drop
+ self.dropped.load(Ordering::Acquire)
}
}
-#[cfg(any(kvm, mshv3))]
-impl InterruptHandleInternal for LinuxInterruptHandle {
- fn get_call_active(&self) -> &AtomicBool {
- &self.call_active
+#[cfg(target_os = "windows")]
+#[derive(Debug)]
+pub(super) struct WindowsInterruptHandle {
+ /// Atomic value packing vcpu execution state.
+ ///
+ /// Bit layout:
+ /// - Bit 1: RUNNING_BIT - set when vcpu is actively running
+ /// - Bit 0: CANCEL_BIT - set when cancellation has been requested
+ ///
+ /// `WHvCancelRunVirtualProcessor()` will return Ok even if the vcpu is not running,
+ /// which is why we need the RUNNING_BIT.
+ ///
+ /// CANCEL_BIT persists across vcpu exits/re-entries within a single `HyperlightVm::run()` call
+ /// (e.g., during host function calls), but is cleared at the start of each new `HyperlightVm::run()` call.
+ state: AtomicU64,
+
+ /// Debugger interrupt flag (gdb feature only).
+ /// Set when `kill_from_debugger()` is called, cleared when the vcpu stops running.
+ #[cfg(gdb)]
+ debug_interrupt: AtomicBool,
+ partition_handle: windows::Win32::System::Hypervisor::WHV_PARTITION_HANDLE,
+ dropped: AtomicBool,
+}
+
+#[cfg(target_os = "windows")]
+impl WindowsInterruptHandle {
+ const RUNNING_BIT: u64 = 1 << 1;
+ const CANCEL_BIT: u64 = 1 << 0;
+}
+
+#[cfg(target_os = "windows")]
+impl InterruptHandleImpl for WindowsInterruptHandle {
+ fn set_tid(&self) {
+ // No-op on Windows - we don't need to track thread ID
+ }
+
+ fn set_running(&self) {
+ // Release ordering to ensure prior memory operations are visible when another thread observes running=true
+ self.state.fetch_or(Self::RUNNING_BIT, Ordering::Release);
+ }
+
+ fn is_cancelled(&self) -> bool {
+ // Acquire ordering to synchronize with the Release in kill()
+ // This ensures we see the CANCEL_BIT set by the interrupt thread
+ self.state.load(Ordering::Acquire) & Self::CANCEL_BIT != 0
+ }
+
+ fn clear_cancel(&self) {
+ // Release ordering to ensure that any operations from the previous run()
+ // are visible to other threads. While this is typically called by the vcpu thread
+ // at the start of run(), the VM itself can move between threads across guest calls.
+ self.state.fetch_and(!Self::CANCEL_BIT, Ordering::Release);
+ }
+
+ fn clear_running(&self) {
+ // Release ordering to ensure all vcpu operations are visible before clearing running
+ self.state.fetch_and(!Self::RUNNING_BIT, Ordering::Release);
+ #[cfg(gdb)]
+ self.debug_interrupt.store(false, Ordering::Relaxed);
+ }
+
+ fn is_debug_interrupted(&self) -> bool {
+ #[cfg(gdb)]
+ {
+ self.debug_interrupt.load(Ordering::Relaxed)
+ }
+ #[cfg(not(gdb))]
+ {
+ false
+ }
+ }
+
+ #[cfg(gdb)]
+ fn clear_debug_interrupt(&self) {
+ self.debug_interrupt.store(false, Ordering::Relaxed);
+ }
+
+ fn set_dropped(&self) {
+ // Release ordering to ensure all VM cleanup operations are visible
+ // to any thread that checks dropped() via Acquire
+ self.dropped.store(true, Ordering::Release);
+ }
+}
+
+#[cfg(target_os = "windows")]
+impl InterruptHandle for WindowsInterruptHandle {
+ fn kill(&self) -> bool {
+ use windows::Win32::System::Hypervisor::WHvCancelRunVirtualProcessor;
+
+ // Release ordering ensures that any writes before kill() are visible to the vcpu thread
+ // when it checks is_cancelled() with Acquire ordering
+ self.state.fetch_or(Self::CANCEL_BIT, Ordering::Release);
+
+ // Acquire ordering to synchronize with the Release in set_running()
+ // This ensures we see the running state set by the vcpu thread
+ let state = self.state.load(Ordering::Acquire);
+ if state & Self::RUNNING_BIT != 0 {
+ unsafe { WHvCancelRunVirtualProcessor(self.partition_handle, 0, 0).is_ok() }
+ } else {
+ false
+ }
}
+ #[cfg(gdb)]
+ fn kill_from_debugger(&self) -> bool {
+ use windows::Win32::System::Hypervisor::WHvCancelRunVirtualProcessor;
- fn get_running(&self) -> &AtomicU64 {
- &self.running
+ self.debug_interrupt.store(true, Ordering::Relaxed);
+ // Acquire ordering to synchronize with the Release in set_running()
+ let state = self.state.load(Ordering::Acquire);
+ if state & Self::RUNNING_BIT != 0 {
+ unsafe { WHvCancelRunVirtualProcessor(self.partition_handle, 0, 0).is_ok() }
+ } else {
+ false
+ }
}
- fn get_cancel_requested(&self) -> &AtomicU64 {
- &self.cancel_requested
+ fn dropped(&self) -> bool {
+ // Acquire ordering to synchronize with the Release in set_dropped()
+ // This ensures we see all VM cleanup operations that happened before drop
+ self.dropped.load(Ordering::Acquire)
}
}
diff --git a/src/hyperlight_host/src/sandbox/initialized_multi_use.rs b/src/hyperlight_host/src/sandbox/initialized_multi_use.rs
index 2f83b1f5c..de6ef10ec 100644
--- a/src/hyperlight_host/src/sandbox/initialized_multi_use.rs
+++ b/src/hyperlight_host/src/sandbox/initialized_multi_use.rs
@@ -47,47 +47,11 @@ use crate::mem::shared_mem::HostSharedMemory;
use crate::metrics::{
METRIC_GUEST_ERROR, METRIC_GUEST_ERROR_LABEL_CODE, maybe_time_and_emit_guest_call,
};
-use crate::{Result, log_then_return, new_error};
+use crate::{Result, log_then_return};
/// Global counter for assigning unique IDs to sandboxes
static SANDBOX_ID_COUNTER: AtomicU64 = AtomicU64::new(0);
-/// RAII guard that automatically calls `clear_call_active()` when dropped.
-///
-/// This ensures that the call_active flag is always cleared when a guest function
-/// call completes, even if the function returns early due to an error.
-///
-/// Only one guard can exist per interrupt handle at a time - attempting to create
-/// a second guard will return an error.
-struct CallActiveGuard {
- interrupt_handle: Arc<dyn InterruptHandleInternal>,
-}
-
-impl CallActiveGuard {
- /// Creates a new guard and marks a guest function call as active.
- ///
- /// # Errors
- ///
- /// Returns an error if `call_active` is already true (i.e., another guard already exists).
- fn new(interrupt_handle: Arc<dyn InterruptHandleInternal>) -> Result<Self> {
- // Atomically check that call_active is false and set it to true.
- // This prevents creating multiple guards for the same interrupt handle.
- let was_active = interrupt_handle.set_call_active();
- if was_active {
- return Err(new_error!(
- "Attempted to create CallActiveGuard when a call is already active"
- ));
- }
- Ok(Self { interrupt_handle })
- }
-}
-
-impl Drop for CallActiveGuard {
- fn drop(&mut self) {
- self.interrupt_handle.clear_call_active();
- }
-}
-
/// A fully initialized sandbox that can execute guest functions multiple times.
///
/// Guest functions can be called repeatedly while maintaining state between calls.
@@ -611,10 +575,6 @@ impl MultiUseSandbox {
if self.poisoned {
return Err(crate::HyperlightError::PoisonedSandbox);
}
- // Mark that a guest function call is now active
- // (This also increments the generation counter internally)
- // The guard will automatically clear call_active when dropped
- let _guard = CallActiveGuard::new(self.vm.interrupt_handle())?;
let res = (|| {
let estimated_capacity = estimate_flatbuffer_capacity(function_name, &args);
diff --git a/src/hyperlight_host/tests/integration_test.rs b/src/hyperlight_host/tests/integration_test.rs
index 0d8aa271b..93504a601 100644
--- a/src/hyperlight_host/tests/integration_test.rs
+++ b/src/hyperlight_host/tests/integration_test.rs
@@ -291,7 +291,7 @@ fn interrupt_moved_sandbox() {
/// This will exercise the ABA-problem, where the vcpu could be successfully interrupted,
/// but restarted, before the interruptor-thread has a chance to see that the vcpu was killed.
///
-/// The ABA-problem is solved by introducing run-generation on the vcpu.
+/// The ABA-problem is solved by clearing the CANCEL bit at the start of each VirtualCPU::run() call.
#[test]
#[cfg(target_os = "linux")]
fn interrupt_custom_signal_no_and_retry_delay() {
From 7bf68a992be0641ed0625807295ca21b0e8047ea Mon Sep 17 00:00:00 2001
From: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Date: Tue, 18 Nov 2025 16:40:13 -0800
Subject: [PATCH 2/7] PR feedback
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
---
docs/cancellation.md | 313 +++++-------------
.../src/hypervisor/hyperv_linux.rs | 4 +
.../src/hypervisor/hyperv_windows.rs | 4 +
src/hyperlight_host/src/hypervisor/kvm.rs | 4 +
src/hyperlight_host/src/hypervisor/mod.rs | 79 ++---
.../src/sandbox/initialized_multi_use.rs | 4 +
6 files changed, 134 insertions(+), 274 deletions(-)
diff --git a/docs/cancellation.md b/docs/cancellation.md
index a8b00aa99..63e6accce 100644
--- a/docs/cancellation.md
+++ b/docs/cancellation.md
@@ -19,107 +19,104 @@ The `LinuxInterruptHandle` uses a packed atomic u64 to track execution state:
- **debug_interrupt (AtomicBool)**: Set when debugger interrupt is requested (gdb feature only)
- **dropped (AtomicBool)**: Set when the corresponding VM has been dropped
-The packed state enables atomic reads of both RUNNING_BIT and CANCEL_BIT simultaneously via `get_running_and_cancel()`. Within a single `run()` call, the CANCEL_BIT remains set across vcpu exits and re-entries (such as when calling host functions), ensuring cancellation persists until the guest call completes. However, `clear_cancel()` resets the CANCEL_BIT at the beginning of each new `run()` call, preventing cancellation requests from affecting subsequent guest function calls.
+The packed state enables atomic reads of both RUNNING_BIT and CANCEL_BIT simultaneously via `get_running_and_cancel()`. Within a single `VirtualCPU::run()` call, the CANCEL_BIT remains set across vcpu exits and re-entries (such as when calling host functions), ensuring cancellation persists until the guest call completes. However, `clear_cancel()` resets the CANCEL_BIT at the beginning of each new guest function call (specifically in `MultiUseSandbox::call`, before `VirtualCPU::run()` is called), preventing cancellation requests from affecting subsequent guest function calls.
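+
+A minimal, self-contained sketch of the bit packing and the combined read (illustrative; simplified from `LinuxInterruptHandle`):
+
+```rust
+use std::sync::atomic::{AtomicU64, Ordering};
+
+const RUNNING_BIT: u64 = 1 << 1; // vcpu is in guest mode
+const CANCEL_BIT: u64 = 1 << 0;  // kill() has been requested
+
+fn main() {
+    let state = AtomicU64::new(0);
+
+    // kill(): set CANCEL_BIT. Release pairs with the Acquire load below.
+    state.fetch_or(CANCEL_BIT, Ordering::Release);
+
+    // get_running_and_cancel(): a single Acquire load observes both bits
+    // atomically, so a reader can never see a torn running/cancel pair.
+    let raw = state.load(Ordering::Acquire);
+    let running = raw & RUNNING_BIT != 0;
+    let cancel = raw & CANCEL_BIT != 0;
+    assert!(!running && cancel);
+}
+```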
### Signal Mechanism
-On Linux, Hyperlight uses `SIGRTMIN + offset` (configurable, default offset is typically 0) to interrupt the vCPU thread. The signal handler is intentionally a no-op - the signal's only purpose is to cause a VM exit via `EINTR` from the `ioctl` call that runs the vCPU.
+On Linux, Hyperlight uses `SIGRTMIN + offset` (configurable, default offset is 0) to interrupt the vCPU thread. The signal handler is intentionally a no-op - the signal's only purpose is to cause a VM exit via `EINTR` from the `ioctl` call that runs the vCPU.
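+
+The registration can be sketched roughly as follows (an illustrative sketch, not Hyperlight's actual registration code; the function names here are assumptions):
+
+```rust
+// Sketch: install a no-op handler for SIGRTMIN + offset.
+extern "C" fn noop_handler(_sig: libc::c_int) {
+    // Intentionally empty: mere delivery makes the blocking vcpu ioctl
+    // return EINTR, which surfaces as a cancelled VM exit.
+}
+
+unsafe fn install_handler(offset: u8) {
+    let sig = libc::SIGRTMIN() + offset as libc::c_int;
+    let mut sa: libc::sigaction = std::mem::zeroed();
+    sa.sa_sigaction = noop_handler as libc::sighandler_t;
+    sa.sa_flags = 0; // no SA_RESTART: the interrupted ioctl must not be retried
+    libc::sigemptyset(&mut sa.sa_mask);
+    libc::sigaction(sig, &sa, std::ptr::null_mut());
+}
+```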
## Run Loop Flow
-The main execution loop in `HyperlightVm::run()` coordinates vCPU execution with potential interrupts. Here's the detailed flow:
+The main execution loop in `VirtualCPU::run()` coordinates vCPU execution with potential interrupts.
```mermaid
sequenceDiagram
- participant VM as run() Loop
- participant Guest as vCPU (Guest)
+ participant Caller as Caller (call())
+ participant vCPU as vCPU (run())
participant IH as InterruptHandle
- Note over VM: === TIMING POINT 1 ===
- VM->>IH: clear_cancel()
- Note right of VM: Clear premature kill() requests
+ Note over Caller: === TIMING POINT 1 ===
+ Caller->>IH: clear_cancel()
+ Note right of Caller: Start of cancellable window
+
+ Caller->>vCPU: run()
+ activate vCPU
loop Run Loop
- Note over VM: === TIMING POINT 2 ===
- VM->>IH: set_tid()
- Note right of VM: Store current thread ID
- VM->>IH: set_running()
- Note right of VM: Set running=true
+ Note over vCPU: === TIMING POINT 2 ===
+ vCPU->>IH: set_tid()
+ vCPU->>IH: set_running()
+ Note right of vCPU: Enable signal delivery
- VM->>IH: is_cancelled()
+ vCPU->>IH: is_cancelled()
alt is_cancelled() == true
- VM->>VM: return Cancelled()
+ vCPU-->>Caller: return Cancelled()
else is_cancelled() == false
- Note over VM: === TIMING POINT 3 ===
- VM->>Guest: run_vcpu()
- activate Guest
- Note right of Guest: Guest code executes in vCPU
+ Note over vCPU: === TIMING POINT 3 ===
+ vCPU->>vCPU: run_vcpu() (Enter Guest)
+ activate vCPU
alt Guest completes normally
- Guest-->>VM: VmExit::Halt()
+ vCPU-->>vCPU: VmExit::Halt()
else Guest performs I/O
- Guest-->>VM: VmExit::IoOut()/MmioRead()
+ vCPU-->>vCPU: VmExit::IoOut()/MmioRead()
else Signal received
- Guest-->>VM: VmExit::Cancelled()
+ vCPU-->>vCPU: VmExit::Cancelled()
end
- deactivate Guest
+ deactivate vCPU
end
- Note over VM: === TIMING POINT 4 ===
- VM->>IH: is_cancelled()
- IH-->>VM: cancel_requested (bool)
- Note right of VM: Capture for filtering stale signals later
+ Note over vCPU: === TIMING POINT 4 ===
+ vCPU->>IH: clear_running()
+ Note right of vCPU: Disable signal delivery
- Note over VM: === TIMING POINT 5 ===
- VM->>IH: clear_running()
- Note right of VM: Clear RUNNING_BIT
+ Note over vCPU: === TIMING POINT 5 ===
+ vCPU->>IH: is_cancelled()
+ IH-->>vCPU: cancel_requested (bool)
+ Note right of vCPU: Check if we should exit
- Note over VM: === TIMING POINT 6 ===
+ Note over vCPU: === TIMING POINT 6 ===
alt Exit reason is Halt
- VM->>VM: break Ok(())
+ vCPU-->>Caller: return Ok(())
else Exit reason is Cancelled AND cancel_requested==true
- VM->>VM: break Err(ExecutionCanceledByHost)
+ vCPU-->>Caller: return Err(ExecutionCanceledByHost)
else Exit reason is Cancelled AND cancel_requested==false
- Note right of VM: Stale signal, retry
- VM->>VM: continue (retry iteration)
+ Note right of vCPU: Stale signal, retry
+ vCPU->>vCPU: continue (retry iteration)
else Exit reason is I/O or host call
- VM->>VM: Handle and continue loop
- else Other exit reasons
- VM->>VM: Handle appropriately
+ vCPU->>vCPU: Handle and continue loop
end
end
+ deactivate vCPU
```
### Detailed Run Loop Steps
-1. **Timing Point 1** - Between guest function calls:
- - `clear_cancel()` is called to clear any stale CANCEL_BIT
- - If `kill()` completes before this point, it has NO effect on this call
- - Ensures that `kill()` between different guest function calls doesn't affect the next call
+1. **Timing Point 1** - Start of Guest Call (in `call()`):
+ - `clear_cancel()` resets the cancellation state *before* `run()` is called.
+ - Any `kill()` completed before this point is ignored.
-2. **Timing Point 2** - Before entering run loop iteration:
- - `set_tid()` stores the current thread ID
- - `set_running()` sets running to true
- - If `kill()` completes before this, early `Cancelled()` is returned
+2. **Timing Point 2** - Start of Loop Iteration:
+ - `set_running()` enables signal delivery.
+ - Checks `is_cancelled()` immediately to handle pre-run cancellation.
-3. **Timing Point 3** - Before calling `run_vcpu()`:
- - If `kill()` completes before this, CANCEL_BIT is set but too late to prevent entering guest
- - Signals will interrupt the guest (RUNNING_BIT=true), causing `VmExit::Cancelled()`
- - If guest completes before signals arrive, `kill()` may have no effect on this iteration
+3. **Timing Point 3** - Guest Entry:
+ - Enters guest execution.
+ - If `kill()` happens now, signals will interrupt the guest.
-4. **Timing Point 4** - After vCPU exits, before capturing `cancel_requested`:
- - CANCEL_BIT is captured for filtering stale signals
- - If `kill()` completes before this, CANCEL_BIT persists for next iteration
+4. **Timing Point 4** - Guest Exit:
+ - `clear_running()` disables signal delivery.
+ - Signals sent after this point are ignored.
-5. **Timing Point 5** - Before calling `clear_running()`:
- - Same as point 4
+5. **Timing Point 5** - Capture State:
+ - `is_cancelled()` captures the cancellation request state.
+ - This determines if a `Cancelled` exit was genuine or stale.
-6. **Timing Point 6** - After calling `clear_running()`:
- - RUNNING_BIT is now false, no new signals will be sent
- - CANCEL_BIT may be set but won't affect this iteration
- - Stale signals may arrive but are filtered by the `cancel_requested` check
+6. **Timing Point 6** - Handle Exit:
+ - Processes the exit reason based on the captured `cancel_requested` state.
+ - If `Cancelled` but `!cancel_requested`, it's a stale signal -> retry.
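+
+Condensed into code, the loop looks roughly like this (a self-contained sketch: `Interrupt` and `VmExit` are stand-ins for the real internals, and tid tracking, debugger interrupts, and I/O dispatch are omitted):
+
+```rust
+use std::sync::atomic::{AtomicU64, Ordering};
+
+const RUNNING_BIT: u64 = 1 << 1;
+const CANCEL_BIT: u64 = 1 << 0;
+
+enum VmExit { Halt, Cancelled, Io }
+
+struct Interrupt { state: AtomicU64 }
+
+impl Interrupt {
+    fn set_running(&self) { self.state.fetch_or(RUNNING_BIT, Ordering::Release); }
+    fn clear_running(&self) { self.state.fetch_and(!RUNNING_BIT, Ordering::Release); }
+    fn is_cancelled(&self) -> bool { self.state.load(Ordering::Acquire) & CANCEL_BIT != 0 }
+}
+
+fn run(interrupt: &Interrupt, mut run_vcpu: impl FnMut() -> VmExit) -> Result<(), &'static str> {
+    loop {
+        interrupt.set_running();                         // Timing Point 2 (set_tid() omitted)
+        if interrupt.is_cancelled() {
+            interrupt.clear_running();
+            return Err("ExecutionCanceledByHost");       // pre-run cancellation
+        }
+        let exit = run_vcpu();                           // Timing Point 3: enter guest
+        interrupt.clear_running();                       // Timing Point 4: stop signal delivery
+        let cancel_requested = interrupt.is_cancelled(); // Timing Point 5: capture state
+        match exit {                                     // Timing Point 6: handle exit
+            VmExit::Halt => return Ok(()),
+            VmExit::Cancelled if cancel_requested => return Err("ExecutionCanceledByHost"),
+            VmExit::Cancelled => continue,               // stale signal: retry
+            VmExit::Io => { /* dispatch, then loop */ }
+        }
+    }
+}
+```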
## Kill Operation Flow
@@ -130,7 +127,7 @@ sequenceDiagram
participant Caller as Caller Thread
participant IH as InterruptHandle
participant Signal as Signal Delivery
- participant VM as vCPU Thread
+ participant vCPU as vCPU Thread
Caller->>IH: kill()
activate IH
@@ -152,15 +149,15 @@ sequenceDiagram
IH->>Signal: pthread_kill(tid, SIGRTMIN+offset)
activate Signal
Note right of Signal: Send signal to vCPU thread
- Signal->>VM: SIGRTMIN+offset delivered
- Note right of VM: Signal handler is no-op Purpose is to cause EINTR
+ Signal->>vCPU: SIGRTMIN+offset delivered
+ Note right of vCPU: Signal handler is no-op Purpose is to cause EINTR
deactivate Signal
alt Signal arrives during ioctl
- VM->>VM: ioctl returns EINTR
- VM->>VM: return VmExit::Cancelled()
+ vCPU->>vCPU: ioctl returns EINTR
+ vCPU->>vCPU: return VmExit::Cancelled()
else Signal arrives between ioctls
- Note right of VM: Signal is harmless
+ Note right of vCPU: Signal is harmless
end
IH->>IH: sleep(retry_delay)
@@ -195,18 +192,7 @@ sequenceDiagram
## Memory Ordering Guarantees
-### Release-Acquire Semantics Overview
-
-A **synchronizes-with** relationship is established when:
-1. Thread A performs an atomic operation with `Release` ordering that writes a value
-2. Thread B performs an atomic operation with `Acquire` ordering on the same atomic variable
-3. Thread B's `Acquire` load reads the exact value that Thread A's `Release` operation wrote
-
-When this occurs, all memory operations that happened-before the `Release` in Thread A become visible to Thread B after the `Acquire`. This creates a **happens-before** relationship that ensures memory consistency across threads.
-
-### Synchronization in Hyperlight
-
-Hyperlight uses careful memory ordering to ensure correctness across threads:
+Hyperlight uses Release-Acquire semantics to ensure correctness across threads:
```mermaid
graph TB
@@ -215,6 +201,7 @@ graph TB
B[set_running fetch_update RUNNING_BIT with Release]
C[is_cancelled Load with Acquire]
D[clear_running fetch_and with Release]
+ J[is_debug_interrupted Load with Acquire]
end
subgraph "Interrupt Thread"
@@ -222,177 +209,43 @@ graph TB
F[send_signal Load running with Acquire]
G[Load tid with Acquire]
H[pthread_kill]
+ I[kill_from_debugger Store debug_interrupt with Release]
end
B -->|Synchronizes-with| F
A -->|Happens-before via B→F| G
E -->|Synchronizes-with| C
D -->|Synchronizes-with| F
+ I -->|Synchronizes-with| J
```
### Ordering Rules
-1. **tid Store → running Load Synchronization**:
- - `set_tid()`: Stores `tid` with `Release` ordering
- - `set_running()`: Sets RUNNING_BIT with `Release` ordering (via `fetch_or`)
- - `send_signal()`: Loads `state` with `Acquire` ordering via `get_running_and_cancel()`
- - **Guarantee**: When interrupt thread observes RUNNING_BIT=true, it sees the correct `tid` value
-
-2. **CANCEL_BIT Synchronization**:
- - `kill()`: Sets CANCEL_BIT with `Release` ordering (via `fetch_or`)
- - `is_cancelled()`: Loads `state` with `Acquire` ordering
- - **Guarantee**: When vCPU thread observes CANCEL_BIT=true, it sees all writes before `kill()`
-
-3. **clear_running Synchronization**:
- - `clear_running()`: Clears RUNNING_BIT with `Release` ordering (via `fetch_and`)
- - `send_signal()`: Loads `state` with `Acquire` ordering via `get_running_and_cancel()`
- - **Guarantee**: When interrupt thread observes RUNNING_BIT=false, all vCPU operations are complete
-
-4. **clear_cancel Synchronization**:
- - `clear_cancel()`: Clears CANCEL_BIT with `Release` ordering (via `fetch_and`)
- - **Rationale**: Uses Release because the VM can move between threads across guest calls, ensuring operations from previous run() are visible to other threads
-
-5. **dropped flag**:
- - `set_dropped()`: Uses `Release` ordering
- - `dropped()`: Uses `Acquire` ordering
- - **Guarantee**: All VM cleanup operations are visible when `dropped()` returns true
-
-### Happens-Before Relationships
-
-```mermaid
-sequenceDiagram
- participant VT as vCPU Thread
- participant IT as Interrupt Thread
-
- VT->>VT: set_tid() (Release)
- Note right of VT: Store thread ID
-
- VT->>VT: set_running() (Release)
- Note right of VT: Set running=true
-
- Note over VT,IT: Sync #1: set_running(Release) → send_signal(Acquire)
- VT-->>IT: synchronizes-with
- Note over IT: send_signal()
- IT->>IT: get_running_and_cancel() (Acquire)
- Note right of IT: Atomically load both bits Observes running=true
-
- IT->>IT: Load tid (Acquire)
- Note right of IT: Sees correct tid value (from set_tid Release)
-
- VT->>VT: run_vcpu()
- Note right of VT: Guest executes
-
- IT->>IT: pthread_kill()
- Note right of IT: Send signal to tid
-
- par Concurrent Operations
- Note over IT: kill()
- IT->>IT: fetch_or(CANCEL_BIT, Release)
- Note right of IT: Atomically set CANCEL_BIT
- and
- Note over VT: Guest interrupted
- VT->>VT: is_cancelled() (Acquire)
- Note right of VT: Observes cancel=true Sees all writes before kill()
- end
-
- Note over IT,VT: Sync #2: kill(Release) → is_cancelled(Acquire)
- IT-->>VT: synchronizes-with
-
- VT->>VT: clear_running() (Release)
- Note right of VT: fetch_and to clear RUNNING_BIT All vCPU ops complete
-
- Note over VT,IT: Sync #3: clear_running(Release) → send_signal(Acquire)
- VT-->>IT: synchronizes-with
-
- IT->>IT: send_signal() observes
- Note right of IT: running=false Stop sending signals
-```
+1. **tid Store → running Load**: `set_tid` (Release) followed by `set_running` (Release) synchronizes with `send_signal` (Acquire); observing RUNNING_BIT therefore guarantees the interrupt thread sees the correct thread ID.
+2. **CANCEL_BIT**: `kill` (Release) synchronizes with `is_cancelled` (Acquire), ensuring the vCPU sees the cancellation request.
+3. **clear_running**: `clear_running` (Release) synchronizes with `send_signal` (Acquire), ensuring the interrupt thread stops sending signals when the vCPU stops.
+4. **clear_cancel**: Uses Release to ensure operations from the previous run are visible to other threads.
+5. **dropped flag**: `set_dropped` (Release) synchronizes with `dropped` (Acquire), ensuring cleanup visibility.
+6. **debug_interrupt**: `kill_from_debugger` (Release) synchronizes with `is_debug_interrupted` (Acquire), ensuring the vCPU sees the debug interrupt request.
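+
+Rule 1 in isolation (a self-contained sketch, not library code):
+
+```rust
+use std::sync::atomic::{AtomicU64, Ordering};
+
+static TID: AtomicU64 = AtomicU64::new(0);
+static STATE: AtomicU64 = AtomicU64::new(0);
+const RUNNING_BIT: u64 = 1 << 1;
+
+// vCPU thread: publish tid, then set RUNNING_BIT (both Release).
+fn enter_guest(my_tid: u64) {
+    TID.store(my_tid, Ordering::Release);
+    STATE.fetch_or(RUNNING_BIT, Ordering::Release);
+}
+
+// Interrupt thread: an Acquire load that observes RUNNING_BIT=1
+// synchronizes-with the fetch_or above, so the earlier tid store
+// is guaranteed to be visible.
+fn target_tid() -> Option<u64> {
+    if STATE.load(Ordering::Acquire) & RUNNING_BIT != 0 {
+        Some(TID.load(Ordering::Acquire))
+    } else {
+        None
+    }
+}
+```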
## Interaction with Host Function Calls
-When a guest performs a host function call, the vCPU exits and the host function executes with `RUNNING_BIT=false`, preventing signal delivery during host execution. The `CANCEL_BIT` persists across this exit and re-entry, so if `kill()` was called, cancellation will be detected when the guest attempts to resume execution. This ensures cancellation takes effect even if it occurs during a host call, while avoiding signals during non-guest code execution.
+When a guest performs a host function call, the vCPU exits and `RUNNING_BIT` is cleared. `CANCEL_BIT` persists, so if `kill()` is called during the host call, cancellation is detected when the guest attempts to resume.
## Signal Behavior Across Loop Iterations
-When the run loop iterates (e.g., for host calls or IO operations):
-
-1. Before host call: `clear_running()` sets `running=false`
-2. `send_signal()` loop checks `running && cancel` - exits immediately when `running=false`
-3. After host call: `set_running()` sets `running=true` again
-4. `is_cancelled()` check detects persistent `cancel` flag and returns early
-
-**Key insight**: The `running && cancel` check is sufficient. When `running` becomes false (host call starts), the signal loop exits immediately. When the vCPU would resume, the early `is_cancelled()` check catches the persistent `cancel` flag before entering the guest.
-
-**Signal Chaining Note**: Hyperlight does not provide signal chaining for `SIGRTMIN+offset`. Since Hyperlight may issue signals back-to-back during cancellation retry loop, it's unlikely embedders want to handle these signals.
-
-## Race Conditions and Edge Cases
-
-### Race 1: kill() called between guest function calls
-
-```
-Timeline:
-t1: Guest function #1 completes, run() returns
-t2: kill() is called (sets CANCEL_BIT)
-t3: Guest function #2 starts, run() is called
-t4: clear_cancel() clears CANCEL_BIT
-
-Result: Guest function #2 executes normally (not cancelled)
-```
-
-**This is by design** - cancellation is scoped to a single guest function call.
-
-### Race 2: kill() called just before run_vcpu()
-
-```
-Timeline:
-t1: set_running() sets RUNNING_BIT
-t2: kill() sets CANCEL_BIT and sends signals
-t3: run_vcpu() enters guest
-
-Result: Signals interrupt the guest, causing VmExit::Cancelled()
-```
-
-**Handled correctly** - signals cause VM exit.
-
-### Race 3: Guest completes before signal arrives
-
-```
-Timeline:
-t1: kill() sets CANCEL_BIT and sends signal
-t2: Guest completes naturally
-t3: clear_running() clears RUNNING_BIT
-t4: Signal arrives (too late)
-
-Result: If guest completes normally (Halt), returns Ok()
- If guest exits for I/O, next iteration will be cancelled
-```
-
-**Acceptable behavior** - cancellation is best-effort.
-
-### Race 4: Stale signals from previous guest function call
-
-```
-Timeline:
-t1: Guest function #1: kill() sends signals, CANCEL_BIT=true
-t2: Guest function #1: VM exits with Halt, clear_running() clears RUNNING_BIT
-t3: Guest function #2: run() called, clear_cancel() clears CANCEL_BIT
-t4: Guest function #2: set_running() sets RUNNING_BIT
-t5: Stale signal from guest #1 arrives, causes VM to exit with Cancelled
-t6: cancel_requested=false (CANCEL_BIT was cleared at step 3)
-t7: Cancelled exit is filtered as stale, iteration continues
-
-Result: The signal was sent for guest function #1, but arrives during guest function #2.
- Since cancel_requested is false, we know this cancellation wasn't intended
- for the current guest call, so we continue the loop (retry).
-```
-
-**Handled correctly** - The `cancel_requested` flag (captured at timing point 4) distinguishes between:
-- Signals intended for the current guest call (`cancel_requested=true`) → return error
-- Stale signals from a previous guest call (`cancel_requested=false`) → filter and retry
+When the run loop iterates (e.g., for host calls):
+1. `clear_running()` sets `running=false`, causing any active `send_signal()` loop to exit.
+2. `set_running()` sets `running=true` again.
+3. `is_cancelled()` detects the persistent `cancel` flag and returns early.
-### Race 5: ABA Problem
+## Race Conditions
-The ABA problem (where a new guest-call starts during the InterruptHandle's `send_signal()` loop, potentially causing the loop to send signals to a different guest call) is prevented by clearing CANCEL_BIT at the start of each `run()` call, ensuring each guest call starts with a clean cancellation state. This breaks out any ongoing slow `send_signal()` loops from previous calls that did not have time to observe the cleared CANCEL_BIT after the first `run()` call completed.
+1. **kill() between calls**: `clear_cancel()` at Timing Point 1 ensures `kill()` requests from before the current call are ignored.
+2. **kill() before run_vcpu()**: Signals interrupt the guest immediately.
+3. **Guest completes before signal**: If the guest finishes naturally, the signal is ignored or causes a retry in the next iteration (handled as stale).
+4. **Stale signals**: If a signal from a previous call arrives during a new call, `cancel_requested` (checked at Timing Point 5) will be false, causing a retry.
+5. **ABA Problem**: Clearing `CANCEL_BIT` at the start of `run()` breaks any ongoing `send_signal()` loops from previous calls.
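+
+Race 1 in miniature (a self-contained sketch of the state transitions only; no real sandbox is involved):
+
+```rust
+use std::sync::atomic::{AtomicU64, Ordering};
+
+const CANCEL_BIT: u64 = 1 << 0;
+
+fn main() {
+    let state = AtomicU64::new(0);
+
+    // t2: kill() lands between guest calls and sets CANCEL_BIT.
+    state.fetch_or(CANCEL_BIT, Ordering::Release);
+
+    // t3/t4: the next call starts with clear_cancel(), discarding the stale
+    // request before run() ever checks is_cancelled().
+    state.fetch_and(!CANCEL_BIT, Ordering::Release);
+
+    // The new guest call proceeds normally.
+    assert_eq!(state.load(Ordering::Acquire) & CANCEL_BIT, 0);
+}
+```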
## Windows Platform Differences
diff --git a/src/hyperlight_host/src/hypervisor/hyperv_linux.rs b/src/hyperlight_host/src/hypervisor/hyperv_linux.rs
index 0c7dbe5fe..5d61515cd 100644
--- a/src/hyperlight_host/src/hypervisor/hyperv_linux.rs
+++ b/src/hyperlight_host/src/hypervisor/hyperv_linux.rs
@@ -783,6 +783,10 @@ impl Hypervisor for HypervLinuxDriver {
self.interrupt_handle.clone()
}
+ fn clear_cancel(&self) {
+ self.interrupt_handle.clear_cancel();
+ }
+
#[cfg(crashdump)]
fn crashdump_context(&self) -> Result