[DRAFT] I/O virtual memory (IOMMU) support #327

Draft
wants to merge 10 commits into base: main

Conversation

@XanClic commented May 30, 2025

Summary of the PR

This PR adds support for an IOMMU, and thus for I/O virtual memory handling.

New Memory Trait: IoMemory

Handling I/O virtual memory requires a new interface to access guest memory:
GuestMemory does not allow specifying the required access permissions, which
is necessary when working with MMU-guarded memory.

We could add memory access methods with such a permissions parameter to
GuestMemory, but I prefer to provide a completely new trait instead. This
ensures that users only use the interface that actually works with
(potentially) I/O virtual memory, i.e.:

  • They must always specify the required permissions,
  • They cannot (easily) access the memory regions directly, because doing so
    generally assumes that regions are long and contiguous, and that any address
    in a given range will be in the same memory region. This is no longer the
    case with virtual memory, which is heavily fragmented into pages.

That is, adding a new trait (IoMemory) allows us to catch a lot of potential
mistakes at compile time, which I feel is much better than finding out at
runtime that some place forgot to specify the access permissions.
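
As a rough illustration of what such a permission-aware interface looks like
(the names and signatures below are placeholders, not this PR's actual
IoMemory API):

```rust
/// Minimal sketch of a permission-aware memory interface. Names and
/// signatures are illustrative placeholders only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Permissions {
    Read,
    Write,
    ReadWrite,
}

pub trait PermissionCheckedMemory {
    /// Every access names the permissions it needs, so an IOMMU-backed
    /// implementation can translate and check the range before it is used.
    /// (`u64` stands in for a guest address type here.)
    fn check_range(&self, addr: u64, len: usize, perm: Permissions) -> bool;
}
```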

Unfortunately, this is an incompatible change, because we need to decide on a
single guest memory trait that we expect users to primarily use: we can only
have one blanket implementation of e.g. Bytes, and this PR changes that
blanket implementation to be on IoMemory instead of GuestMemory, because we
want to prefer IoMemory with its permission-aware interface.

While this PR does provide a blanket implementation of IoMemory for all
GuestMemory types, Rust isn’t fully transitive here, so having a blanket
impl IoMemory for GuestMemory and a blanket impl Bytes for IoMemory does not
implicitly give us an impl Bytes for GuestMemory.

What this means can be seen in virtio-queue (in vm-virtio): it uses trait bounds
like M: GuestMemory only, but then expects to be able to use the Bytes
trait. This is no longer possible; the trait bound must be extended to
M: GuestMemory + Bytes or replaced by M: IoMemory (the latter is what we
want).
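
A minimal sketch of that bound change, with dummy stand-ins for the real
traits (the real blanket impls are more involved):

```rust
// Sketch only: dummy stand-ins for the real vm-memory traits, showing the
// bound an updated crate like virtio-queue would use.
trait IoMemory {}

trait Bytes {
    fn write_obj(&self) {}
}

// Simplified stand-in for the blanket impl of `Bytes` on `IoMemory`:
impl<T: IoMemory> Bytes for T {}

// was: fn process<M: GuestMemory>(mem: &M)
fn process<M: IoMemory>(mem: &M) {
    // `Bytes` methods are available through the blanket impl on `IoMemory`.
    mem.write_obj();
}
```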

Guest Address Type

Another consideration is that I originally planned to introduce new address
types. GuestAddress currently generally refers to a guest physical address
(GPA); but we now also need to deal with I/O virtual addresses (IOVAs), and an
IOMMU generally doesn’t translate those into GPAs, but VMM user space addresses
(VUAs) instead, so now there are three kinds of addresses. Ideally, each of
those should get its own type, but I felt that:

  • This would require too many changes from our users, and
  • You don’t even know whether the address you use on an IoMemory object is an
    IOVA or a GPA. It depends on whether the IOMMU is enabled or not, which is
    generally a runtime question.

Therefore, I kept GuestAddress as the only type, and it may refer to any of
the three kinds of addresses (GPAs, IOVAs, VUAs).

Async Accesses

I was considering whether to also make memory accesses optionally async. The
vhost-user IOMMU implementation basically needs two vhost-user socket roundtrips
per IOTLB miss, which can make guest memory accesses quite slow. An async
implementation could help mitigate that.

However, I decided against it (for now), because this would also require
extensive changes in all of our consuming crates to really be useful: Anything
that does a guest memory access should then be async.

I think if we want to add this functionality later, it should be possible in a
compatible manner.

Changes Necessary in Other Crates

vm-virtio

Implementation: https://gitlab.com/hreitz/vm-virtio/-/commits/iommu

As stated above, places that bind M: GuestMemory but expect the Bytes trait
to also be implemented need to be changed to M: GuestMemory + Bytes or
M: IoMemory. I opted for the latter approach, and basically replaced all
GuestMemory bounds with IoMemory.

(That is what we want because dropping GuestMemory in favor of IoMemory
ensures that all vm-virtio crates can work with virtual memory.)

vhost

Implementation: https://gitlab.com/hreitz/vhost/-/commits/iommu

Here, the changes necessitated by updating vm-memory are quite marginal, and
have a similar cause: instead of requiring the Bytes trait, it is the
GuestAddressSpace trait. The resolution is the same: switch from requiring
GuestMemory to IoMemory.

The rest of the commits concern themselves with implementing VhostUserIommu and
allowing users to choose IommuMemory<GuestMemoryMmap, VhostUserIommu>
instead of plain GuestMemoryMmap.

virtiofsd (as one user)

Implementation: https://gitlab.com/hreitz/virtiofsd-rs/-/commits/iommu

This is an example of an actual user. Updating all crates to IOMMU-supporting
versions does not require any changes to the code, but enabling the
'iommu' feature does: this feature makes the vhost-user-backend crate require
the VhostUserBackend::Memory associated type (because associated type defaults
are not stable yet), so a single line of code must be added that sets the
type to GuestMemoryMmap<BitmapMmapRegion>.
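
For illustration, that added line has roughly the following shape; the trait
and types below are stand-ins, not the real vhost-user-backend API:

```rust
// Sketch with stand-in types: because associated type defaults are unstable,
// every backend implementation has to name its memory type explicitly.
trait VhostUserBackendSketch {
    type Memory;
}

struct GuestMemoryMmapSketch; // stands in for GuestMemoryMmap<BitmapMmapRegion>
struct MyBackend;

impl VhostUserBackendSketch for MyBackend {
    // The single added line described above, with the real memory type on
    // the right-hand side in actual code.
    type Memory = GuestMemoryMmapSketch;
}
```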

Actually enabling IOMMU support is then a bit more involved, as it requires
switching away from GuestMemoryMmap to IommuMemory again.

However, to me, this shows that end users working with concrete types do not
seem to be affected by the incompatible IoMemory change until they want to opt
in to it. That’s because GuestMemoryMmap implements both GuestMemory and
IoMemory (thanks to the blanket impl), so it can transparently be used wherever
the updated crates expect to see an IoMemory type.

Why a Draft?

I have not written unit tests yet. I assume the design will need some
discussion, so I wanted to publish my current state before fully finalizing
it.

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test. (not done, which is why this is a draft)
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

XanClic added 9 commits May 30, 2025 11:39
With virtual memory, seemingly consecutive I/O virtual memory regions
may actually be fragmented across multiple pages in our userspace
mapping.  Existing `descriptor_utils::Reader::new()` (and `Writer`)
implementations (e.g. in virtiofsd or vm-virtio/virtio-queue) use
`GuestMemory::get_slice()` to turn guest memory address ranges into
valid slices in our address space; but with this fragmentation, it is
easily possible that a range no longer corresponds to a single slice.

To fix this, we can instead use `try_access()` to collect all slices,
but to do so, its region argument needs to have the correct lifetime so
we can collect the slices into a `Vec<_>` outside of the closure.

Signed-off-by: Hanna Czenczek <[email protected]>
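
A toy sketch of the pattern this enables (not the vm-memory API): walk the
range fragment by fragment and collect every resulting slice.

```rust
// `lookup` stands in for one `try_access()`-style per-fragment step.
fn collect_fragments<'a>(
    mut lookup: impl FnMut(u64, usize) -> Option<&'a [u8]>,
    mut addr: u64,
    mut remaining: usize,
) -> Option<Vec<&'a [u8]>> {
    let mut slices = Vec::new();
    while remaining > 0 {
        // Each step may yield less than `remaining` bytes (page granularity).
        let slice = lookup(addr, remaining)?;
        let len = slice.len().min(remaining);
        if len == 0 {
            return None; // no progress: the range is not fully mapped
        }
        slices.push(&slice[..len]);
        addr += len as u64;
        remaining -= len;
    }
    Some(slices)
}
```
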
read() and write() must not ignore the `count` parameter: The mappings
passed into the `try_access()` closure are only valid for up to `count`
bytes, not more.

Signed-off-by: Hanna Czenczek <[email protected]>
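
A toy sketch of the idea (not the actual read()/write() code): clamp every
access to `count`, even when the mapping handed to the closure is longer.

```rust
// Only the first `count` bytes of the mapping passed to the closure are
// guaranteed to be valid for this access, so clamp to `count`.
fn copy_from_mapping(mapping: &[u8], count: usize, dest: &mut Vec<u8>) -> usize {
    let valid = mapping.len().min(count);
    dest.extend_from_slice(&mapping[..valid]);
    valid // report how many bytes were actually processed
}
```
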
When we switch to a (potentially) virtual memory model, we want to
compact the interface, especially removing references to memory regions
because virtual memory is not just split into regions, but pages first.

The one memory-region-referencing part we are going to keep is
`try_access()` because that method is nicely structured around the
fragmentation we will have to accept when it comes to paged memory.

`to_region_addr()` in contrast does not even take a length argument, so
for virtual memory, using the returned region and address is unsafe if
doing so crosses page boundaries.

Therefore, switch `Bytes::load()` and `store()` from using
`to_region_addr()` to `try_access()`.

Signed-off-by: Hanna Czenczek <[email protected]>
The existing `GuestMemory` trait is insufficient for representing
virtual memory, as it does not allow specifying the required access
permissions.

Its focus on all guest memory implementations consisting of a relatively
small number of regions is also unsuited for paged virtual memory with a
potentially very large set of non-contiguous mappings.

The new `IoMemory` trait in contrast provides only a small number of
methods that keep the implementing type’s internal structure more
opaque, and every access needs to be accompanied by the required
permissions.

Signed-off-by: Hanna Czenczek <[email protected]>
Rust only allows us to give one trait the blanket implementations for
`Bytes` and `GuestAddressSpace`.

We want `IoMemory` to be our primary external interface because it has
users specify the access permissions they need, and because we can (and
do) provide a blanket `IoMemory` implementation for all `GuestMemory`
types.

Therefore, replace requirements of `GuestMemory` with `IoMemory`.

Signed-off-by: Hanna Czenczek <[email protected]>
The Iommu trait defines an interface for translating virtual addresses
into addresses in an underlying address space.

It is supposed to do so by internally keeping an instance of the Iotlb
type, updating it with mappings whenever necessary (e.g. when
actively invalidated or when there’s an access failure) from some
internal data source (e.g. for a vhost-user IOMMU, the data comes from
the vhost-user front-end by requesting an update).

In a later commit, we are going to provide an implementation of
`IoMemory` that can use an `Iommu` to provide an I/O virtual address
space.

Note that while I/O virtual memory in practice will be organized in
pages, the vhost-user specification makes no mention of a specific page
size or how to obtain it.  Therefore, we cannot really assume any page
size and have to use plain ranges with byte granularity as mappings
instead.

Signed-off-by: Hanna Czenczek <[email protected]>
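
A rough sketch of the translation interface this describes; names and details
are illustrative, not the exact `Iommu`/`Iotlb` API added here:

```rust
#[derive(Clone, Copy, Debug)]
pub enum Access {
    Read,
    Write,
}

/// One mapping entry: a byte-granularity range (no page size is assumed)
/// translating I/O virtual addresses into the underlying address space.
pub struct Mapping {
    pub iova_start: u64,
    pub target_start: u64,
    pub len: u64,
    pub writable: bool,
}

pub trait IommuSketch {
    /// Translate an access of `len` bytes at `iova`. On an IOTLB miss, the
    /// implementation refreshes its mappings from its data source (e.g. the
    /// vhost-user front-end) before reporting failure.
    fn translate(&mut self, iova: u64, len: u64, access: Access) -> Result<Mapping, ()>;
}
```
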
The new `IommuMemory` type provides an I/O virtual address space by adding an
IOMMU translation layer to an underlying `GuestMemory` object.

Signed-off-by: Hanna Czenczek <[email protected]>
The vhost-user-backend crate will need to be able to modify all existing
memory regions to use the VMM user address instead of the guest physical
address once the IOMMU feature is switched on, and vice versa.  To do
so, it needs to be able to modify regions’ base address.

Because `GuestMemoryMmap` stores regions wrapped in an `Arc<_>`, we
cannot mutate them after they have been put into the `GuestMemoryMmap`
object; and `MmapRegion` itself is by its nature not clonable.  So to
modify the regions’ base addresses, we need some way to create a new
`GuestRegionMmap` referencing the same `MmapRegion` as another one, but
with a different base address.

We can do that by having `GuestRegionMmap` wrap its `MmapRegion` in an
`Arc`, adding a method to return a reference to that `Arc`, and a method to
construct a `GuestRegionMmap` object from such a cloned `Arc`.

Signed-off-by: Hanna Czenczek <[email protected]>
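
The idea, as a small sketch with made-up names standing in for `MmapRegion`
and `GuestRegionMmap`:

```rust
use std::sync::Arc;

// Two region objects share one mapping but expose different base addresses.
struct MappingSketch {
    len: usize,
}

struct RegionSketch {
    base_addr: u64,
    mapping: Arc<MappingSketch>,
}

impl RegionSketch {
    /// Hand out the shared mapping so a caller can build a rebased twin.
    fn mapping(&self) -> &Arc<MappingSketch> {
        &self.mapping
    }

    /// Build a region from an existing (cloned) mapping with a new base.
    fn from_arc(mapping: Arc<MappingSketch>, base_addr: u64) -> Self {
        RegionSketch { base_addr, mapping }
    }
}

/// Rebase: same underlying mapping, different guest-visible base address.
fn rebase(region: &RegionSketch, new_base: u64) -> RegionSketch {
    RegionSketch::from_arc(Arc::clone(region.mapping()), new_base)
}
```
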
Document in DESIGN.md how I/O virtual memory is handled.

Signed-off-by: Hanna Czenczek <[email protected]>
@XanClic (Author) commented May 30, 2025

I know why the code coverage CI check fails: I (purposefully) don’t have unit tests for the new code yet.

Why the other tests failed, I don’t know; but I suspect it’s because I force-pushed an update (fixing the CHANGELOG.md link) while the tests were running, maybe SIGTERM-ing them. Looking at the timelines, they all failed (finished) between 16:27:02.985 and 16:27:02.995. (Except the coverage one, which is an actual failure.)

@XanClic (Author) commented May 30, 2025

Pushed an update without actual changes (just re-committing the top commit) to trigger a CI re-run. This time, only the coverage check failed (as expected).
