[DRAFT] I/O virtual memory (IOMMU) support #327
With virtual memory, seemingly consecutive I/O virtual memory regions may actually be fragmented across multiple pages in our userspace mapping. Existing `descriptor_utils::Reader::new()` (and `Writer`) implementations (e.g. in virtiofsd or vm-virtio/virtio-queue) use `GuestMemory::get_slice()` to turn guest memory address ranges into valid slices in our address space; but with this fragmentation, it is easily possible that a range no longer corresponds to a single slice. To fix this, we can instead use `try_access()` to collect all slices, but to do so, its region argument needs to have the correct lifetime so we can collect the slices into a `Vec<_>` outside of the closure. Signed-off-by: Hanna Czenczek <[email protected]>
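To illustrate the fragmentation problem, here is a minimal, self-contained sketch. The `try_access`-style helper and the `Fragment` type are toy stand-ins (not the real vm-memory API): the point is only that a seemingly contiguous range splits at page boundaries, and that the pieces can be collected into a `Vec` outside the closure.

```rust
/// One contiguous piece of an I/O virtual range in our address space.
/// (Illustrative type, not part of vm-memory.)
#[derive(Debug, PartialEq)]
struct Fragment {
    offset: usize, // offset into the requested range
    len: usize,    // length of this contiguous piece
}

/// Walk `count` bytes starting at `addr`, splitting at `page_size`
/// boundaries the way an IOMMU mapping might fragment them, and hand
/// each contiguous piece to the closure.
fn try_access<F>(addr: usize, count: usize, page_size: usize, mut f: F) -> usize
where
    F: FnMut(usize, usize), // (offset, len) of each contiguous piece
{
    let mut done = 0;
    while done < count {
        let pos = addr + done;
        let in_page = page_size - (pos % page_size);
        let len = in_page.min(count - done);
        f(done, len);
        done += len;
    }
    done
}

fn collect_fragments(addr: usize, count: usize, page_size: usize) -> Vec<Fragment> {
    let mut frags = Vec::new();
    // Collecting outside the closure is exactly why the closure's region
    // argument needs a suitable lifetime in the real interface.
    try_access(addr, count, page_size, |offset, len| {
        frags.push(Fragment { offset, len })
    });
    frags
}

fn main() {
    // A 10-byte access starting 6 bytes before a 4 KiB page boundary
    // splits into two fragments: 6 bytes, then 4 bytes.
    let frags = collect_fragments(4090, 10, 4096);
    assert_eq!(frags.len(), 2);
    assert_eq!((frags[0].offset, frags[0].len), (0, 6));
    assert_eq!((frags[1].offset, frags[1].len), (6, 4));
}
```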
read() and write() must not ignore the `count` parameter: The mappings passed into the `try_access()` closure are only valid for up to `count` bytes, not more. Signed-off-by: Hanna Czenczek <[email protected]>
When we switch to a (potentially) virtual memory model, we want to compact the interface, especially removing references to memory regions because virtual memory is not just split into regions, but pages first. The one memory-region-referencing part we are going to keep is `try_access()` because that method is nicely structured around the fragmentation we will have to accept when it comes to paged memory. `to_region_addr()` in contrast does not even take a length argument, so for virtual memory, using the returned region and address is unsafe if doing so crosses page boundaries. Therefore, switch `Bytes::load()` and `store()` from using `to_region_addr()` to `try_access()`. Signed-off-by: Hanna Czenczek <[email protected]>
The existing `GuestMemory` trait is insufficient for representing virtual memory, as it does not allow specifying the required access permissions. Its focus on all guest memory implementations consisting of a relatively small number of regions is also unsuited for paged virtual memory with a potentially very large set of non-continuous mappings. The new `IoMemory` trait in contrast provides only a small number of methods that keep the implementing type’s internal structure more opaque, and every access needs to be accompanied by the required permissions. Signed-off-by: Hanna Czenczek <[email protected]>
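As a hypothetical sketch of what such a permissions-carrying interface could look like (names and signatures are illustrative, not the actual PR API), every access method states the permissions it needs up front:

```rust
/// Illustrative permissions type, not the real vm-memory one.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Permissions {
    Read,
    Write,
}

/// Toy stand-in for an IoMemory-style trait: few methods, opaque internal
/// structure, and permissions on every access.
trait IoMemory {
    /// Check whether `count` bytes at `addr` are accessible with `perm`.
    fn check_range(&self, addr: u64, count: usize, perm: Permissions) -> bool;
}

/// Trivial backing store: one flat read-only buffer starting at address 0.
struct FlatRom {
    size: u64,
}

impl IoMemory for FlatRom {
    fn check_range(&self, addr: u64, count: usize, perm: Permissions) -> bool {
        perm == Permissions::Read
            && addr
                .checked_add(count as u64)
                .map_or(false, |end| end <= self.size)
    }
}

fn main() {
    let mem = FlatRom { size: 0x1000 };
    assert!(mem.check_range(0x0, 16, Permissions::Read));
    assert!(!mem.check_range(0x0, 16, Permissions::Write)); // read-only store
    assert!(!mem.check_range(0xFF8, 16, Permissions::Read)); // out of range
}
```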
Rust only allows us to give one trait the blanket implementations for `Bytes` and `GuestAddressSpace`. We want `IoMemory` to be our primary external interface because it has users specify the access permissions they need, and because we can (and do) provide a blanket `IoMemory` implementation for all `GuestMemory` types. Therefore, replace requirements of `GuestMemory` by `IoMemory` instead. Signed-off-by: Hanna Czenczek <[email protected]>
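As a toy illustration (stand-in traits, not the actual crate code) of why only one trait can receive the blanket `Bytes` implementation: Rust's coherence rules reject a second overlapping blanket impl, so we must pick either `GuestMemory` or `IoMemory` as its carrier.

```rust
// Stand-in traits; the real ones live in vm-memory.
trait GuestMemory {
    fn read_raw(&self) -> u8;
}
trait IoMemory {
    fn read_checked(&self) -> u8;
}
trait Bytes {
    fn load(&self) -> u8;
}

// The PR gives the one allowed blanket impl to IoMemory:
impl<T: IoMemory> Bytes for T {
    fn load(&self) -> u8 {
        self.read_checked()
    }
}
// A second `impl<T: GuestMemory> Bytes for T` would be rejected as
// overlapping (error E0119), because nothing stops one type from
// implementing both GuestMemory and IoMemory.

struct Mem;
impl IoMemory for Mem {
    fn read_checked(&self) -> u8 {
        7
    }
}

fn main() {
    assert_eq!(Mem.load(), 7); // Bytes comes in via the IoMemory blanket impl
}
```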
The Iommu trait defines an interface for translating virtual addresses into addresses in an underlying address space. It is supposed to do so by internally keeping an instance of the Iotlb type, updating it with mappings whenever necessary (e.g. when actively invalidated or when there’s an access failure) from some internal data source (e.g. for a vhost-user IOMMU, the data comes from the vhost-user front-end by requesting an update). In a later commit, we are going to provide an implementation of `IoMemory` that can use an `Iommu` to provide an I/O virtual address space. Note that while I/O virtual memory in practice will be organized in pages, the vhost-user specification makes no mention of a specific page size or how to obtain it. Therefore, we cannot really assume any page size and have to use plain ranges with byte granularity as mappings instead. Signed-off-by: Hanna Czenczek <[email protected]>
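The byte-granularity mapping model described above can be sketched as follows. This is a hypothetical, self-contained miniature (all names and signatures are illustrative, not the PR's actual `Iommu`/`Iotlb` API): mappings are plain ranges with no fixed page size, and a failed lookup signals a TLB miss or permission failure so the caller can refill the TLB, e.g. from the vhost-user front-end.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Perm {
    Read,
    Write,
}

#[derive(Clone, Copy, Debug)]
struct Mapping {
    iova: u64,     // start of the I/O virtual range (byte granularity)
    len: u64,      // length in bytes; no page size is assumed
    target: u64,   // corresponding address in the underlying space
    writable: bool,
}

#[derive(Default)]
struct Iotlb {
    // Keyed by IOVA start; ranges are assumed non-overlapping here.
    mappings: BTreeMap<u64, Mapping>,
}

impl Iotlb {
    fn insert(&mut self, m: Mapping) {
        self.mappings.insert(m.iova, m);
    }

    /// Translate a single address, checking the required permission.
    /// `None` means TLB miss or permission failure; the owner would then
    /// refresh the TLB from its data source and retry.
    fn translate(&self, iova: u64, perm: Perm) -> Option<u64> {
        // Find the last mapping starting at or before `iova`.
        let (_, m) = self.mappings.range(..=iova).next_back()?;
        if iova >= m.iova + m.len {
            return None; // past the end of that mapping: miss
        }
        if perm == Perm::Write && !m.writable {
            return None; // permission failure
        }
        Some(m.target + (iova - m.iova))
    }
}

fn main() {
    let mut tlb = Iotlb::default();
    // Note the non-page-aligned length: plain byte ranges are allowed.
    tlb.insert(Mapping { iova: 0x1000, len: 0x123, target: 0x9000, writable: false });
    assert_eq!(tlb.translate(0x1010, Perm::Read), Some(0x9010));
    assert_eq!(tlb.translate(0x1010, Perm::Write), None); // read-only mapping
    assert_eq!(tlb.translate(0x2000, Perm::Read), None);  // miss
}
```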
This `IoMemory` type provides an I/O virtual address space by adding an IOMMU translation layer to an underlying `GuestMemory` object. Signed-off-by: Hanna Czenczek <[email protected]>
The vhost-user-backend crate will need to be able to modify all existing memory regions to use the VMM user address instead of the guest physical address once the IOMMU feature is switched on, and vice versa. To do so, it needs to be able to modify regions’ base address. Because `GuestMemoryMmap` stores regions wrapped in an `Arc<_>`, we cannot mutate them after they have been put into the `GuestMemoryMmap` object; and `MmapRegion` itself is by its nature not clonable. So to modify the regions’ base addresses, we need some way to create a new `GuestRegionMmap` referencing the same `MmapRegion` as another one, but with a different base address. We can do that by having `GuestRegionMmap` wrap its `MmapRegion` in an `Arc`, adding a method to return a reference to that `Arc`, and adding a method to construct a `GuestRegionMmap` object from such a cloned `Arc`. Signed-off-by: Hanna Czenczek <[email protected]>
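The Arc-based rebasing described above can be sketched like this. `MmapRegion` here is a stand-in struct, and the method names are hypothetical, not the PR's actual API; the point is that two region objects with different base addresses can share one unclonable mapping.

```rust
use std::sync::Arc;

/// Stand-in for the real MmapRegion: owns the actual mapping, not Clone.
struct MmapRegion {
    size: usize,
}

/// Stand-in for GuestRegionMmap, wrapping its mapping in an Arc.
struct GuestRegionMmap {
    mapping: Arc<MmapRegion>, // shared, since MmapRegion cannot be cloned
    base: u64,                // guest physical (or VMM user) base address
}

impl GuestRegionMmap {
    fn new(mapping: Arc<MmapRegion>, base: u64) -> Self {
        GuestRegionMmap { mapping, base }
    }

    /// Borrow the inner Arc so a caller can create a rebased copy.
    fn inner(&self) -> &Arc<MmapRegion> {
        &self.mapping
    }
}

fn main() {
    let region =
        GuestRegionMmap::new(Arc::new(MmapRegion { size: 0x1000 }), 0x4000_0000);
    // Same underlying mapping, different base address (e.g. a VMM user
    // address once the IOMMU feature is enabled):
    let rebased = GuestRegionMmap::new(Arc::clone(region.inner()), 0x7f00_0000_0000);
    assert!(Arc::ptr_eq(region.inner(), rebased.inner()));
    assert_eq!(rebased.base, 0x7f00_0000_0000);
    assert_eq!(rebased.mapping.size, region.mapping.size);
}
```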
Document in DESIGN.md how I/O virtual memory is handled. Signed-off-by: Hanna Czenczek <[email protected]>
I know why the code coverage CI check fails: I (purposefully) don’t have unit tests for the new code yet. Why the other tests failed, I don’t know; but I suspect it’s because I force-pushed an update (fixing the CHANGELOG.md link) while the tests were running, maybe SIGTERM-ing them. Looking at the timelines, they all failed (finished) between 16:27:02.985 and 16:27:02.995. (Except the coverage one, which is an actual failure.)
Signed-off-by: Hanna Czenczek <[email protected]>
Pushed an update without actual changes (just re-committing the top commit) to trigger a CI re-run. This time, only the coverage check failed (as expected).
Summary of the PR
This PR adds support for an IOMMU, and thus for I/O virtual memory handling.
New Memory Trait: `IoMemory`

Handling I/O virtual memory requires a new interface to access guest memory: `GuestMemory` does not allow specifying the required access permissions, which is necessary when working with MMU-guarded memory.

We could add memory access methods with such a permissions parameter to `GuestMemory`, but I prefer to provide a completely new trait instead. This ensures that users will only use the interface that actually works when working with (potentially) I/O virtual memory: `GuestMemory` generally assumes that regions are long, continuous, and that any address in a given range will be in the same memory region. This is absolutely no longer the case with virtual memory, which is heavily fragmented into pages.

That is, adding a new trait (`IoMemory`) allows catching a lot of potential mistakes at compile time, which I feel is much better than finding out at runtime that some place forgot to specify the access permissions.
Unfortunately, this is an incompatible change, because we need to decide on a single guest memory trait that we expect users to primarily use: We can only have one blanket implementation of e.g. `Bytes`, and this PR changes that blanket implementation to be on `IoMemory` instead of `GuestMemory`, because we want to prefer `IoMemory` with its permissions-including interface.

While this PR does provide a blanket implementation of `IoMemory` for all `GuestMemory`, Rust isn’t fully transitive here: just because we have a blanket `impl IoMemory for GuestMemory` and a blanket `impl Bytes for IoMemory`, we don’t implicitly get an `impl Bytes for GuestMemory`.

What this means can be seen in virtio-queue (in vm-virtio): It uses trait bounds like `M: GuestMemory` only, but then expects to be able to use the `Bytes` trait. This is no longer possible; the trait bound must be extended to `M: GuestMemory + Bytes` or replaced by `M: IoMemory` (the latter is what we want).
Guest Address Type

Another consideration is that I originally planned to introduce new address types. `GuestAddress` currently generally refers to a guest physical address (GPA); but we now also need to deal with I/O virtual addresses (IOVAs), and an IOMMU generally doesn’t translate those into GPAs, but into VMM user space addresses (VUAs) instead, so now there are three kinds of addresses. Ideally, all of those should get their own type; but whether an address passed to an `IoMemory` object is an IOVA or a GPA depends on whether the IOMMU is enabled or not, which is generally a runtime question.

Therefore, I kept `GuestAddress` as the only type, and it may refer to any of the three kinds of addresses (GPAs, IOVAs, VUAs).
Async Accesses
I was considering whether to also make memory accesses optionally
async
. Thevhost-user IOMMU implementation basically needs two vhost-user socket roundtrips
per IOTLB miss, which can make guest memory accesses quite slow. An
async
implementation could allow mitigating that.
However, I decided against it (for now), because this would also require
extensive changes in all of our consuming crates to really be useful: Anything
that does a guest memory access should then be
async
.I think if we want to add this functionality later, it should be possible in a
compatible manner.
Changes Necessary in Other Crates

vm-virtio

Implementation: https://gitlab.com/hreitz/vm-virtio/-/commits/iommu

As stated above, places that bind `M: GuestMemory` but expect the `Bytes` trait to also be implemented need to be changed to `M: GuestMemory + Bytes` or `M: IoMemory`. I opted for the latter approach, and basically replaced all `GuestMemory` instances by `IoMemory`.

(That is what we want, because dropping `GuestMemory` in favor of `IoMemory` ensures that all vm-virtio crates can work with virtual memory.)
vhost

Implementation: https://gitlab.com/hreitz/vhost/-/commits/iommu

Here, the changes that updating vm-memory necessitates are quite marginal, and have a similar cause: instead of the `Bytes` trait, it’s the `GuestAddressSpace` trait that is required. The resolution is the same: Switch from requiring `GuestMemory` to `IoMemory`.

The rest of the commits concern themselves with implementing `VhostUserIommu` and allowing users to choose to use `IommuMemory<GuestMemoryMmap, VhostUserIommu>` instead of only `GuestMemoryMmap`.

virtiofsd (as one user)
Implementation: https://gitlab.com/hreitz/virtiofsd-rs/-/commits/iommu

This is an example of an actual user. Updating all crates to IOMMU-supporting versions actually does not require any changes to the code, but enabling the 'iommu' feature does: This feature makes the vhost-user-backend crate require the `VhostUserBackend::Memory` associated type (because associated type defaults are not stable yet), so this single line of code must be added (which sets the type to `GuestMemoryMmap<BitmapMmapRegion>`).

Actually enabling IOMMU support is then a bit more involved, as it requires switching away from `GuestMemoryMmap` to `IommuMemory` again.

However, to me, this shows that end users working with concrete types do not seem to be affected by the incompatible `IoMemory` change until they want to opt in to it. That’s because `GuestMemoryMmap` implements both `GuestMemory` and `IoMemory` (thanks to the blanket impl), so it can transparently be used wherever the updated crates expect to see an `IoMemory`
type.

Why a Draft?
I did not write unit tests yet. I assume the design will be something that will
need some discussion, so I wanted to publish my state before I fully finalize
it.
Requirements

Before submitting your PR, please make sure you addressed the following requirements:

- All commits are signed off (with `git commit -s`), and the commit message has max 60 characters for the summary and max 75 characters for each description line.
- All added/changed functionality has a corresponding unit/integration test. (not done, which is why this is a draft)
- The changes are documented in the "Upcoming Release" section of CHANGELOG.md (if no such section exists, please create one).
- Any newly added `unsafe` code is properly documented.