
Subject: Re: Constraining where a guest may allocate virtio accessible resources


On Wed, Jun 17, 2020 at 06:31:15PM +0100, Alex Bennée wrote:
> This follows on from the discussion in the last thread I raised:
> 
>   Subject: Backend libraries for VirtIO device emulation
>   Date: Fri, 06 Mar 2020 18:33:57 +0000
>   Message-ID: <874kv15o4q.fsf@linaro.org>
> 
> To support the concept of a VirtIO backend having limited visibility of

It's unclear what we're discussing. Does "VirtIO backend" mean
vhost-user devices?

Can you describe what you are trying to do?

> a guest's memory space there needs to be some mechanism to limit
> where that guest may place things.

Or an enforcing IOMMU? In other words, an IOMMU that only gives access
to memory that has been put forth for DMA.
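
To make "enforcing" concrete, here is a minimal sketch of what the
device emulation process would do before touching guest memory. All
the names and types below are made up for illustration, not an
existing API: only the IOVA ranges the guest has mapped for DMA are
present in the table, so anything else is rejected.

    /* Hypothetical enforcing-IOMMU lookup in the device emulation
     * process. Only ranges the guest has explicitly mapped for DMA
     * appear in the table; everything else fails to translate. */
    #include <stddef.h>
    #include <stdint.h>

    struct iommu_mapping {
        uint64_t iova;        /* DMA address the device/driver uses */
        uint64_t len;
        void    *host_vaddr;  /* where that range lives in our process */
    };

    /* Return a host pointer only if [iova, iova + len) is fully
     * covered by a mapping the guest has put forth for DMA. */
    static void *iommu_translate(const struct iommu_mapping *maps,
                                 size_t nr_maps,
                                 uint64_t iova, uint64_t len)
    {
        for (size_t i = 0; i < nr_maps; i++) {
            if (iova >= maps[i].iova &&
                iova + len <= maps[i].iova + maps[i].len) {
                return (char *)maps[i].host_vaddr + (iova - maps[i].iova);
            }
        }
        return NULL;  /* not mapped for DMA: reject the access */
    }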

This was discussed recently in the context of the ongoing
vfio-over-socket work ("RFC: use VFIO over a UNIX domain socket to
implement device offloading" on qemu-devel). The idea is to use the VFIO
protocol but over UNIX domain sockets to another host userspace process
instead of over ioctls to the kernel VFIO drivers. This would allow
arbitrary devices to be emulated in a separate process from QEMU. As a
first step I suggested DMA_READ/DMA_WRITE protocol messages, even though
this will have poor performance.
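
Purely as an illustration of that first step (the real wire format is
still under discussion, so the structs below are hypothetical): the
device process asks the VMM to perform the access on its behalf, and
every transfer crosses the socket, hence the poor performance.

    /* Hypothetical DMA_READ/DMA_WRITE message over the UNIX domain
     * socket; the actual vfio-over-socket protocol may differ. */
    #include <stdint.h>

    enum {
        DMA_READ  = 1,  /* device process asks the VMM to read guest RAM  */
        DMA_WRITE = 2,  /* device process asks the VMM to write guest RAM */
    };

    struct dma_msg_hdr {
        uint32_t command;     /* DMA_READ or DMA_WRITE */
        uint32_t length;      /* number of data bytes involved */
        uint64_t guest_addr;  /* guest physical address to access */
    };
    /* For DMA_WRITE the data follows the header; for DMA_READ the VMM
     * replies with the requested bytes. */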

I think finding a solution for an enforcing IOMMU is preferable to
guest cooperation. The problem with guest cooperation is that you may be
able to get new VIRTIO guest drivers to restrict where the virtqueues
are placed, but what about applications (e.g. O_DIRECT disk I/O, network
packets) with memory buffers at arbitrary addresses?

Modifying guest applications to honor buffer memory restrictions is too
disruptive for most use cases.
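
As a concrete example of why application buffers end up at arbitrary
guest physical addresses, consider a plain O_DIRECT read (using
/dev/vda only as an example device, error handling trimmed): the
buffer is ordinary anonymous memory, and the guest kernel hands
whatever pages happen to back it to the block device for DMA.

    /* O_DIRECT read into an ordinary user buffer: the application
     * controls alignment, not placement, so the pages handed to the
     * device sit wherever the allocator put them. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        int fd = open("/dev/vda", O_RDONLY | O_DIRECT);

        if (fd < 0)
            return 1;

        /* 4 KiB alignment as O_DIRECT requires; placement is up to
         * the allocator, not the application. */
        if (posix_memalign(&buf, 4096, 4096))
            return 1;

        ssize_t n = read(fd, buf, 4096);  /* DMA into arbitrary pages */

        free(buf);
        close(fd);
        return n == 4096 ? 0 : 1;
    }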

> A simple VirtIO device can be
> expressed purely in terms of virtual resources, for example:
> 
>    * status, feature and config fields
>    * notification/doorbell
>    * one or more virtqueues
> 
> Using a PCI backend, the location of everything but the virtqueues is
> controlled by the mapping of the PCI device, so it is something that is
> controllable by the host/hypervisor. However, the guest is free to
> allocate the virtqueues anywhere in the virtual address space of system
> RAM.
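
(For reference, the split virtqueue descriptor from the VIRTIO spec
illustrates the same point one level down: not just the rings but
every buffer they describe is located by a guest-chosen physical
address. Sketched here with host-endian integer types; the spec
defines the fields as little-endian.)

    /* Split virtqueue descriptor (VIRTIO spec, struct virtq_desc). */
    #include <stdint.h>

    struct virtq_desc {
        uint64_t addr;   /* guest physical address of the buffer */
        uint32_t len;    /* length of the buffer in bytes */
        uint16_t flags;  /* VIRTQ_DESC_F_NEXT / _WRITE / _INDIRECT */
        uint16_t next;   /* index of the next descriptor in a chain */
    };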
> 
> In theory this shouldn't matter because sharing virtual pages is just a
> matter of putting the appropriate translations in place. However there
> are multiple ways the host and guest may interact:
> 
> * QEMU TCG
> 
> QEMU sees a block of system memory in its virtual address space that
> has a one-to-one mapping with the guest's physical address space. If QEMU
> wants to share a subset of that address space it can only realistically
> do it for a contiguous region of its address space, which implies the
> guest must use a contiguous region of its physical address space.

This paragraph doesn't reflect my understanding. There can be multiple
RAMBlocks. There isn't necessarily just 1 contiguous piece of RAM.

> 
> * QEMU KVM
> 
> The situation here is broadly the same - although both QEMU and the
> guest are seeing their own virtual views of a linear address space
> which may well actually be a fragmented set of physical pages on the
> host.

I don't understand the "although" part. Isn't the situation the same as
with TCG, where guest physical memory ranges can cross RAMBlock
boundaries?

> 
> KVM-based guests have additional constraints if they ever want to access
> real hardware in the host, as you need to ensure any address accessed by
> the guest can eventually be translated into an address that can
> physically access the bus which a device is on (for device
> pass-through). The area also has to be DMA coherent so updates from a
> bus are reliably visible to software accessing the same address space.

I'm surprised about the DMA coherency sentence. Don't VFIO and other
userspace I/O APIs provide the DMA APIs allowing applications to deal
with caches/coherency?
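
For context, this is roughly how a userspace driver maps an existing
buffer for device DMA with VFIO today (type1 IOMMU, container_fd set
up beforehand, error handling omitted); the kernel pins the pages and
programs the host IOMMU for that range.

    /* Map a user buffer at a chosen IOVA through a VFIO type1 IOMMU
     * container so the passed-through device can DMA to/from it. */
    #include <linux/vfio.h>
    #include <sys/ioctl.h>
    #include <stdint.h>

    static int map_for_dma(int container_fd, void *buf,
                           uint64_t iova, uint64_t size)
    {
        struct vfio_iommu_type1_dma_map dma_map = {
            .argsz = sizeof(dma_map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)buf,  /* process virtual address */
            .iova  = iova,            /* address the device will use */
            .size  = size,
        };

        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
    }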

> 
> * Xen (and other type-1's?)
> 
> Here the situation is a little different because the guest explicitly
> makes its pages visible to other domains by way of grant tables. The
> guest is still free to use whatever parts of its address space it wishes
> to. Other domains then request access to those pages via the hypervisor.
> 
> In theory the requester is free to map the granted pages anywhere in
> its own address space. However there are differences between the
> architectures on how well this is supported.
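
(For readers unfamiliar with grant tables, each entry names one frame
and the single domain allowed to access it, roughly as below; see
Xen's public grant_table.h for the authoritative definition.)

    /* Version 1 grant table entry, paraphrased from Xen's public
     * headers: sharing is always per explicitly granted frame. */
    #include <stdint.h>

    typedef uint16_t domid_t;  /* normally from the Xen public headers */

    struct grant_entry_v1 {
        uint16_t flags;  /* GTF_permit_access, GTF_readonly, ... */
        domid_t  domid;  /* domain allowed to map/copy the frame */
        uint32_t frame;  /* guest frame number being granted */
    };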
> 
> So I think this makes a case for having a mechanism by which the guest
> can restrict its allocation to a specific area of the guest physical
> address space. The question is then what is the best way to inform the
> guest kernel of the limitation?

As mentioned above, I don't think it's possible to do this without
modifying applications - which is not possible in many use cases.
Instead we could improve IOMMU support so that this works transparently.

Stefan

Attachment: signature.asc
Description: PGP signature


