Subject: Re: [virtio-dev] VM memory protection and zero-copy transfers.


> Did you try virtio-pci with an IOMMU? The advantage compared to both
> your proposal and swiotlb is that workloads that reuse buffers have no
> performance overhead because the IOMMU mappings remain in place across
> virtqueue requests.
>
> I have CCed Jean-Philippe Brucker <jean-philippe@linaro.org> who
> designed the virtio-iommu device.
>
> Using an IOMMU can be slower than the approach you are proposing when
> each request requires new mappings. That's because your approach
> combines the virtqueue kick processing with the page granting whereas
> programming an IOMMU with map/unmap commands is a separate vmexit from
> the virtqueue kick. It's probably easier to make your approach faster in
> the dynamic mappings case for this reason.
>
> A page-table based IOMMU (doesn't require explicit map/unmap commands
> because it reads mappings on-demand from a page table structure) might
> perform better than one that needs to be programmed for each
> map/unmap operation. It still needs a kick (vmexit) for invalidation but
> it might be possible for a design of this type to avoid vmexits in the
> common case.

We considered it at some point, with the idea of moving the memory granting to
the driver's side, but we did not go in that direction for the reason you
mentioned, and also because doing it from the hypervisor lets us validate the
descriptors while granting the memory and moving them to the shadow queue.
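
For reference, a rough sketch of what I mean by validating while moving to the
shadow queue. The descriptor layout follows the split virtqueue format from the
spec; the grant call and the validation hook are made-up names for whatever the
hypervisor actually provides, so treat this as illustration only:

#include <stdbool.h>
#include <stdint.h>

/* Split virtqueue descriptor layout (VIRTIO 1.2, 2.7.5). */
struct virtq_desc {
        uint64_t addr;   /* guest physical address of the buffer */
        uint32_t len;
        uint16_t flags;  /* VIRTQ_DESC_F_NEXT, _WRITE, _INDIRECT */
        uint16_t next;
};

#define VIRTQ_DESC_F_NEXT 1

/* Hypothetical hypervisor hooks, not a real API. */
bool gpa_range_is_guest_ram(uint64_t addr, uint32_t len);
int  grant_to_host(uint64_t addr, uint32_t len);

/*
 * Validate one descriptor chain from the guest-visible ring, grant the
 * buffers it references, and copy the descriptors into the shadow ring
 * that the device actually reads. Error paths that would need to revoke
 * already-granted buffers are elided. Returns 0 on success, -1 on a bad
 * chain.
 */
int shadow_one_chain(const struct virtq_desc *guest_ring,
                     struct virtq_desc *shadow_ring,
                     uint16_t queue_size, uint16_t head)
{
        uint16_t idx = head;
        uint16_t seen = 0;

        if (head >= queue_size)
                return -1;

        for (;;) {
                /* Snapshot: the guest cannot change it after this point. */
                struct virtq_desc d = guest_ring[idx];

                if (!gpa_range_is_guest_ram(d.addr, d.len))
                        return -1;        /* reject before granting anything */
                if (grant_to_host(d.addr, d.len) != 0)
                        return -1;

                shadow_ring[idx] = d;     /* the device only ever sees the copy */

                if (!(d.flags & VIRTQ_DESC_F_NEXT))
                        return 0;
                if (++seen >= queue_size || d.next >= queue_size)
                        return -1;        /* chain loop or out-of-range next */
                idx = d.next;
        }
}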

However, we did not consider workloads where the buffers are reused. I am
curious to see how common this is in practice and what sort of gain we would
be looking at.
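
Just to check my understanding of where the gain would come from: with a
persistent mapping the cost of programming the IOMMU is paid once, and any
request that reuses the same buffer skips it entirely. Something like the toy
cache below, where iommu_map() stands in for the (vmexit-priced) map operation
and is not any real API:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one vmexit-priced call that programs the IOMMU. */
int iommu_map(uint64_t gpa, uint32_t len);

struct mapping {
        uint64_t gpa;
        uint32_t len;
        bool     valid;
};

#define CACHE_SLOTS 64
static struct mapping cache[CACHE_SLOTS];

/*
 * Make sure a buffer is reachable by the device, paying for an IOMMU map
 * only on a cache miss. A workload that recycles the same buffers hits
 * the cache on every request after the first one.
 */
int ensure_mapped(uint64_t gpa, uint32_t len)
{
        size_t slot = (gpa >> 12) % CACHE_SLOTS;

        if (cache[slot].valid && cache[slot].gpa == gpa &&
            cache[slot].len >= len)
                return 0;                 /* mapping still in place, no vmexit */

        if (iommu_map(gpa, len) != 0)     /* slow path: program the IOMMU */
                return -1;

        cache[slot] = (struct mapping){ .gpa = gpa, .len = len, .valid = true };
        return 0;
}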

Thank you for pointing this out; this looks like something we should
investigate further.

> > Although the shadow virtqueue concept looks fairly simple, there is still one
> > point that has not been covered yet: indirect descriptors.
> >
> > To support indirect descriptors, the following two options were considered
> > initially:
> >
> >   1. Grant the indirect descriptor as-is to the host while it is on the used
> >      ring. This introduces a security issue because a compromised guest OS can
> >      modify the indirect descriptor after it has been pushed to the available
> >      ring. This would cause the device to fault while trying to access any
> >      arbitrary memory that was not actually granted.
> >
> >      Note that in the shadow virtqueue model, there is no need for the device to
> >      validate the descriptors in the available rings, because the hypervisor
> >      already performed such checks before granting the memory.
>
> Assuming that the driver can trust the device isn't possible in all use
> cases. Hardware VIRTIO device implementations, VDUSE
> (https://docs.kernel.org/userspace-api/vduse.html), and Confidential
> Computing are 3 use cases where the device is untrusted. If you make the
> assumption then it's important to clearly mark the code so it won't be
> reused in a context where that would be a security problem.
>
> >
> >   2. Follow the same logic that is used for the "normal" descriptors and
> >      introduce shadow indirect descriptors. This would require the hypervisor to
> >      provision a memory pool to allocate these shadow indirect descriptors, and
> >      determining the size of this pool may not be trivial.
> >
> >      Additionally, indirect descriptors can be as large as the driver wants them
> >      to be, something that can cause the hypervisor to copy an arbitrarily large
> >      amount of data.
>
> I agree that it's unfortunate that indirect descriptors would require
> some kind of dynamic memory in the hypervisor. However, the statement
> about indirect descriptor size is incorrect. They are limited by Queue
> Size:
>
>   VIRTIO 1.2 2.7.5.3.1 Driver Requirements: Indirect Descriptors
>
>   A driver MUST NOT create a descriptor chain longer than the Queue Size
>   of the device.
>
> >
> > An alternative approach consists in introducing a new virtio feature bit. This
> > feature bit, when set by the device, instructs the driver to allocate indirect
> > descriptors using dedicated memory pages. These pages shall hold no other data
> > than the indirect descriptors. Since a correct virtio driver implementation does
> > not modify an indirect descriptor once it has been pushed to the device, the
> > pages where the indirect descriptors lie can later be remapped as read-only
> > in the guest address space.
> >
> > This allows the hypervisor to validate the content of the indirect descriptor,
> > grant it to the host (along with all the buffers referenced by this descriptor)
> > and remap the indirect descriptor read-only in the guest address space as long
> > as it is granted to the host (i.e. until the indirect descriptor is returned
> > through the used ring).
>
> That sounds very slow (2 page table updates per request).

Yes, it is. As I mentioned, this approach is overall about 40% slower on the
set of benchmarks we used, compared to running virtio devices without memory
isolation.
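
For completeness, the driver-side allocation that the proposed feature bit
would require is cheap thanks to the Queue Size limit you quoted: at 16 bytes
per descriptor, a queue of 256 entries needs 256 * 16 = 4096 bytes, i.e. one
4 KiB page per indirect table. Roughly (posix_memalign standing in for
whatever page allocator a real driver uses):

#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* 16 bytes per descriptor (VIRTIO 1.2, 2.7.5). */
struct virtq_desc {
        uint64_t addr;
        uint32_t len;
        uint16_t flags;
        uint16_t next;
};

/*
 * Under the proposed feature bit the indirect table must live in pages
 * that hold nothing else, so that the hypervisor can remap them read-only
 * while they are granted to the host. The table size is bounded by the
 * Queue Size (2.7.5.3.1).
 */
struct virtq_desc *alloc_indirect_table(uint16_t queue_size)
{
        size_t bytes = (size_t)queue_size * sizeof(struct virtq_desc);
        size_t pages = (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
        void *table = NULL;

        /* Page-aligned, whole pages, shared with nothing else. */
        if (posix_memalign(&table, PAGE_SIZE, pages * PAGE_SIZE) != 0)
                return NULL;
        return table;
}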

> > The present proposal has some obvious drawbacks, but we believe that memory
> > protection will not come for free. We know that there are other folks out there
> > who try to address this issue of memory sharing between VMs, so we would be
> > pleased to hear what you guys think about this approach.
> >
> > Additionally, we would like to know whether a feature bit similar to the one
> > that was discussed here could be considered for addition to the virtio standard.
>
> Memory isolation is hard to do efficiently. It would be great to discuss
> your proposal more with the VIRTIO community and then send a spec patch
> for detailed review and voting.

This is exactly why we are here!

I can indeed prepare a spec patch, but before doing that I wanted to get some
feedback on this approach and see whether anyone in the community sees a
reason not to introduce such a feature bit.

> One thing I didn't see in your proposal was a copying vs zero-copy
> threshold. Maybe it helps to look at the size of requests and copy data
> instead of granting pages when descriptors are small? On the other hand,
> a 4 KB page size means that many descriptors won't be larger than 4 KB
> anyway due to guest physical memory fragmentation. This is basically
> a hybrid of swiotlb and your proposal - zero-copy when it pays off,
> copying when it's cheap.
>
> As I mentioned, I think IOMMUs are worth investigating, in particular
> for the case where mappings are rarely changed. They are fast in that
> case.

I agree there is most likely a point below which copying is cheaper. However,
we did not consider this in our initial investigation.
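
If we do measure it, the policy itself is trivial; the real work is picking
the crossover point. Something along these lines, with the 4 KiB threshold as
a pure placeholder until there are numbers behind it:

#include <stdbool.h>
#include <stdint.h>

/* Placeholder until benchmarks show where the crossover really is. */
#define COPY_THRESHOLD 4096u

/*
 * Hybrid policy from the discussion: bounce-copy small buffers into a
 * pre-shared swiotlb-style pool, grant (zero-copy) the large ones.
 */
static inline bool should_copy(uint32_t desc_len)
{
        return desc_len <= COPY_THRESHOLD;
}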

Thank you Stefan for taking the time to read this proposal and for your
feedback.

Baptiste

