virtio-comment message

Subject: Re: [virtio-comment] [PATCH] Introduce VIRTIO_F_ISOLATE_INDIRECT_DESC feature


> Hi Baptiste,
>
> It is not the current way of working for shadow virtqueue in qemu, but
> it is doable.
>
> QEMU offers a new address space to the device, including all the guest
> memory and the shadow vrings in qemu memory. By default, qemu
> translates each descriptor to this new address space and copies it in
> the shadow vring, making sure it's a valid guest address.
>
> Net CVQ is a special case, as qemu copies also the buffer so a
> malicious guest will not be able to change it and make vdpa device and
> qemu's knowledge of it out of sync. This is done in a region mapped at
> device start for this purpose for performance and simplicity, not for
> each descriptor.
>
> It is technically possible to develop the copy of the buffer content
> in a memory region too. Once a suitable size is agreed (or given by a
> parameter in cmdline, for example), SVQ already supports things like
> backoff in case this region is full. To malloc and map this region
> dynamically is also possible.
>

Hello Eugenio,

Thanks for the insight. It does have some similarities with what we have done on
our side.

We have also tried copying the buffer content into a shared memory region, but
we did it on the guest side, using mechanisms such as the swiotlb or restricted
DMA pools. This works well when buffers are small, but for larger buffers we get
better performance by granting the memory dynamically.

> > This new feature bit helps supporting indirect
> > descriptors in this context.
> >
> > When an indirect descriptor is added to a virtqueue, we need to "share" the
> > indirect descriptor table in addition to the buffers themselves. To do this we
> > have two options:
> >
> >   - Share a shadow copy of the table with the device but this requires some kind
> >     of dynamic memory allocation in the hypervisor.
> >
> >   - Grant the indirect descriptor table from the guest as-is. Note that in our
> >     use case we do not translate buffer addresses when importing them into the
> >     device address space.
> >
>
> The plan with indirect descriptors is similar to CVQ buffer treatment:
> To preallocate a big enough chunk of memory able to hold a copy of the
> indirect table and then translate each descriptor. The current dynamic
> allocation scheme is very simple but it should work as long as there
> is enough memory. As commented before, backoff is already available
> for other cases.

Ok. This is one of the options we considered initially, but rejected because we
wanted to avoid dynamic memory allocation.

However, we are reconsidering this idea now. We initially overlooked the fact
that the spec limits the size of an indirect descriptor table to the queue
size. With this limit we can derive the maximum size of a single indirect
descriptor table, and from that the total size of the memory pool that will hold
the copies of the indirect descriptor tables.

The issue we have now is that this limit does not seem to be enforced by the
Linux virtio drivers today. I came across:

https://github.com/oasis-tcs/virtio-spec/issues/122

which looks like a good base for us to build upon, but I'm not sure what the
status of this issue is. Do you know whether there is any plan regarding it?

> Going a step forward with current qemu's SVQ, a feature that would
> help is to allow the device to translate buffers addresses with a
> different IOVA from vring and indirect table. But this is far from
> standard POV, since it does not have even an address space concept.

I'm not sure I see what this would bring compared to the current design.

> Do you think both modes can converge?

I think so, since we are now leaning towards using a copy of the indirect
descriptor table. That would eliminate the need for any additional feature bit.

> > The issue with the second option is that we need to ensure that the driver will
> > not modify the indirect descriptor table while it is on the device side.
> > Otherwise the device could attempt to access the buffers using the new addresses
> > which would not correspond to the one the hypervisor used when granting memory.
> >
> > Since this is something that a correct driver implementation would not do, this
> > led to the idea of remapping the table read-only while it is used by the device.
> > Because an indirect descriptor table may not fill an entire page, we needed a
> > way to ensure that the OS would not re-use the rest of these pages for other
> > purposes while we have made them read-only.
> >
>
> How do you guarantee it for vring itself?

For the vrings we do exactly what you described for QEMU. The virtqueues that
the device sees are not the virtqueues allocated by the driver but the shadow
copies managed by the hypervisor. If the driver touches the descriptors after
they have been copied to the shadow virtqueue, the hypervisor will ignore these
changes.

Thanks.
