OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-comment] [PATCH] Introduce VIRTIO_F_ISOLATE_INDIRECT_DESC feature


On Thu, Mar 02, 2023 at 05:51:44PM +0100, Eugenio Perez Martin wrote:
> On Mon, Feb 27, 2023 at 3:54 PM Afsa, Baptiste <Baptiste.Afsa@harman.com> wrote:
> >
> > > Hi Baptiste,
> > >
> > > It is not the current way of working for shadow virtqueue in qemu, but
> > > it is doable.
> > >
> > > QEMU offers a new address space to the device, including all the guest
> > > memory and the shadow vrings in qemu memory. By default, qemu
> > > translates each descriptor to this new address space and copies it in
> > > the shadow vring, making sure it's a valid guest address.
> > >
> > > Net CVQ is a special case, as qemu copies also the buffer so a
> > > malicious guest will not be able to change it and make vdpa device and
> > > qemu's knowledge of it out of sync. This is done in a region mapped at
> > > device start for this purpose for performance and simplicity, not for
> > > each descriptor.
> > >
> > > It is technically possible to develop the copy of the buffer content
> > > in a memory region too. Once a suitable size is agreed (or given by a
> > > parameter in cmdline, for example), SVQ already supports things like
> > > backoff in case this region is full. To malloc and map this region
> > > dynamically is also possible.
> > >
> >
> > Hello Eugenio,
> >
> > Thanks for the insight. It does have some similarities with what we have done on
> > our side.
> >
> > We have also tried the approach of doing a copy of the buffer content into a
> > shared memory region but we did that on the guest side using things like the
> > swiotlb or restricted DMA pools. This works well when buffers are small, but
> > when they get larger, we get better performance by granting the memory
> > dynamically.
> >
> > > > This new feature bit helps supporting indirect
> > > > descriptors in this context.
> > > >
> > > > When an indirect descriptor is added to a virtqueue, we need to "share" the
> > > > indirect descriptor table in addition to the buffers themselves. To do this we
> > > > have two options:
> > > >
> > > >   - Share a shadow copy of the table with the device but this requires some kind
> > > >     of dynamic memory allocation in the hypervisor.
> > > >
> > > >   - Grant the indirect descriptor table from the guest as-is. Note that in our
> > > >     use case we do not translate buffer addresses when importing them into the
> > > >     device address space.
> > > >
> > >
> > > The plan with indirect descriptors is similar to CVQ buffer treatment:
> > > To preallocate a big enough chunk of memory able to hold a copy of the
> > > indirect table and then translate each descriptor. The current dynamic
> > > allocation scheme is very simple but it should work as long as there
> > > is enough memory. As commented before, backoff is already available
> > > for other cases.
> >
> > Ok. This is one of the options we considered initially, but that we rejected
> > because we wanted to avoid dynamic memory allocation.
> >
> > However, we are reconsidering this idea now. We initially overlooked the fact
> > that the spec limits the size of an indirect descriptor table to the queue
> > size. Having this allows us to figure out the maximum size for a single indirect
> > descriptor table and the total size needed for the memory pool that will store
> > the copies of the indirect descriptor tables.
> >
> > The issue that we have now, is that this limitation does not seem to be enforced
> > in Linux virtio drivers today. I came across:
> >
> > https://github.com/oasis-tcs/virtio-spec/issues/122
> >
> > which looks like a good base for us to build upon, but I'm not sure what is the
> > status with this issue. Do you know whether there is any plan regarding this?
> >
> 
> I was not aware of that lack of enforcement as -net already uses
> MAX_SKB_FRAGS. Is it actually possible for a Linux driver to send an
> indirect table with more than 256 indirect descriptors?
> 
> When SVQ implements indirect in qemu it may make sense to assume each
> indirect table may hold 1024 indirect descriptors max. If I'm not
> wrong it is a limit already enforced by qemu.

Yes, QEMU's hw/virtio code is limited to VIRTQUEUE_MAX_SIZE (1024)
iovecs. This limit ultimately comes from the kernel's IOV_MAX. There's
not much userspace can do with the iovecs if they cannot be passed to
the kernel.

> 
> Another useful property of SVQ is that it does not strictly follow the
> guest's avail_idx. It may make available some descriptors / indirect
> tables, and then wait for completion. It then disables guest's kick
> notification, and will wait for the device to make any of them used.
> When there is space, it will make available more descriptors / tables.
> 
> That property makes it not mandatory to allocate the worst case memory
> for all indirect tables, only to some of them. The right value is
> still to be agreed though.
> 
> Would that help your use case?
> 
> > > Going a step forward with current qemu's SVQ, a feature that would
> > > help is to allow the device to translate buffers addresses with a
> > > different IOVA from vring and indirect table. But this is far from
> > > standard POV, since it does not have even an address space concept.
> >
> > I'm not sure to see what this would bring compared to the current design.
> >
> 
> To avoid the need of modifying devices, SVQ vrings are offered as
> regulars vring to the device. All buffer address must be translated by
> qemu because of this. In the indirect table case, this multiplies the
> number of translations.
> 
> If we can make the device to interpret the vring address one way and
> buffers / indirect table addresses another, this translation is
> actually a memcpy.
> 
> Since you mention later in your mail that you're actually shadowing
> the vrings, how do you deal with this? Do you translate each
> descriptor or do you instruct the device to translate addresses
> differently?
> 
> Thanks!
> 
> > > Do you think both modes can converge?
> >
> > I think so since we are now thinking of using a copy of the indirect descriptor
> > table. This would eliminate the need for any additional feature bit.
> >
> > > > The issue with the second option is that we need to ensure that the driver will
> > > > not modify the indirect descriptor table while it is on the device side.
> > > > Otherwise the device could attempt to access the buffers using the new addresses
> > > > which would not correspond to the one the hypervisor used when granting memory.
> > > >
> > > > Since this is something that a correct driver implementation would not do, this
> > > > led to the idea of remapping the table read-only while it is used by the device.
> > > > Because an indirect descriptor table may not fill an entire page, we needed a
> > > > way to ensure that the OS would not re-use the rest of these pages for other
> > > > purposes while we have made them read-only.
> > > >
> > >
> > > How do you guarantee it for vring itself?
> >
> > For the vrings we do exactly what you described for QEMU. The virtqueues that
> > the device sees are not the virtqueues allocated by the driver but the shadow
> > copies managed by the hypervisor. If the driver touches the descriptors after
> > they have been copied to the shadow virtqueue, the hypervisor will ignore these
> > changes.
> >
> > Thanks.
> >
> 
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
> 

Attachment: signature.asc
Description: PGP signature



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]