Subject: Re: [virtio-comment] Hardware friendly proposals from Intel for packed-ring-layout


On Thu, Aug 24, 2017 at 02:10:34PM +0200, Paolo Bonzini wrote:
> On 24/08/2017 13:53, Tiwei Bie wrote:
> > 
> > * In addition to the DESC_HW flag, each virtio queue has a tail pointer
> >     - Driver creates suitable (i.e. multiple of cacheline) descriptors,
> >       then performs MMIO write to tail pointer.
> 
> If I understand correctly, the tail pointer is the value that is written
> to the MMIO register.  If that is the case, this is unfortunately bad
> for virtualization.  Virt prefers a doorbell register where the value
> doesn't matter.  This is because:
> 
> 1) the value is not available directly and computing it requires
> instruction decoding, which in turn requires walking page tables
> 
> 2) if the value doesn't matter, the hypervisor can simply wake up a
> userspace thread that processes the virtio queue without bothering to
> pass the value.
> 
> On the other hand, writing a tail pointer _before_ the MMIO write may
> cost a cache miss.  Hence the packed ring layout proposal replaced the
> tail pointer write with lookahead on the ring buffer's DESC_HW flags.
> The idea is that lookahead is cheaper, because hopefully the first
> non-DESC_HW buffer will be in the same cache line as the last DESC_HW
> buffer.
> 

Thank you so much for such a quick and detailed reply!

Yeah, we know it's a bit tricky to support the tail pointer in
software. But it's really helpful for the hardware implementation,
so we would like to discuss this further.

How about making this feature switchable at runtime, so that it can
be enabled after migrating to a hardware backend and disabled after
migrating to a software backend? The software backend could then
still use the DESC_HW based mechanism.

These are just some rough thoughts, and we haven't worked out the
implementation details. What are your thoughts on this?
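
To make the two mechanisms concrete, here is a minimal sketch of how a
backend might count the descriptors available to it in either mode.
Everything in it is an assumption made for illustration: the descriptor
layout, the DESC_HW bit value, the `notify_mode` switch and the
`avail_count` helper are hypothetical names and are not taken from the
proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag value and descriptor layout, for illustration only. */
#define DESC_HW 0x0080u

struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t id;
};

/* Hypothetical runtime-switchable mode, e.g. flipped after migration. */
enum notify_mode { MODE_DESC_HW_LOOKAHEAD, MODE_TAIL_POINTER };

struct queue {
    struct desc *ring;
    uint16_t size;          /* number of descriptors in the ring */
    uint16_t next;          /* next descriptor the device will consume */
    uint16_t tail;          /* last value written to the tail register */
    enum notify_mode mode;
};

/* Number of descriptors the device may consume, starting at q->next. */
static uint16_t avail_count(const struct queue *q)
{
    assert(q->size > 0 && q->next < q->size && q->tail < q->size);

    if (q->mode == MODE_TAIL_POINTER) {
        /* tail is one past the last available descriptor. An equal
         * tail and next means "empty", so at most size - 1 descriptors
         * can be outstanding in this sketch. */
        return (uint16_t)((q->tail + q->size - q->next) % q->size);
    }

    /* Lookahead: the doorbell value is ignored; scan flags until the
     * first descriptor that is not marked DESC_HW. */
    uint16_t n = 0;
    while (n < q->size &&
           (q->ring[(q->next + n) % q->size].flags & DESC_HW))
        n++;
    return n;
}
```

In the lookahead branch the value written to the doorbell never needs
to reach the backend, which is the property Paolo describes as
preferable for virtualization; the tail-pointer branch avoids the scan
but depends on that value.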

> 
> > Indirect Chaining
> > =================
> > 
> > ## Current proposal
> > 
> > * Indirect chaining is an optional feature
> > 
> > ## New proposal
> > 
> > * Remove this feature from this new ring layout
> >
> > It's very unlikely that hardware implementations would support this
> > due to extra latency of fetching actual descriptors.
> > 
> > This is a totally new ring layout, and we don't need to worry about the
> > compatibility issues with the old one. So it's better to not include this
> > feature in this new ring layout if we can't find it's necessary now.
> 
> Indirect chaining is actually relatively common for storage devices.
> 
> Hardware implementations are free not to support indirect chaining if it
> hurts latency.
> 

We are proposing removing it if it's not really necessary.
So if it's really necessary, let's just keep it. :)

One problem that keeping this feature may introduce: if a software
backend chooses to implement this feature and a VM is running on
that backend with the feature enabled, the VM may not be able to
live-migrate to a hardware backend that doesn't support it. Yeah,
this is a general live-migration problem whenever some features are
negotiable and some types of backend don't plan to support them at
all. Do you have any thoughts on this?
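
The migration concern can be stated as a subset check on feature bits.
The sketch below assumes features are tracked as a 64-bit mask and uses
the existing VIRTIO_RING_F_INDIRECT_DESC bit (28) as the example; the
helper names `feature_bit` and `can_migrate` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number for indirect descriptors in the virtio spec. */
#define VIRTIO_RING_F_INDIRECT_DESC 28

/* Hypothetical helper: turn a feature bit number into a mask. */
static uint64_t feature_bit(unsigned int n)
{
    assert(n < 64);
    return UINT64_C(1) << n;
}

/* Hypothetical check: migration is only safe if every feature the
 * guest negotiated is also offered by the destination backend. */
static bool can_migrate(uint64_t negotiated, uint64_t dest_supported)
{
    return (negotiated & ~dest_supported) == 0;
}
```

For example, a guest that negotiated indirect descriptors on a software
backend would fail this check against a hardware backend that does not
offer bit 28.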

> > 
> > Rx Fixed Buffer Sizes
> > =====================
> > 
> > ## Current proposal
> > 
> > * Driver is free to choose whatever buffer sizes it wishes for Tx and
> >   Rx buffers
> > * Theoretically within a ring, a driver could have different buffer sizes
> > 
> > ## New proposal
> > 
> > * Driver negotiates with device the size of a Rx buffer for a ring
> >     - Each descriptor in that ring will have same size buffer
> >     - Different rings can have different sized buffers
> 
> This makes sense, but it's independent from the packed ring layout.
> 

MST mentioned something related in the packed-ring-layout proposal:

> From: https://lists.oasis-open.org/archives/virtio-dev/201702/msg00010.html
> * Descriptor length in device descriptors
> ...
> Some devices use identically-sized buffers in all descriptors.
> Ignoring length for driver descriptors there could be an option too.

So we are proposing to make this statement clearer.
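
As a rough illustration of why a fixed per-ring Rx buffer size helps
the device: with one negotiated size, the number of buffers a packet
needs can be computed without reading any descriptor lengths. The
struct and function names below are hypothetical and only sketch the
idea.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-ring state: one buffer size negotiated for the
 * whole Rx ring, so per-descriptor lengths can be ignored. */
struct rx_ring {
    uint32_t fixed_buf_len;
};

/* Number of Rx buffers needed to hold a packet of pkt_len bytes
 * (ceiling division, written to avoid overflow). */
static uint32_t rx_descs_needed(const struct rx_ring *r, uint32_t pkt_len)
{
    assert(r->fixed_buf_len > 0);
    return pkt_len / r->fixed_buf_len +
           (pkt_len % r->fixed_buf_len != 0);
}
```

Different rings could carry different `fixed_buf_len` values, matching
the "different rings can have different sized buffers" point above.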

Best regards,
Tiwei Bie

