virtio-comment message

Subject: Re: [virtio-comment] Hardware friendly proposals from Intel for packed-ring-layout

From: Paolo Bonzini <pbonzini@redhat.com>
To: Tiwei Bie <tiwei.bie@intel.com>, virtio-comment@lists.oasis-open.org, mst@redhat.com
Date: Thu, 24 Aug 2017 14:10:34 +0200

On 24/08/2017 13:53, Tiwei Bie wrote:
> 
> * In addition to the DESC_HW flag, each virtio queue has a tail pointer
>     - Driver creates suitable (i.e. multiple of cacheline) descriptors,
>       then performs MMIO write to tail pointer.

If I understand correctly, the tail pointer is the value that is written
to the MMIO register.  If that is the case, this is unfortunately bad
for virtualization.  Virtualization prefers a doorbell register where
the written value doesn't matter.  This is because:

1) the value is not directly available to the hypervisor, and computing
it requires instruction decoding, which in turn requires walking page
tables

2) if the value doesn't matter, the hypervisor can simply wake up a
userspace thread that processes the virtio queue without bothering to
pass the value.
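
To illustrate point 2, here is a minimal sketch of a doorbell whose
written value is ignored.  The names (vq_state, doorbell_write,
worker_run) are hypothetical and not taken from any hypervisor; the
"worker" is a plain function rather than a real userspace thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue state kept by the hypervisor. */
struct vq_state {
    bool kick_pending;   /* set by any doorbell write, cleared by the worker */
    unsigned wakeups;    /* how many times the worker actually ran */
};

/*
 * Doorbell write handler.  It takes no value argument: the hypervisor
 * only needs to know that the register was written, so it never has to
 * decode the guest instruction to find out what was written.
 */
static void doorbell_write(struct vq_state *vq)
{
    assert(vq != NULL);
    vq->kick_pending = true;
}

/*
 * Worker body.  Returns true if it was kicked.  A real worker would scan
 * the ring here and find the new descriptors by itself.
 */
static bool worker_run(struct vq_state *vq)
{
    assert(vq != NULL);
    if (!vq->kick_pending)
        return false;
    vq->kick_pending = false;
    vq->wakeups++;
    return true;
}
```

Several doorbell writes before the worker runs collapse into a single
wakeup, which is harmless precisely because the value carries no
information.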

On the other hand, writing a tail pointer to memory _before_ the MMIO
write may cost a cache miss.  Hence the packed ring layout proposal
replaced the tail pointer write with lookahead on the ring buffer's
DESC_HW flags.  The idea is that lookahead is cheaper, because hopefully
the first non-DESC_HW buffer will be in the same cache line as the last
DESC_HW buffer.
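
A rough sketch of what the lookahead could look like on the device side
follows.  The descriptor layout, the DESC_HW bit value, and RING_SIZE
are assumptions made for illustration only; a real implementation also
needs memory barriers between reading the flags and reading the rest of
the descriptor, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_HW   0x0080u  /* hypothetical bit value, for illustration */
#define RING_SIZE 8u       /* hypothetical ring size */

/* Hypothetical 16-byte descriptor: four of them fit in a 64-byte line. */
struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

/*
 * Starting at index `next`, count the consecutive descriptors that the
 * driver has handed to the device (DESC_HW set).  The scan stops at the
 * first descriptor without the flag, or after one full lap of the ring.
 * No tail pointer is read: the flags themselves mark the end.
 */
static unsigned count_available(const struct desc *ring, unsigned next)
{
    unsigned n = 0;

    assert(next < RING_SIZE);
    while (n < RING_SIZE && (ring[(next + n) % RING_SIZE].flags & DESC_HW))
        n++;
    return n;
}
```

With 16-byte descriptors, the descriptor that terminates the scan often
shares a cache line with the last available one, so the extra read is
usually cheap.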

> Indirect Chaining
> =================
> 
> ## Current proposal
> 
> * Indirect chaining is an optional feature
> 
> ## New proposal
> 
> * Remove this feature from this new ring layout
>
> It's very unlikely that hardware implementations would support this
> due to extra latency of fetching actual descriptors.
> 
> This is a totally new ring layout, and we don't need to worry about the
> compatibility issues with the old one. So it's better to not include this
> feature in this new ring layout if we can't find it's necessary now.

Indirect chaining is actually relatively common for storage devices.

Hardware implementations are free not to support indirect chaining if it
hurts latency.
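
As a sketch of how that works with feature negotiation: the bit number
below is the existing indirect-descriptor feature bit (28, named
VIRTIO_RING_F_INDIRECT_DESC in the Linux headers); whether the new
layout would reuse that bit is an assumption, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Existing indirect-descriptor feature bit; reuse here is an assumption. */
#define VIRTIO_RING_F_INDIRECT_DESC 28

/* The negotiated set is what the device offers AND the driver accepts. */
static uint64_t negotiate(uint64_t device_features, uint64_t driver_features)
{
    return device_features & driver_features;
}

/* The driver may build indirect chains only if the bit was negotiated. */
static bool may_use_indirect(uint64_t negotiated)
{
    return ((negotiated >> VIRTIO_RING_F_INDIRECT_DESC) & 1u) != 0;
}
```

A hardware device that leaves the bit out of its offered features never
sees an indirect descriptor, while a software device for storage can
still offer it.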

> 
> Rx Fixed Buffer Sizes
> =====================
> 
> ## Current proposal
> 
> * Driver is free to choose whatever buffer sizes it wishes for Tx and
>   Rx buffers
> * Theoretically within a ring, a driver could have different buffer sizes
> 
> ## New proposal
> 
> * Driver negotiates with device the size of a Rx buffer for a ring
>     - Each descriptor in that ring will have same size buffer
>     - Different rings can have different sized buffers

This makes sense, but it's independent of the packed ring layout.

Paolo
