virtio-comment message


Subject: RE: [virtio-comment] Hardware friendly proposals from Intel for packed-ring-layout


Hi Steven
I will try to explain, from a hardware perspective, why having a head pointer and using the DESC_HW flag can look very similar from the guest's point of view.

The hardware will maintain a head pointer (local to it) for each queue and use this along with the tail pointer to determine how many valid descriptors are in the ring.
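
As a rough illustration of the head/tail comparison described above, the following Python sketch computes how many descriptors are valid in a ring. The function name `available_descriptors`, the ring size of 256, and the convention that both pointers are indices in [0, ring_size) with one slot always left empty (so that head == tail means "empty") are assumptions made for this example; they are not taken from the proposal.

```python
def available_descriptors(head: int, tail: int, ring_size: int) -> int:
    """Number of descriptors in [head, tail) on a circular ring.

    Assumes 0 <= head, tail < ring_size and that the producer never
    fills the last free slot, so head == tail always means "empty".
    """
    # Modulo arithmetic handles the case where tail has wrapped past
    # the end of the ring while head has not.
    return (tail - head) % ring_size


if __name__ == "__main__":
    RING_SIZE = 256  # hypothetical ring size
    print(available_descriptors(3, 10, RING_SIZE))
    print(available_descriptors(250, 4, RING_SIZE))
```

The same arithmetic would serve a guest if the head pointer were written back to guest memory, which is the option discussed next.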

The hardware could (although this is not proposed) write its head pointer value into guest memory. As you mentioned, the guest could then use it along with the tail pointer to determine how many descriptors have been consumed by the hardware.

If, instead of writing the head pointer value into guest memory, the hardware writes back the DESC_HW flag of a descriptor, the same result can be achieved. If the hardware wrote back every descriptor's DESC_HW flag, then yes, this would be inefficient. The DESC_WB flag indicates which descriptors need to be written back to guest memory (at a minimum, only the DESC_HW flag needs to be written). This ensures that, as with a head pointer writeback, the hardware writes back only a single descriptor's DESC_HW flag after a batch of n descriptors has been consumed.

The guest (as it set the DESC_WB flags) would know that it only needs to poll every nth (8/16?) descriptor's DESC_HW flag; if that flag is clear, then all n-1 previous descriptors have also been consumed by the hardware.
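
The batching scheme above can be sketched as a small Python simulation. This is only an illustration under stated assumptions: the bit values of `DESC_HW` and `DESC_WB`, the `Desc` class, the three helper functions, and strictly in-order consumption by the device are all hypothetical choices for this example and are not defined by the proposal.

```python
from dataclasses import dataclass

DESC_HW = 0x1  # descriptor is owned by the device (bit value assumed)
DESC_WB = 0x2  # device must write this descriptor back (bit value assumed)


@dataclass
class Desc:
    flags: int = 0


def driver_post(ring, start, count, batch):
    """Driver: hand `count` descriptors to the device, requesting a
    write-back only on the last descriptor of each batch."""
    for i in range(count):
        flags = DESC_HW
        if (i + 1) % batch == 0:
            flags |= DESC_WB
        ring[(start + i) % len(ring)].flags = flags


def device_consume(ring, head, count):
    """Device: consume `count` descriptors in order and clear DESC_HW
    only where DESC_WB is set. Returns the number of write-backs."""
    writes = 0
    for i in range(count):
        desc = ring[(head + i) % len(ring)]
        if desc.flags & DESC_WB:
            desc.flags &= ~DESC_HW
            writes += 1
    return writes


def driver_reclaim(ring, start, count, batch):
    """Driver: poll only the batch-ending descriptors. Returns how many
    descriptors are known to have been consumed."""
    done = 0
    for b in range(count // batch):
        last = ring[(start + (b + 1) * batch - 1) % len(ring)]
        if last.flags & DESC_HW:
            break  # this batch is not finished yet
        done += batch  # in-order use implies the whole batch is done
    return done


if __name__ == "__main__":
    ring = [Desc() for _ in range(32)]
    driver_post(ring, 0, 32, batch=8)
    print(device_consume(ring, 0, 16), driver_reclaim(ring, 0, 32, batch=8))
```

In this sketch the descriptors without DESC_WB are never modified in guest memory; the driver infers their completion from the batch-ending descriptor, which is only valid while the device consumes descriptors in order.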

So, given that head pointer writeback and DESC_HW flag writeback allow the same functionality, why choose the latter?
Well, in many cases the descriptor may have to be written back with extra information (e.g. packet length), so if this mechanism has to be supported anyway, we may as well use it for the DESC_HW flag too.

Admittedly, currently most packet metadata is prepended to the packet buffer.

Also, to allow out-of-order processing, descriptors would have to be written back to guest memory once consumed by the hardware.


Rgds
Kully
-----Original Message-----
From: Michael S. Tsirkin [mailto:mst@redhat.com] 
Sent: Thursday, August 31, 2017 8:50 PM
To: Steven Luong (sluong) <sluong@cisco.com>
Cc: Bie, Tiwei <tiwei.bie@intel.com>; virtio-comment@lists.oasis-open.org; Dhanoa, Kully <kully.dhanoa@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Gray, Mark D <mark.d.gray@intel.com>
Subject: Re: [virtio-comment] Hardware friendly proposals from Intel for packed-ring-layout

On Thu, Aug 31, 2017 at 06:51:09PM +0000, Steven Luong (sluong) wrote:
> I have a naïve question. Why do we need to invent two flags per descriptor, DESC_HW and DESC_WB? Why can’t we keep things simple, like a circular queue? There is a tail pointer and a head pointer for the ring. The producer manages the tail pointer while the consumer manages the head pointer. From the two pointers, the consumer knows exactly how many descriptors are available to read, and the producer knows exactly how many more slots it has so that it can continue to write. There is no need to test each descriptor before the write and no need to test each descriptor prior to the read.

Looks like you are reinventing virtio 1.0. Head/tail pointers cause a lot of cache line bounces.

> You already advocate the tail pointer, but fall short of introducing the head pointer. What did I miss?
> 
> Steven

Some considerations going into the design can be seen here:
https://www.youtube.com/watch?v=5QIE0F7nU3U

> On 8/24/17, 4:53 AM, "virtio-comment@lists.oasis-open.org on behalf of Tiwei Bie" <virtio-comment@lists.oasis-open.org on behalf of tiwei.bie@intel.com> wrote:
> 
>     Hi all,
>     
>     Based on the packed-ring-layout proposal posted here:
>     
>     https://lists.oasis-open.org/archives/virtio-dev/201702/msg00010.html
>     
>     We have the proposals below to make it more hardware friendly.
>     
>     Driver Signaling Available Descriptors
>     ======================================
>     
>     ## Current proposal
>     
>     * Each descriptor has 1 bit flag DESC_HW
>     * Driver creates descriptors and then sets DESC_HW flag
>     * Device reads descriptors and can use one if its DESC_HW flag is set
>     
>     ## New proposal
>     
>     * In addition to the DESC_HW flag, each virtio queue has a tail pointer
>         - Driver creates a suitable number of descriptors (i.e. a multiple
>           of a cache line), then performs an MMIO write to the tail pointer.
>     * For each virtio queue, there is a head pointer that lives in the
>       device and is not used by the driver
>         - Device compares tail pointer with head pointer to determine exactly
>           how many new descriptors have been added to a specific queue
>     * The descriptors in [head, tail) are available to device
>     * The DESC_HW flag will be kept for device signaling used
>       descriptors
>     
>     Device Signaling Used Descriptors
>     =================================
>     
>     ## Current proposal
>     
>     * Device clears each descriptor's DESC_HW flag (1 bit) after it has
>       finished with the descriptor
>     
>     ## New proposal
>     
>     * Device does not need to clear DESC_HW flag for every descriptor
>     * Driver controls which descriptors need to have their DESC_HW cleared:
>         - Descriptor has an extra 1 bit flag, DESC_WB (Write-Back):
>             * w/ DESC_WB set  => Device must write-back this descriptor
>                                  after use. At the minimum, clear the
>                                  DESC_HW flag.
>             * w/o DESC_WB set => Device doesn't need to write-back the
>                                  descriptor.
>     
>     This proposal saves PCIe bandwidth:
>     
>     In many scenarios, descriptor data doesn't need to be written back,
>     e.g. for network devices, the packet metadata is prepended to packet
>     data.
>     
>     An alternative would be to add a field with a number of used descriptors.
>     This would give the same benefit but would use more bits in the descriptor.
>     
>     Indirect Chaining
>     =================
>     
>     ## Current proposal
>     
>     * Indirect chaining is an optional feature
>     
>     ## New proposal
>     
>     * Remove this feature from this new ring layout
>     
>     It's very unlikely that hardware implementations would support this
>     due to extra latency of fetching actual descriptors.
>     
>     This is a totally new ring layout, and we don't need to worry about
>     compatibility issues with the old one. So it's better not to include this
>     feature in the new ring layout unless we find that it is necessary now.
>     
>     Rx Fixed Buffer Sizes
>     =====================
>     
>     ## Current proposal
>     
>     * Driver is free to choose whatever buffer sizes it wishes for Tx and
>       Rx buffers
>     * Theoretically within a ring, a driver could have different
>       buffer sizes
>     
>     ## New proposal
>     
>     * Driver negotiates with device the size of a Rx buffer for a ring
>         - Each descriptor in that ring will have same size buffer
>         - Different rings can have different sized buffers
>     
>     Data Alignment Boundaries
>     =========================
>     
>     ## Current proposal
>     
>     * Driver is free to choose data buffer alignment to any byte
>       boundary
>     
>     ## New proposal
>     
>     * Stipulate a fixed alignment for the data buffer
>     
>     ----------------------------------------------------------------
>     
>     We have done a basic prototype for the packed-ring-layout in DPDK
>     based on the v2 packed-ring-layout proposal [1].
>     
>     The prototype has been sent to the DPDK mailing list as RFC [2][3].
>     And I also collected those public patches into my github repo [4]
>     to help others be able to try it easily.
>     
>     Besides the v2 packed-ring-layout proposal posted on the mailing list,
>     this prototype also includes the proposal that introduces the DESC_WB
>     flag, which lets the driver tell the device to update only the
>     specified descriptors. You can find more details in this patch [5].
>     We don't see a performance regression in the software implementation:
>     
>     64bytes iofwd loopback:
>                        5c'virtio-1c'vhost     1c'virtio-5c'vhost
>     virtio1.0          7.655Mpps              11.48Mpps
>     virtio1.1 A        8.757Mpps              11.70Mpps
>     virtio1.1 B        8.910Mpps              11.66Mpps
>     The columns:
>     5c'virtio-1c'vhost - use 5 cores to run testpmd/virtio-user and
>                          use 1 core to run testpmd/vhost-pmd (shows
>                          vhost performance)
>     1c'virtio-5c'vhost - use 1 core to run testpmd/virtio-user and
>                          use 5 cores to run testpmd/vhost-pmd (shows
>                          virtio performance)
>     The rows:
>     virtio1.0    - The current (simplified) virtio/vhost implementation in DPDK
>     virtio1.1 A  - The prototype based on the v2 packed-ring-layout proposal
>     virtio1.1 B  - Introduce DESC_WB, and adopt it on the Tx path
>     
>     [1] https://lists.oasis-open.org/archives/virtio-dev/201702/msg00010.html
>     [2] http://dpdk.org/ml/archives/dev/2017-June/068315.html
>     [3] http://dpdk.org/ml/archives/dev/2017-July/071562.html
>     [4] https://github.com/btw616/dpdk-virtio1.1
>     [5] http://dpdk.org/ml/archives/dev/2017-July/071568.html
>     
>     Best regards,
>     Tiwei Bie
>     
