virtio-comment message

Subject: RE: [virtio-comment] Hardware friendly proposals from Intel for packed-ring-layout

From: "Dhanoa, Kully" <kully.dhanoa@intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>, "Bie, Tiwei" <tiwei.bie@intel.com>
Date: Mon, 11 Sep 2017 06:56:17 +0000

Hi

I've embedded my comments below.

Rgds
Kully

On Thu, Aug 24, 2017 at 07:53:15PM +0800, Tiwei Bie wrote:
> Rx Fixed Buffer Sizes
> =====================
> 
> ## Current proposal
> 
> * Driver is free to choose whatever buffer sizes it wishes for Tx and
>   Rx buffers
> * Theoretically, within a ring, a driver could have different buffer
>   sizes
> 
> ## New proposal
> 
> * Driver negotiates with device the size of a Rx buffer for a ring
>     - Each descriptor in that ring will have same size buffer
>     - Different rings can have different sized buffers

What's the motivation for this? In our testing, dynamically sized entries perform better in constrained environments such as the Linux kernel, where packets are queued at a huge number of independent application sockets.

It seems that the device can easily cache the last size to speed up operation.

[Kully]:
	The device incurs around 1us of delay fetching each descriptor. In situations where memory on the device is limited and many queues are being supported, the device would probably fetch descriptors (for Rx) only after packets have been received from the network.
	Knowing up front the buffer size associated with a ring would allow the device to determine accurately how many descriptors are required.
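To make the descriptor-count argument concrete, here is a minimal C sketch. It assumes that every Rx descriptor in a ring points at a buffer of one negotiated size (`buf_size`) and that a received packet may span several such buffers. The names `descs_needed_fixed`, `struct ring_size_cache` and `descs_estimate_cached` are hypothetical and not part of the virtio spec. The second helper models the alternative of caching the last seen size, where the count is only an estimate until the descriptors are actually read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: exact number of fixed-size Rx buffers needed
 * for a packet of pkt_len bytes, assuming one negotiated buf_size per ring.
 * Ceiling division written so that it cannot overflow. */
static uint32_t descs_needed_fixed(uint32_t pkt_len, uint32_t buf_size)
{
    assert(buf_size > 0);
    return pkt_len / buf_size + (pkt_len % buf_size != 0);
}

/* Hypothetical per-ring state for the "cache the last size" alternative. */
struct ring_size_cache {
    uint32_t last_buf_size; /* length of the last descriptor seen; 0 = unknown */
};

/* Estimate only: correct if the driver keeps posting buffers of the
 * cached length, otherwise the device must re-read and adjust. */
static uint32_t descs_estimate_cached(const struct ring_size_cache *c,
                                      uint32_t pkt_len)
{
    if (c->last_buf_size == 0)
        return 1; /* nothing cached yet: fetch one descriptor and look */
    return descs_needed_fixed(pkt_len, c->last_buf_size);
}
```

With `buf_size` = 2048, a 9000-byte packet needs 5 descriptors, and with a fixed size per ring the device knows this before issuing any descriptor read; with a cached size the same figure is only a guess that holds while the driver keeps posting same-sized buffers.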

	Yes, agreed that overall system performance is important. Are sockets intended to be used with virtio drivers? If so, would the driver not allocate a different queue per socket? The proposal was to have fixed buffer sizes per ring, but different rings can have different buffer sizes. Would this compromise (different sizes per ring, as opposed to per entry) work?

> Data Alignment Boundaries
> =========================
> 
> ## Current proposal
> 
> * Driver is free to choose data buffer alignment to any byte boundary
> 
> ## New proposal
> 
> * Stipulate a fixed alignment for the data buffer

Again, the motivation seems to be missing. Saving PCI bandwidth isn't going to help if it means the driver will then incur more cache misses on access.

[Kully] Would s/w not benefit from buffers which start aligned to a cache line boundary (i.e. 64B)? This would also benefit hardware.
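The cache-line point can be illustrated with a small C sketch that counts how many cache lines a buffer touches depending on where it starts. The 64B line size and the helper names `align_up` and `lines_touched` are assumptions for illustration only, not anything defined by virtio.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed cache-line size, matching the 64B example */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Number of CACHE_LINE-sized lines touched by len bytes starting at addr. */
static uint64_t lines_touched(uint64_t addr, uint64_t len)
{
    if (len == 0)
        return 0;
    uint64_t first = addr / CACHE_LINE;
    uint64_t last = (addr + len - 1) / CACHE_LINE;
    return last - first + 1;
}
```

A 64-byte buffer at 0x1000 touches one line, while the same buffer at 0x1002 straddles two; `align_up(0x1002, 64)` gives 0x1040, the next 64B-aligned start. Whether that saving outweighs the driver-side cost raised above is exactly the open question in this thread.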

--
MST

References:
- Re: [virtio-comment] Hardware friendly proposals from Intel for packed-ring-layout
  - From: "Michael S. Tsirkin" <mst@redhat.com>