virtio-comment message

Subject: Re: [PATCH requirements v4 3/7] net-features: Add low latency receive queue requirements

From: David Edmondson <david.edmondson@oracle.com>
To: Parav Pandit <parav@nvidia.com>
Date: Tue, 15 Aug 2023 09:50:14 +0100

On Tuesday, 2023-08-15 at 10:45:56 +03, Parav Pandit wrote:
> Add requirements for the low latency receive queue.
>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
> changelog:
> v0->v1:
> - clarified the requirements further
> - added line for the gro case
> - added design goals as the motivation for the requirements
> ---
>  net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
>  1 file changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
> index 40fa07f..72d04bd 100644
> --- a/net-workstream/features-1.4.md
> +++ b/net-workstream/features-1.4.md
> @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.
>  
>  # 2. Summary
>  1. Device counters visible to the driver
> -2. Low latency tx virtqueue for PCI transport
> +2. Low latency tx and rx virtqueues for PCI transport
>  
>  # 3. Requirements
>  ## 3.1 Device counters
> @@ -129,3 +129,46 @@ struct vnet_data_desc desc[2];
>  
>  9. A flow filter virtqueue also similarly need the ability to inline the short flow
>     command header.
> +
> +### 3.2.2 Low latency rx virtqueue
> +0. Design goal:
> +   a. Keep packet metadata and buffer data together which is consumed by driver
> +      layer and make it available in a single cache line of cpu
> +   b. Instead of having per packet descriptors which is complex to scale for

This really is "per buffer" rather than "per packet".

> +      the device, supply the page directly to the device to consume it based
> +      on packet size
> +1. The device should be able to write a packet receive completion that consists
> +   of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
> +   PCIe TLP.
> +2. The device should be able to perform DMA writes of multiple packets
> +   completions in a single DMA transaction up to the PCIe maximum write limit
> +   in a transaction.
> +3. The device should be able to zero pad packet write completion to align it to
> +   64B or CPU cache line size whenever possible.
> +4. An example of the above DMA completion structure:
> +
> +```
> +/* Constant size receive packet completion */
> +struct vnet_rx_completion {
> +   u16 flags;
> +   u16 id; /* buffer id */
> +   u8 gso_type;
> +   u8 reserved[3];
> +   le16 gso_hdr_len;
> +   le16 gso_size;
> +   le16 csum_start;
> +   le16 csum_offset;
> +   u16 reserved2;
> +   u64 timestamp; /* explained later */
> +   u8 padding[];
> +};
> +```
> +5. The driver should be able to post constant-size buffer pages on a receive
> +   queue which can be consumed by the device for an incoming packet of any size
> +   from 64B to 9K bytes.
> +6. The device should be able to know the constant buffer size at receive
> +   virtqueue level instead of per buffer level.
> +7. The device should be able to indicate when a full page buffer is consumed,
> +   which can be recycled by the driver when the packets from the completed
> +   page is fully consumed.
> +8. The device should be able to consume multiple pages for a receive GSO stream.
-- 
Swimming around in a plastic bag.

References:
- [PATCH requirements v4 0/7] virtio net requirements for 1.4
  - From: Parav Pandit <parav@nvidia.com>
- [PATCH requirements v4 3/7] net-features: Add low latency receive queue requirements
  - From: Parav Pandit <parav@nvidia.com>