Subject: Re: [PATCH requirements v5 3/7] net-features: Add low latency receive queue requirements


On Friday, 2023-08-18 at 07:35:53 +03, Parav Pandit wrote:
> Add requirements for the low latency receive queue.
>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
> changelog:
> v0->v1:
> - clarified the requirements further
> - added line for the gro case
> - added design goals as the motivation for the requirements
> ---
>  net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
>  1 file changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
> index 1167ce2..bc9e971 100644
> --- a/net-workstream/features-1.4.md
> +++ b/net-workstream/features-1.4.md
> @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.
>  
>  # 2. Summary
>  1. Device counters visible to the driver
> -2. Low latency tx virtqueue for PCI transport
> +2. Low latency tx and rx virtqueues for PCI transport
>  
>  # 3. Requirements
>  ## 3.1 Device counters
> @@ -127,3 +127,46 @@ struct vnet_data_desc desc[2];
>  
>  9. A flow filter virtqueue similarly needs the ability to inline the short
>     flow command header.
> +
> +### 3.2.2 Low latency rx virtqueue
> +0. Design goal:
> +   a. Keep the packet metadata and the buffer data consumed by the driver
> +      layer together, available in a single CPU cache line

Phrased like this, it seems to run counter to the "header data split"
requirement.

Is there an implicit guard that this only applies for very small payloads?

> +   b. Instead of per-packet descriptors, which are complex for the device
> +      to scale, supply pages directly to the device, to be consumed based
> +      on packet size
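
For illustration only (not from the patch): if the buffer size is a
queue-level property, as requirement 6 below suggests, an rx ring entry
could shrink to a page address plus an id. The names here are made up.

```
/* Hypothetical rx ring entry: one constant-size page, no length. */
struct vnet_rx_page_desc {
   le64 addr; /* DMA address of the buffer page */
   le16 id;   /* buffer id echoed back in the completion */
};
```
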
> +1. The device should be able to write a packet receive completion, consisting
> +   of a struct virtio_net_hdr (or similar) and a buffer id, in a single PCIe
> +   DMA write TLP.
> +2. The device should be able to write multiple packet completions in a
> +   single DMA transaction, up to the PCIe maximum write size per
> +   transaction.
> +3. The device should be able to zero-pad a packet completion write to align
> +   it to 64B or the CPU cache line size whenever possible.
> +4. An example of the above DMA completion structure:
> +
> +```
> +/* Constant size receive packet completion */
> +struct vnet_rx_completion {
> +   u16 flags;
> +   u16 id; /* buffer id */
> +   u8 gso_type;
> +   u8 reserved[3];
> +   le16 gso_hdr_len;
> +   le16 gso_size;
> +   le16 csum_start;
> +   le16 csum_offset;
> +   u16 reserved2;
> +   u64 timestamp; /* explained later */
> +   u8 padding[];
> +};
> +```
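
As a reviewer-side illustration of requirement 3 (not part of the patch),
a device model might round each completion write up to the next cache
line boundary so the zero bytes land in padding[]. The constant and the
helper name below are hypothetical.

```
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumed; matches the 64B alignment above */

/* Round a completion length up to the next cache line boundary. */
static inline size_t completion_write_len(size_t len)
{
   return (len + CACHE_LINE_SIZE - 1) & ~(size_t)(CACHE_LINE_SIZE - 1);
}
```
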
> +5. The driver should be able to post constant-size buffer pages on a receive
> +   queue, which the device can consume for incoming packets of any size from
> +   64B to 9KB.
> +6. The device should be able to know the constant buffer size at the
> +   receive virtqueue level instead of per buffer.
> +7. The device should be able to indicate when a full page buffer is
> +   consumed, so that the driver can recycle the page once all packets placed
> +   in it have been consumed.
> +8. The device should be able to consume multiple pages for a receive GSO stream.
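
To make requirement 7 concrete, here is a rough kernel-flavored sketch
(again not from the patch; all names are hypothetical) of how a driver
might recycle a page once the device has reported it consumed and the
stack has freed every packet built from it.

```
/* Hypothetical per-page state kept by the driver. */
struct vnet_rx_page {
   void *addr;          /* CPU address of the constant-size page */
   unsigned int refcnt; /* packets from this page still in flight */
   bool device_done;    /* device reported the page fully consumed */
};

/* Called when the stack frees a packet that referenced this page. */
static void vnet_rx_page_put(struct vnet_rx_page *p)
{
   if (--p->refcnt == 0 && p->device_done)
      vnet_repost_page(p); /* hypothetical: hand the page back */
}
```
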
-- 
Modern people tend to dance.
