Subject: Re: [virtio-comment] [PATCH requirements 3/7] net-features: Add low latency receive queue requirements
On Monday, 2023-07-24 at 06:34:17 +03, Parav Pandit wrote:
> Add requirements for the low latency receive queue.
>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
> changelog:
> v0->v1:
> - clarified the requirements further
> - added line for the gro case
> - added design goals as the motivation for the requirements
> ---
>  net-workstream/features-1.4.md | 45 +++++++++++++++++++++++++++++++++-
>  1 file changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
> index eb95592..e04727a 100644
> --- a/net-workstream/features-1.4.md
> +++ b/net-workstream/features-1.4.md
> @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.
>
>  # 2. Summary
>  1. Device counters visible to the driver
> -2. Low latency tx virtqueue for PCI transport
> +2. Low latency tx and rx virtqueues for PCI transport
>
>  # 3. Requirements
>  ## 3.1 Device counters
> @@ -121,3 +121,46 @@ struct vnet_data_desc desc[2];
>
>  9. A flow filter virtqueue also similarly need the ability to inline the short flow
>     command header.
> +
> +### 3.2.2 Low latency rx virtqueue
> +0. Design goal:
> +   a. Keep packet metadata and buffer data together which is consumed by driver
> +      layer and make it available in a single cache line of cpu
> +   b. Instead of having per packet descriptors which is complex to scale for
> +      the device, supply the page directly to the device to consume it based
> +      on packet size

Really "per packet descriptor buffers"?

> +1. The device should be able to write a packet receive completion that consists
> +   of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
> +   PCIe TLP.
> +2. The device should be able to perform DMA writes of multiple packets
> +   completions in a single DMA transaction up to the PCIe maximum write limit
> +   in a transaction.
> +3. The device should be able to zero pad packet write completion to align it to
> +   64B or CPU cache line size whenever possible.
> +4. An example of the above DMA completion structure:
> +
> +```
> +/* Constant size receive packet completion */
> +struct vnet_rx_completion {
> +    u16 flags;
> +    u16 id; /* buffer id */
> +    u8 gso_type;
> +    u8 reserved[3];
> +    le16 gso_hdr_len;
> +    le16 gso_size;
> +    le16 csum_start;
> +    le16 csum_offset;
> +    u16 reserved2;
> +    u64 timestamp; /* explained later */
> +    u8 padding[];
> +};
> +```
> +5. The driver should be able to post constant-size buffer pages on a receive
> +   queue which can be consumed by the device for an incoming packet of any size
> +   from 64B to 9K bytes.
> +6. The device should be able to know the constant buffer size at receive
> +   virtqueue level instead of per buffer level.
> +7. The device should be able to indicate when a full page buffer is consumed,
> +   which can be recycled by the driver when the packets from the completed
> +   page is fully consumed.

s/is fully consumed/are fully consumed/

> +8. The device should be able to consume multiple pages for a receive GSO stream.

-- 
So tap at my window, maybe I might let you in.
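
For illustration only, not part of the patch: a minimal C sketch of how the zero padding in point 3 could work out, assuming the field layout of the quoted vnet_rx_completion and a 64B cache line. The macro and helper names here (CACHE_LINE_SIZE, vnet_rx_compl_bytes) are made up for the sketch and are not defined anywhere in the proposal.

```c
/*
 * Hedged sketch, not part of the proposal: the fixed fields of the quoted
 * vnet_rx_completion add up to 26 bytes, so the device would append zero
 * padding to land each completion on a 64B / cache-line boundary (point 3),
 * keeping one packet's metadata within a single cache line (design goal a).
 */
#include <stdint.h>

#define CACHE_LINE_SIZE 64U

/* 2+2+1+3+2+2+2+2+2+8 bytes of fixed fields in the quoted structure. */
#define VNET_RX_COMPL_FIXED_BYTES 26U

/* Total bytes the device would write per completion, padding included. */
static inline uint32_t vnet_rx_compl_bytes(void)
{
	/* Round the fixed portion up to the next cache-line multiple. */
	return (VNET_RX_COMPL_FIXED_BYTES + CACHE_LINE_SIZE - 1) &
	       ~(CACHE_LINE_SIZE - 1);
}
```

With these numbers the helper returns 64, i.e. 38 bytes of zero padding per completion, which also lets the device coalesce several back-to-back completions into one aligned DMA write (point 2).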
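
Likewise a hedged driver-side sketch of points 5-7: constant-size pages posted per receive virtqueue, recycled only once the device has marked the page consumed and the stack has released every packet carved out of it. All names (rxq_page, rxq_page_get, rxq_page_put) and the 4K page size are assumptions for illustration, not something the requirements define.

```c
/*
 * Not part of the patch -- one possible accounting scheme for points 5-7,
 * assuming a single constant page size per receive virtqueue and a buffer
 * id in each completion that refers back to a posted page.
 */
#include <stdint.h>
#include <stdbool.h>

#define RXQ_PAGE_SIZE 4096U	/* constant, known at virtqueue level (point 6) */

struct rxq_page {
	uint64_t dma_addr;	/* address the device writes packets into */
	uint32_t pkts_inflight;	/* packets from this page still held by the stack */
	bool     device_done;	/* device signalled the page is fully consumed */
};

/* Called when a completion names this page as the target buffer. */
static void rxq_page_get(struct rxq_page *p)
{
	p->pkts_inflight++;
}

/*
 * Called when the stack frees a packet, and when the device indicates the
 * page is consumed (point 7). The page may be recycled only once both
 * conditions hold, i.e. every packet from it has been released.
 */
static bool rxq_page_put(struct rxq_page *p, bool from_device)
{
	if (from_device)
		p->device_done = true;
	else if (p->pkts_inflight)
		p->pkts_inflight--;

	return p->device_done && p->pkts_inflight == 0;	/* ready to repost */
}
```

A driver built along these lines would repost the page's dma_addr as a fresh constant-size buffer whenever rxq_page_put() returns true; for a GSO stream spanning several pages (point 8), each page is tracked and recycled independently.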