virtio-comment message



Subject: RE: [PATCH requirements v5 3/7] net-features: Add low latency receive queue requirements


> From: David Edmondson <david.edmondson@oracle.com>
> Sent: Monday, August 21, 2023 4:17 PM

> > +### 3.2.2 Low latency rx virtqueue
> > +0. Design goal:
> > +   a. Keep the packet metadata and buffer data consumed by the driver
> > +      layer together, and make them available in a single CPU cache line
> 
> Phrased like this, it seems to run counter to the "header data split"
> requirement.
> 
Mostly not. Currently, the packet metadata consumed by the driver is spread across two DMA writes at two different addresses.
For split q: virtio_net_hdr + used ring entry.
For packed q: virtio_net_hdr + descriptor.

Instead, the goal is for both to complete in a single PCIe DMA write, and to be read by the cpu in a single cache line while processing the packet.
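
For reference, these are the two split queue structures the device writes today with separate DMAs (field layout as in the virtio 1.x spec; num_buffers is present with VIRTIO_NET_F_MRG_RXBUF or VERSION_1):

```
/* Per-packet metadata, written into the driver-posted buffer */
struct virtio_net_hdr {
   u8 flags;
   u8 gso_type;
   le16 hdr_len;
   le16 gso_size;
   le16 csum_start;
   le16 csum_offset;
   le16 num_buffers;
};

/* Completion entry, written into the used ring, at an unrelated address */
struct vring_used_elem {
   le32 id;  /* index of start of used descriptor chain */
   le32 len; /* total length of the descriptor chain used */
};
```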

> Is there an implicit guard that this only applies for very small payloads?
> 
No.
All packet sizes benefit from it: the completion carries only fixed-size metadata and a buffer id, so its layout does not depend on the payload size.

> > +   b. Instead of having per-packet descriptors, which are complex to scale
> > +      for the device, supply the page directly to the device to consume
> > +      based on packet size
> > +1. The device should be able to write a packet receive completion that
> > +   consists of struct virtio_net_hdr (or similar) and a buffer id using a
> > +   single DMA write PCIe TLP.
> > +2. The device should be able to perform DMA writes of multiple packet
> > +   completions in a single DMA transaction, up to the PCIe maximum write
> > +   limit of a transaction.
> > +3. The device should be able to zero-pad a packet completion write to align
> > +   it to 64B or the CPU cache line size whenever possible.
> > +4. An example of the above DMA completion structure:
> > +
> > +```
> > +/* Constant size receive packet completion */
> > +struct vnet_rx_completion {
> > +   u16 flags;
> > +   u16 id; /* buffer id */
> > +   u8 gso_type;
> > +   u8 reserved[3];
> > +   le16 gso_hdr_len;
> > +   le16 gso_size;
> > +   le16 csum_start;
> > +   le16 csum_offset;
> > +   u16 reserved2;
> > +   u64 timestamp; /* explained later */
> > +   u8 padding[];
> > +};
> > +```
> > +5. The driver should be able to post constant-size buffer pages on a receive
> > +   queue which can be consumed by the device for an incoming packet of any
> > +   size from 64B to 9K bytes.
> > +6. The device should be able to know the constant buffer size at receive
> > +   virtqueue level instead of per buffer level.
> > +7. The device should be able to indicate when a full page buffer is consumed,
> > +   so that the driver can recycle the page once all packets from the
> > +   completed page are fully consumed.
> > +8. The device should be able to consume multiple pages for a receive GSO
> > +   stream.
> --
> Modern people tend to dance.
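
To make requirements 1-4 and 7 above concrete, here is a minimal driver-side sketch of walking such a completion ring. The stride constant, the flag bits, and the helper names below are hypothetical illustrations of one possible scheme, not part of the proposal:

```
#include <stdint.h>
#include <stddef.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint16_t le16;
typedef uint64_t u64;

/* Completion layout from item 4 above (trailing padding[] elided;
 * entries are located by a fixed stride instead). */
struct vnet_rx_completion {
   u16 flags;
   u16 id; /* buffer id */
   u8 gso_type;
   u8 reserved[3];
   le16 gso_hdr_len;
   le16 gso_size;
   le16 csum_start;
   le16 csum_offset;
   u16 reserved2;
   u64 timestamp;
};

/* Hypothetical flag bits and stride, for illustration only. */
#define VNET_RX_F_VALID         (1u << 0) /* entry written by the device */
#define VNET_RX_F_PAGE_CONSUMED (1u << 1) /* full page used; recycle it (req 7) */
#define RX_COMP_STRIDE          64u       /* zero-padded to a cache line (req 3) */

/* Hypothetical consumers, provided elsewhere by the driver. */
void rx_process_packet(u16 id, const struct vnet_rx_completion *c);
void rx_recycle_page(u16 id);

/* Each completion starts on its own cache line, so the flags, buffer id
 * and offload metadata arrive with a single cache line fill, matching the
 * single PCIe TLP write of requirement 1; the device may batch several
 * such 64B entries in one DMA transaction (req 2). */
static void rx_poll(u8 *ring, uint32_t *next, uint32_t ring_entries)
{
   for (;;) {
      struct vnet_rx_completion *c = (struct vnet_rx_completion *)
         (ring + (size_t)(*next % ring_entries) * RX_COMP_STRIDE);

      if (!(c->flags & VNET_RX_F_VALID))
         break; /* device has not written this entry yet */

      rx_process_packet(c->id, c);
      if (c->flags & VNET_RX_F_PAGE_CONSUMED)
         rx_recycle_page(c->id);
      (*next)++;
   }
}
```

A real design would more likely use a generation/phase bit than a sticky valid flag, but the fixed 64B stride is what lets the driver locate the next completion without parsing per-entry lengths.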

