

Subject: Re: [virtio-comment] [PATCH requirements 3/7] net-features: Add low latency receive queue requirements


On Tue, Jun 06, 2023 at 10:44:04PM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, June 6, 2023 6:33 PM
> > 
> > On Fri, Jun 02, 2023 at 01:03:01AM +0300, Parav Pandit wrote:
> > > Add requirements for the low latency receive queue.
> > >
> > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > ---
> > >  net-workstream/features-1.4.md | 38 +++++++++++++++++++++++++++++++++-
> > >  1 file changed, 37 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net-workstream/features-1.4.md b/net-workstream/features-1.4.md
> > > index 55f1b1f..054f951 100644
> > > --- a/net-workstream/features-1.4.md
> > > +++ b/net-workstream/features-1.4.md
> > > @@ -7,7 +7,7 @@ together is desired while updating the virtio net interface.
> > >
> > >  # 2. Summary
> > >  1. Device counters visible to the driver
> > > -2. Low latency tx virtqueue for PCI transport
> > > +2. Low latency tx and rx virtqueues for PCI transport
> > >
> > >  # 3. Requirements
> > >  ## 3.1 Device counters
> > > @@ -107,3 +107,39 @@ struct vnet_data_desc desc[2];
> > >
> > >  7. Ability to place all transmit completion together with its per packet stream
> > >     transmit timestamp using single PCIe transaction.
> > > +
> > > +### 3.2.2 Low latency rx virtqueue
> > > +1. The device should be able to write a packet receive completion that consists
> > > +   of struct virtio_net_hdr (or similar) and a buffer id using a single DMA write
> > > +   PCIe TLP.
> > 
> > why? what is wrong with it being linear with packet instead?
> >
> It prohibits header data split, and
> it requires multiple DMAs for the metadata consumed by a single driver layer.
> Data processed by one layer is placed in two different locations.
> This hurts performance.

only with split yes? refer to that then.

maybe combine with the header split part?
these two seem intertwined.
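
For context only (an illustration of the status quo, not text from the patch or the spec draft): with a split virtqueue the metadata for one received packet is today spread across two device writes, struct virtio_net_hdr at the start of the posted buffer plus the used ring element sketched below, which is what requirement 1 wants to collapse into a single contiguous completion write.

```
#include <stdint.h>

typedef uint32_t le32; /* spec-style alias, just for this sketch */

/*
 * Status quo, roughly: the driver learns about one received packet from
 * 1) struct virtio_net_hdr written at the start of the posted buffer, and
 * 2) this used ring element, naming the buffer and the bytes written.
 * Requirement 1 asks for the same information (header fields plus buffer id)
 * to arrive as one contiguous completion, i.e. a single PCIe write TLP.
 */
struct virtq_used_elem {
    le32 id;  /* index of the head descriptor of the used chain */
    le32 len; /* total bytes written into the buffer, including the hdr */
};
```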


>  
> > > +2. The device should be able to perform DMA writes of multiple packet
> > > +   completions in a single DMA transaction up to the PCIe maximum write limit
> > > +   in a transaction.
> > > +3. The device should be able to zero pad packet write completion to align it to
> > > +   64B or CPU cache line size whenever possible.
> > 
> > assuming completion is used buffer, these are exactly 64 bytes with packed vq,
> > and they are linear so can be written in one transaction.
> When a packet spans multiple buffers, they cannot be written in a contiguous manner.

oh you mean when you manage to stick the packet itself in the queue?
this idea is only mentioned down the road.
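
Purely as an illustration of items 2 and 3 (a hypothetical helper, not anything the patch defines): padding each completion out to the cache line size is what allows several completions to be packed back to back into one larger DMA write.

```
#include <stddef.h>

#define RX_COMPLETION_ALIGN 64u /* 64B, a typical CPU cache line size */

/*
 * Hypothetical helper: number of zero bytes appended after a completion of
 * 'len' bytes so that the next completion in the same DMA write starts on a
 * cache-line boundary.
 */
static inline size_t rx_completion_pad(size_t len)
{
    size_t rem = len % RX_COMPLETION_ALIGN;

    return rem ? RX_COMPLETION_ALIGN - rem : 0;
}
```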


> > if so why list requirements which are already met?
> > if you want them for completeness mention this.
> > 
> > > +4. An example of the above DMA completion structure:
> > > +
> > > +```
> > > +/* Constant size receive packet completion */
> > > +struct vnet_rx_completion {
> > > +   u16 flags;
> > > +   u16 id; /* buffer id */
> > > +   u8 gso_type;
> > > +   u8 reserved[3];
> > > +   le16 gso_hdr_len;
> > > +   le16 gso_size;
> > > +   le16 csum_start;
> > > +   le16 csum_offset;
> > > +   u16 reserved2;
> > > +   u64 timestamp; /* explained later */
> > > +   u8 padding[];
> > > +};
> > > +```
> > > +5. The driver should be able to post constant-size buffer pages on a receive
> > > +   queue which can be consumed by the device for an incoming packet of any size
> > > +   from 64B to 9K bytes.
> > 
> > possible with mrg buffers
> > 
> It doesn't scale to post constant-size 64B buffers.

maybe - figure out the actual requirement then.


> > > +6. The device should be able to know the constant buffer size at receive
> > > +   virtqueue level instead of per buffer level.
> > 
> > the bigger question is not communicating to device. that is trivial.
> > the bigger question is that linux IP stack seems to benefit from variable sized
> > packets because buffers waste precious kernel memory.
> We want to avoid the wastage and still achieve constant-size posting.

ok mention this maybe.

> > is this for non IP stack such as xdp? non-linux guests? dpdk perhaps?
> > 
> For Linux kernel guest.

constant size buffers will inevitably waste memory i don't see
a way around that.
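
As a sketch of how items 5 and 6 could look from the driver side (every structure and field name here is hypothetical, nothing below is defined by the patch): the constant buffer size is stated once per receive virtqueue, so the posted descriptors shrink to an address and a buffer id.

```
#include <stdint.h>

typedef uint16_t le16;
typedef uint32_t le32;
typedef uint64_t le64;

/*
 * Hypothetical per-receive-queue configuration (item 6): the constant buffer
 * size is communicated once for the whole virtqueue, not per buffer.
 */
struct vnet_rx_vq_cfg {
    le16 vq_index;
    le16 reserved;
    le32 buf_size; /* one value intended to cover packets from 64B to 9K */
};

/*
 * Hypothetical receive descriptor: with the size known at queue level there
 * is no per-buffer length field left to fill in.
 */
struct vnet_rx_desc {
    le64 addr; /* DMA address of a constant-size buffer */
    le16 id;   /* buffer id echoed back in the completion */
    le16 reserved[3];
};
```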

> > > +7. The device should be able to indicate when a full page buffer is consumed,
> > > +   which can be recycled by the driver when the packets from the completed
> > > +   page are fully consumed.
> > 
> > no idea what this means.
> >
>  Instead of the driver allocating a page and splitting it into buffers of nearly the same size, the approach is to post the whole page and let the device consume it based on packet size.

oh I see. interesting. not easy as this does not fit the
buffer abstraction.
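
To make the idea in item 7 concrete, a minimal driver-side sketch under assumed semantics (the completion format and the page accounting below are not defined by the patch): the driver posts a whole page, the device carves packet buffers out of it by packet size, and the page is reposted only once the device has reported it fully consumed and the stack has freed every packet carved from it.

```
#include <stdbool.h>

/* Hypothetical bookkeeping for one posted receive page. */
struct rx_page {
    void *va;               /* page virtual address */
    unsigned int in_flight; /* packets from this page still held by the stack */
    bool device_done;       /* device indicated the page is fully consumed */
};

/*
 * Called when the stack frees one packet that was carved out of 'p';
 * returns true once the page can safely be reposted to the receive queue.
 */
static bool rx_page_packet_freed(struct rx_page *p)
{
    if (p->in_flight > 0)
        p->in_flight--;

    return p->device_done && p->in_flight == 0;
}
```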

-- 
MST


