virtio-dev message

Subject: Re: [virtio-dev] RFC: Doorbell suppression, packed-ring mode and hardware offload

On Tue, Feb 12, 2019 at 1:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
On Fri, Feb 01, 2019 at 09:43:02AM -0800, Rob Miller wrote:
> Agreed that this is needed.
> I would also like to suggest splitting the F_IN_ORDER into
> F_RX_IN_ORDER and F_TX_IN_ORDER to support hw LRO implementations,
> which can be more of a scatter/gather than tx. This would allow
> batchmode for tx at least in packed rings.

I'm not sure what this buys us. Are you interested in
out-of-order tx with in-order rx then?
Other way around. It is easier to guarantee that TX pkts are processed in order, but rx is considerably more difficult. The way some rx aggregators work is that as pkts are received from the wire, a "timer" and flow detector are set up. This allows other incoming packets with the same TCP header (flow) to be aggregated into one large packet with multiple segments. When a timeout fires (no other segments arrive within the time) or the TCP header changes, the whole list of RX buffers is then sent up to the driver for processing. However, during this gathering period, multiple flows, say from other TCP sources, could be arriving. The hardware allocates rx buffers (descriptors) as the fragments hit the receiver, hence they will be used out of order and batch mode on RX isn't possible.

> Finally, I would suggest a means to specify a given ring's ring mode,
> as packed leans more towards TX, whilst split can be either really,
> depending upon LRO, jumbo, rx buff size, etc. Just like F_IN_ORDER,
> we can have RX & TX split out.
> Sent from my iPhone

Before we jump there a bit more justification would be nice.
Take the above description of rx aggregation, where buffers are allocated for each rx pkt segment as it hits the receiver. If there were 2 ingress flows from different sources, say nicely interleaved, then I would expect to see rx desc usage as follows:

buffer 0 - flow 0
buffer 1 - flow 1
buffer 2 - flow 0
buffer 3 - flow 1

I would think this lends itself to a split virtqueue mode of operation better than ring buffer.
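
To make the ordering problem concrete, here is a minimal sketch of the aggregation behaviour described above. It is a toy model, not any real device's logic: the function names are hypothetical, one rx buffer is assumed per segment, and the timer is replaced by flushing every flow, oldest first, once the arrivals stop.

```python
def aggregate(arrivals):
    """Toy rx aggregator (hypothetical model).

    `arrivals` is a list of flow ids in wire order. Each arriving segment
    consumes the next rx buffer, so buffer ids are allocated 0, 1, 2, ...
    Returns (flow, [buffer ids]) pairs in the order the flows are flushed.
    """
    pending = {}  # flow id -> buffer ids gathered so far (insertion ordered)
    for buffer_id, flow in enumerate(arrivals):
        pending.setdefault(flow, []).append(buffer_id)
    # Simplification: instead of a per-flow timer, flush all flows at the
    # end, oldest flow first.
    return list(pending.items())


def used_order(completions):
    """Flatten the completions into the order buffers are handed back."""
    return [buf for _, bufs in completions for buf in bufs]


# Two nicely interleaved flows, as in the buffer list above.
completions = aggregate([0, 1, 0, 1])
order = used_order(completions)
in_order = order == sorted(order)
print(completions, order, in_order)
```

With the interleaved arrivals, flow 0 completes with buffers 0 and 2 before flow 1 completes with buffers 1 and 3, so the buffers come back in a different order from the one in which they were made available.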

E.g. doing this change in software isn't a lot of work. How about
a software patch with some performance gains measured?
Failing that, some back of the napkin calculations showing
the potential gains and costs?
Yeah, I could do that. I wanted to bounce the idea around a bit first to understand whether there are holes in my logic.

> > On Feb 1, 2019, at 9:23 AM, David Riddoch <driddoch@solarflare.com> wrote:
> >
> > All,
> >
> > I'd like to propose a small extension to the packed virtqueue mode. My
> > proposal is to add an offset/wrap field, written by the driver,
> > indicating how many available descriptors have been added to the ring.
> >
> > The reason for wanting this is to improve performance of hardware
> > devices. Because of high read latency over a PCIe bus, it is important
> > for hardware devices to read multiple ring entries in parallel. It is
> > desirable to know how many descriptors are available prior to issuing
> > these reads, else you risk fetching descriptors that are not yet
> > available. As well as wasting bus bandwidth this adds complexity.
> >
> > I'd previously hoped that VIRTIO_F_NOTIFICATION_DATA would solve this
> > problem, but we still have a problem. If you rely on doorbells to tell
> > you how many descriptors are available, then you have to keep doorbells
> > enabled at all times. This can result in a very high rate of doorbells
> > with some drivers, which can become a severe bottleneck (because x86
> > CPUs can't emit MMIOs at very high rates).
> >
> > The proposed offset/wrap field allows devices to disable doorbells when
> > appropriate, and determine the latest fill level via a PCIe read.
> >
> > I suggest the best place to put this would be in the driver area,
> > immediately after the event suppression structure.
> >
> > Presumably we would like this to be an optional feature, as
> > implementations of packed mode already exist in the wild. How about
> >
> > If I prepare a patch to the spec is there still time to get this into v1.1?
> >
> > Best,
> > David
> >
> > --
> > David Riddoch <driddoch@solarflare.com> -- Chief Architect, Solarflare
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
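
For reference, the offset/wrap idea in the quoted proposal can be sketched as follows. The proposal does not give an encoding, so everything here is an assumption: a 16-bit little-endian field with the offset in bits 0-14 and the wrap counter in bit 15, and hypothetical function names. It only illustrates how a device could work out the fill level from one read instead of a doorbell.

```python
import struct


def pack_avail_pos(offset, wrap):
    """Driver side: encode offset and wrap counter (assumed 16-bit LE layout)."""
    if not (0 <= offset < 1 << 15) or wrap not in (0, 1):
        raise ValueError("offset must fit in 15 bits and wrap must be 0 or 1")
    return struct.pack("<H", offset | (wrap << 15))


def unpack_avail_pos(raw):
    """Device side: decode the field read from the driver area."""
    (value,) = struct.unpack("<H", raw)
    return value & 0x7FFF, value >> 15


def descriptors_available(dev_offset, dev_wrap, drv_offset, drv_wrap, ring_size):
    """How many descriptors the device may fetch.

    (dev_offset, dev_wrap) is the next slot the device will read;
    (drv_offset, drv_wrap) is the next slot the driver will fill.
    The driver is never more than one full ring ahead of the device.
    """
    if dev_wrap == drv_wrap:
        return drv_offset - dev_offset
    return ring_size - dev_offset + drv_offset


# Hypothetical 256-entry ring: the device is at slot 250, and the driver
# has wrapped around and will fill slot 4 next.
raw = pack_avail_pos(4, 0)
drv_offset, drv_wrap = unpack_avail_pos(raw)
count = descriptors_available(250, 1, drv_offset, drv_wrap, 256)
print(count)
```

Knowing the count up front is what would let the device issue that many descriptor reads in parallel without fetching entries that are not yet available.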
