virtio-dev message


Subject: Re: [virtio-dev] packed ring layout proposal v2


On Thu, Feb 09, 2017 at 05:11:05PM +0100, Cornelia Huck wrote:
> > >>> * Non power-of-2 ring sizes
> > >>>
> > >>> As the ring simply wraps around, there's no reason to
> > >>> require ring size to be power of two.
> > >>> It can be made a separate feature though.
> > >>
> > >> Power of 2 ring sizes are required in order to ignore the high bits of
> > >> the indices.  With non-power-of-2 sizes you are forced to keep the
> > >> indices less than the ring size.
> > > 
> > > Right. So
> > > 
> > > 	if (unlikely(++idx >= size))
> > > 		idx = 0;
> > > 
> > > OTOH ring size that's twice larger than necessary
> > > because of power of two requirements wastes cache.
> > 
> > I don't know.  Power of 2 ring size is pretty standard, I'd rather avoid
> > the complication and the gratuitous difference with 1.0.
> 
> I agree. I don't think dropping the power of 2 requirement buys us so
> much that it makes up for the added complexity.

I recalled why I came up with this. The issue is cache associativity.
Recall that besides the ring we have event suppression
structures. If we are lucky and both sides run at the same speed,
everything can work by polling with events kept disabled. In that case
the event suppression structures are never written to; they are read-only.

However, if the ring and the event suppression structures compete for
the same cache lines, ring accesses have a chance to push the event
suppression structures out of cache, causing misses on read.

This can happen if they are at the same offset within a set.
E.g. for L1 caches 4 Kbyte sets are common, so this means the same
offset within a 4K page.
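
To illustrate the condition above, here is a minimal sketch in C. The
64-byte cache line and the 4 Kbyte span are assumptions, and the names
cache_slot() and may_conflict() are hypothetical; real caches differ in
line size, associativity and indexing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for illustration only: 64-byte cache lines, and
 * addresses 4 Kbytes apart compete for the same place in the cache. */
#define CACHE_LINE_SIZE 64u
#define CACHE_SET_SPAN  4096u

/* Which line-sized slot within the 4K span an address falls into. */
static inline unsigned cache_slot(uintptr_t addr)
{
	return (unsigned)((addr % CACHE_SET_SPAN) / CACHE_LINE_SIZE);
}

/* True if both addresses fall into the same slot, i.e. accesses to one
 * have a chance to push the other out of cache. */
static inline bool may_conflict(uintptr_t a, uintptr_t b)
{
	return cache_slot(a) == cache_slot(b);
}

/* Usage: an event suppression structure at offset 0 of one page
 * conflicts with a ring entry at offset 0 of another page, but not
 * with an entry one cache line further in. */
static inline void conflict_example(void)
{
	assert(may_conflict(0x10000, 0x25000));
	assert(!may_conflict(0x10000, 0x25040));
}
```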

We can fix this by making event suppression adjacent in memory, e.g.:


[interrupt suppress]
[descriptor ring]
[kick suppress]

If this whole structure fits in a single set, ring accesses will
not push the kick or interrupt suppress structures out of cache.
The specific layout can be left to drivers, but as the set size is
a power of two, this might require a non-power-of-two ring size.
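
A rough sketch in C of how such a layout could be sized. Everything
here is an assumption made for illustration: a 4 Kbyte set, 64-byte
cache lines, 16-byte descriptors, and each suppression structure
padded to its own cache line. struct ring_layout and
layout_in_one_set() are hypothetical names, not part of any proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes, for illustration only. */
#define SET_SPAN   4096u  /* bytes covered by one set */
#define LINE_SIZE  64u    /* cache line size */
#define DESC_SIZE  16u    /* bytes per ring descriptor */

struct ring_layout {
	size_t intr_suppress_off;  /* [interrupt suppress] */
	size_t ring_off;           /* [descriptor ring] */
	size_t kick_suppress_off;  /* [kick suppress] */
	unsigned ring_entries;     /* descriptors that fit in between */
};

/* Place both suppression structures and the ring in one 4K span,
 * giving the ring all the space that is left over. */
static struct ring_layout layout_in_one_set(void)
{
	struct ring_layout l;
	size_t ring_bytes = SET_SPAN - 2 * LINE_SIZE;

	l.intr_suppress_off = 0;
	l.ring_off = LINE_SIZE;
	l.ring_entries = (unsigned)(ring_bytes / DESC_SIZE);
	l.kick_suppress_off = l.ring_off + (size_t)l.ring_entries * DESC_SIZE;

	/* The last structure must still end inside the span. */
	assert(l.kick_suppress_off + LINE_SIZE <= SET_SPAN);
	return l;
}
```

Under these assumptions the ring gets 3968 bytes, i.e. 248 entries,
which is not a power of two. Rounding down to 128 entries would leave
almost half of the set unused by the ring.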

I conclude that this is an optimization that needs to be
benchmarked.

I also note that the generic description does not have to force
powers of two *even if devices actually require it*.
I would be inclined to word the text in a way that makes
relaxing the restriction easier.

For example, we can say "free running 16 bit index" and this forces a
power of two, but we can also say "free running index wrapping to 0
after (N*queue-size - 1) with N chosen such that the value fits in 16
bits" and this is exactly the same if queue size is a power of 2.
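
The second wording could look roughly like this in C. This is a sketch
only: choosing the largest N that fits is an assumption, and
index_limit(), index_next() and index_to_slot() are hypothetical
helper names.

```c
#include <assert.h>
#include <stdint.h>

/* N * queue_size for the largest N such that the highest index value,
 * N * queue_size - 1, still fits in 16 bits. */
static uint32_t index_limit(uint16_t queue_size)
{
	assert(queue_size != 0);
	return (65536u / queue_size) * queue_size;
}

/* Advance the free running index, wrapping to 0 after
 * N * queue_size - 1. */
static uint16_t index_next(uint16_t idx, uint16_t queue_size)
{
	uint32_t next = (uint32_t)idx + 1;

	if (next == index_limit(queue_size))
		next = 0;
	return (uint16_t)next;
}

/* Map the free running index to a position in the ring. */
static uint16_t index_to_slot(uint16_t idx, uint16_t queue_size)
{
	return (uint16_t)(idx % queue_size);
}
```

For a queue size of 256 the limit is 65536, so the index wraps exactly
like a plain 16 bit counter. For a queue size of 248 the limit is
264 * 248 = 65472, so the index wraps to 0 after 65471 and the slot
sequence stays contiguous across the wrap.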

So we can add text saying "ring size MUST be a power of two"
and later it will be easy to relax just by adding a feature bit.



-- 
MST

