virtio-dev message



Subject: Re: [virtio-dev] packed ring layout proposal


On Mon, Sep 19, 2016 at 07:37:53AM +0000, Pierre Pfister (ppfister) wrote:
> Hello Michael, 
> 
> Thanks for doing this. ;-)
> 
> 
> > Le 16 sept. 2016 à 00:39, Michael S. Tsirkin <mst@redhat.com> a écrit :
> > 
> > 
> > Let's start a discussion around an alternative ring layout.
> > This has been in my kvm forum 2016 presentation.
> > The idea is to have a r/w descriptor in a ring structure,
> > replacing the used and available ring, index and descriptor
> > buffer.
> > 
> > * Descriptor ring:
> > 
> > Guest adds descriptors with unique index values and DESC_HW set in flags.
> > Host overwrites used descriptors with correct len, index, and DESC_HW
> > clear.  Flags are always set/cleared last.
> > 
> > #define DESC_HW 0x0080
> > 
> > struct desc {
> >        __le64 addr;
> >        __le32 len;
> >        __le16 index;
> >        __le16 flags;
> > };
> > 
> > When DESC_HW is set, descriptor belongs to device. When it is clear,
> > it belongs to the driver.
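
To make the ownership rule concrete, here is a minimal sketch of how a driver could publish a descriptor with the flags word written last, and how either side can test ownership. The helper names and the barrier comment are illustrative assumptions, not part of the proposal:

```c
#include <stdint.h>

#define DESC_HW 0x0080

struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t index;
    uint16_t flags;
};

/* Driver side: fill in the descriptor body first, then hand the slot
 * to the device by setting DESC_HW last.  On real hardware a write
 * barrier would be needed between the two steps. */
static void driver_publish(struct desc *d, uint64_t addr,
                           uint32_t len, uint16_t index)
{
    d->addr  = addr;
    d->len   = len;
    d->index = index;
    /* write barrier here in a real implementation */
    d->flags = DESC_HW;
}

/* Either side can determine ownership from the flags word alone. */
static int device_owns(const struct desc *d)
{
    return (d->flags & DESC_HW) != 0;
}
```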
> 
> I think the simplified ring layout is a good idea. 
> It reduces the number of indirections, and will probably help pre-fetching data in order to avoid cache misses.
> 
> I am not completely convinced by the flag-based approach instead of the index-based approach, though.
> With the index-based approach, you have only one read to do in order
> to know up to what point in the queue you can go.
> With this flag, every time you look at a new descriptor, before doing anything you have to look at the flags.
> That means you have to wait for the result of a test on a very recently fetched piece of memory.
> 
> So maybe it would be interesting to still have a used_index and available_index (put in different cache lines).

The problem is that index access is guaranteed to be a cache miss,
and you need to stall waiting for the descriptor length anyway -
go ahead and implement an alternative, and prove me wrong,
but my testing shows removing the index speeds things up.

Find the prototype here:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/tools/virtio/ringtest/ring.c
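
To illustrate the difference being debated, here is a sketch of the two readiness checks. The function names are hypothetical; the index-based variant stands in for the virtio 1.0 avail index, which lives in a separate cache line from the descriptors:

```c
#include <stdint.h>
#include <stddef.h>

#define DESC_HW 0x0080

struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t index;
    uint16_t flags;
};

/* Flag-based check: a single read of the descriptor itself tells the
 * device whether slot `head` is ready, touching only the cache line
 * that holds the data it is about to process anyway. */
static int slot_ready_flag(const struct desc *ring, size_t head)
{
    return (ring[head].flags & DESC_HW) != 0;
}

/* Index-based check (virtio 1.0 style): the device first reads a
 * shared avail index from another cache line, then the descriptor.
 * The wrap-safe 16-bit subtraction counts outstanding entries. */
static int slot_ready_index(uint16_t avail_idx, uint16_t last_seen)
{
    return (uint16_t)(avail_idx - last_seen) != 0;
}
```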



> 
> > 
> > 
> > * Scatter/gather support
> > 
> > We can use 3 bits to set direction
> > and to chain s/g entries in a request, same as virtio 1.0:
> > 
> > /* This marks a buffer as continuing via the next field. */
> > #define VRING_DESC_F_NEXT       1
> > /* This marks a buffer as write-only (otherwise read-only). */
> > #define VRING_DESC_F_WRITE      2
> > 
> > Unlike virtio 1.0, all descriptors must have distinct ID
> > values.
> > 
> > * Indirect buffers
> > 
> > Can be done like in virtio 1.0:
> > 
> > /* This means the buffer contains a list of buffer descriptors. */
> > #define VRING_DESC_F_INDIRECT   4
> > 
> > In the descriptors in the indirect buffers, I think we should drop the index
> > field altogether, and just put s/g entries one after the other.
> > 
> > Also, the length in the indirect descriptor should mark
> > the length of the chain.
> > 
> > virtio 1.0 seems to allow a s/g entry followed by
> > an indirect descriptor. This does not seem useful,
> > so let's not allow that anymore.
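
A sketch of what dropping the index field from indirect entries could look like. The `indirect_entry` layout and the helper are assumptions for illustration only; the outer descriptor's len bounds the chain:

```c
#include <stdint.h>

#define VRING_DESC_F_INDIRECT 4

struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t index;
    uint16_t flags;
};

/* Entries inside an indirect table carry no index field in this
 * proposal, so a plain addr/len pair suffices (hypothetical layout). */
struct indirect_entry {
    uint64_t addr;
    uint32_t len;
};

/* The outer descriptor's len gives the table size in bytes, which
 * determines how many s/g entries the chain contains. */
static uint32_t indirect_count(const struct desc *d)
{
    return d->len / (uint32_t)sizeof(struct indirect_entry);
}
```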
> 
> First of all, if I understand correctly, VRING_DESC_F_NEXT doesn't use a 'next' field, but rather just
> says that the next descriptor in the queue is part of the same message. I think this is really, really good.
> 
> Paolo proposed in a later mail to drop VRING_DESC_F_NEXT altogether, and I think
> he is right at least in the sense that having two solutions for doing almost the same thing 
> is not necessary (IMHO, even harmful).

I'd like to see some code and numbers to back this up.  In particular,
testing dpdk performance with/without indirect descriptors would be
interesting.
If the optimal choice depends on the workload or the type of device in use,
then it might be necessary to keep both.

> But VRING_DESC_F_INDIRECT adds a layer of indirection and makes prefetching a bit
> more complicated.
> The reason why VRING_DESC_F_INDIRECT was useful in the first place is that virtio 1.0
> didn't have enough descriptors in the main queue (their number is equal to the queue size).

Rings are physically contiguous, and allocating large physically
contiguous buffers is problematic for many guests.



> Therefore, ideally, I'd rather have a longer queue and use VRING_DESC_F_NEXT only and drop
> VRING_DESC_F_INDIRECT altogether.

One can always just avoid implementing VRING_DESC_F_INDIRECT if desired.

> And if delay spent in the queue is an issue, I think it's the driver's job to avoid bufferbloat if necessary
> by adapting the number of descriptors it allows the device to use.
>
> > 
> > 
> > * Batching descriptors:
> > 
> > virtio 1.0 allows passing a batch of descriptors in both directions, by
> > incrementing the used/avail index by values > 1.  We can support this by
> > tagging first/middle/last descriptors in the flags field. For
> > compatibility, polarity is reversed so the bit is set for non-first and
> > non-last descriptors in a batch:
> > 
> > #define BATCH_NOT_FIRST 0x0010
> > #define BATCH_NOT_LAST  0x0020
> > 
> > In other words, a batch consisting of a single descriptor has batch flags 0000.
> > 
> > Batch does not have to be processed together, so
> > !first/!last flags can be changed when descriptor is used.
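
The reversed-polarity flags above can be computed mechanically; `batch_flags` below is a hypothetical helper showing that a batch of one yields 0000, so drivers that never batch can ignore these bits entirely:

```c
#include <stdint.h>

#define BATCH_NOT_FIRST 0x0010
#define BATCH_NOT_LAST  0x0020

/* Compute the batch flags for position i of an n-descriptor batch.
 * With the reversed polarity, a lone descriptor gets no bits set. */
static uint16_t batch_flags(unsigned i, unsigned n)
{
    uint16_t f = 0;
    if (i != 0)
        f |= BATCH_NOT_FIRST;
    if (i != n - 1)
        f |= BATCH_NOT_LAST;
    return f;
}
```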
> > 
> 
> I don't exactly understand why this is necessary.
> Is there any added value in knowing that a bunch of messages have been written at
> the same time?

virtio net mergeable buffers pretty much rely on this.


> As mentioned above, I'd rather keep the index approach instead of those flags.

My testing shows that it's suboptimal. I might be wrong, but if so I'd
like to see some numbers.

> > 
> > 
> > * Processing descriptors in and out of order
> > 
> > A device processing all descriptors in order can simply flip
> > the DESC_HW bit on each descriptor as it is used.
> > 
> > Device can process descriptors out of order, and write them out in order
> > as they are used, overwriting descriptors that are there.
> > 
> > Device must not use a descriptor until DESC_HW is set.
> > It is only required to look at the first descriptor
> > submitted, but it is allowed to look anywhere
> > in the ring. This might allow parallel processing.
> > 
> > Driver must not overwrite a descriptor until DESC_HW is clear.
> > It is only required to look at the first descriptor
> > submitted, but it is allowed to look anywhere
> > in the ring. This might allow parallel processing.
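
A sketch of the in-order write-back described above, with hypothetical names: the device may complete buffers in any order, but writes used entries into consecutive ring slots, carrying the original index so the driver can match completions, and clearing DESC_HW last:

```c
#include <stdint.h>

#define DESC_HW 0x0080

struct desc {
    uint64_t addr;
    uint32_t len;
    uint16_t index;
    uint16_t flags;
};

/* Device side: write a completed buffer back at the next used slot in
 * ring order, regardless of where its descriptor was originally
 * fetched.  Clearing DESC_HW last hands the slot back to the driver.
 * ring_size is assumed to be a power of two here for the modulo. */
static void device_write_used(struct desc *ring, uint16_t *used_slot,
                              uint16_t ring_size, uint16_t orig_index,
                              uint32_t written_len)
{
    struct desc *d = &ring[*used_slot % ring_size];

    d->len   = written_len;
    d->index = orig_index;
    /* write barrier here in a real implementation */
    d->flags = 0;               /* DESC_HW clear: driver owns it again */
    (*used_slot)++;
}
```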
> > 
> > 
> > * Interrupt/event suppression
> > 
> > virtio 1.0 has two mechanisms for suppression but only
> > one can be used at a time. Let's pack them together
> > in a structure - one for interrupts, one for notifications:
> > 
> > 
> > struct event {
> > 	__le16 idx;
> > 	__le16 flags;
> > }
> > 
> > Flags can be used like in virtio 1.0, by storing a special
> > value there:
> > 
> > #define VRING_F_EVENT_ENABLE  0x0
> > 
> > #define VRING_F_EVENT_DISABLE 0x1
> > 
> > or it can be used to switch to event idx:
> > 
> > #define VRING_F_EVENT_IDX     0x2
> > 
> > in that case, interrupt triggers when descriptor with
> > a given index value is used.
> 
> Is there a reason why we would keep both ways and not just use the IDX one?


Which is optimal depends on the workload:
- index requires an update each time we enable interrupts
- flags require an update each time we switch between enabled and
  disabled interrupts


Accordingly, for workloads that enable interrupts often,
flags are better. For workloads that keep interrupts
disabled most of the time, index is better.
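
For reference, the event-index test works like virtio 1.0's vring_need_event() in the Linux headers: an interrupt fires when the event index falls inside the window of entries consumed since the last check, using wrap-safe 16-bit arithmetic:

```c
#include <stdint.h>

/* Fire an event iff event_idx lies in (old_idx, new_idx], computed
 * with unsigned 16-bit wraparound so it works across index overflow.
 * This mirrors vring_need_event() from linux/virtio_ring.h. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) <
           (uint16_t)(new_idx - old_idx);
}
```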





> Thanks,
> 
> - Pierre

