OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] packed ring layout proposal


Hello Michael, 

Thanks for doing this. ;-)


> Le 16 sept. 2016 à 00:39, Michael S. Tsirkin <mst@redhat.com> a écrit :
> 
> 
> Let's start a discussion around an alternative ring layout.
> This has been in my kvm forum 2016 presentation.
> The idea is to have a r/w descriptor in a ring structure,
> replacing the used and available ring, index and descriptor
> buffer.
> 
> * Descriptor ring:
> 
> Guest adds descriptors with unique index values and DESC_HW set in flags.
> Host overwrites used descriptors with correct len, index, and DESC_HW
> clear.  Flags are always set/cleared last.
> 
> #define DESC_HW 0x0080
> 
> struct desc {
>        __le64 addr;
>        __le32 len;
>        __le16 index;
>        __le16 flags;
> };
> 
> When DESC_HW is set, descriptor belongs to device. When it is clear,
> it belongs to the driver.

I think the simplified ring layout is a good idea. 
It reduces the number of indirections, and will probably help pre-fetching data in order to avoid cache misses.

I am not completely convinced by the flag-based approach instead of the index-based approach though.
With the index-based approach, you have only one-read to do in order
to know up to what point in the queue you can go.

With this flag, every-time you look at a new descriptor, before doing anything you will have to look at the flags.
Meaning that you will have to wait for the result of a test on a very newly fetched piece of memory.

So maybe it would be interesting to still have a used_index and available_index (put in different cache lines).

> 
> 
> * Scatter/gather support
> 
> We can use 3 bits to set direction
> and to chain s/g entries in a request, same as virtio 1.0:
> 
> /* This marks a buffer as continuing via the next field. */
> #define VRING_DESC_F_NEXT       1
> /* This marks a buffer as write-only (otherwise read-only). */
> #define VRING_DESC_F_WRITE      2
> 
> Unlike virtio 1.0, all descriptors must have distinct ID
> values.
> 
> * Indirect buffers
> 
> Can be done like in virtio 1.0:
> 
> /* This means the buffer contains a list of buffer descriptors. */
> #define VRING_DESC_F_INDIRECT   4
> 
> In the descriptors in the indirect buffers, I think we should drop index
> field altogether, just put s/g entries one after the other.
> 
> Also, length in the indirect descriptor should mark
> the list of the chain.
> 
> virtio 1.0 seems to allow a s/g entry followed by
> an indirect descriptor. This does not seem useful,
> so let's not allow that anymore.

First of all, if I understand correctly, VRING_DESC_F_NEXT doesn't use a 'next' field, and rather just
say that the next descriptor in the queue is part of the same message. I think this is really, really, good.

Paolo proposed in a later mail to drop VRING_DESC_F_NEXT altogether, and I think
he is right at least in the sense that having two solutions for doing almost the same thing 
is not necessary (IMHO, even harmful).

But, VRING_DESC_F_INDIRECT adds a layer of indirection and makes prefetching a bit
more complicated.
The reason why VRING_DESC_F_INDIRECT was useful in the first place is that virtio 1.0 
didn't have enough descriptors in the main queue (it is equal to the queue size).

Therefore, ideally, I'd rather have a longer queue and use VRING_DESC_F_NEXT only and drop
VRING_DESC_F_INDIRECT altogether.

And if delay spent in the queue is an issue. I think it's driver's job to avoid bufferbloat if necessary 
by adapting the number of descriptors is allows the device to use.

> 
> 
> * Batching descriptors:
> 
> virtio 1.0 allows passing a batch of descriptors in both directions, by
> incrementing the used/avail index by values > 1.  We can support this by
> tagging first/middle/last descriptors in the flags field. For
> comptibility, polarity is reversed so bit is set for non-first and
> non-last descriptors in a batch:
> 
> #define BATCH_NOT_FIRST 0x0010
> #define BATCH_NOT_LAST  0x0020
> 
> In other words only descriptor in batch is 0000.
> 
> Batch does not have to be processed together, so
> !first/!last flags can be changed when descriptor is used.
> 

I don't exactly understand why this is necessary.
Is there any added value by knowing that a bunch of messages have been written at 
the same time.

As mentioned above, I'd rather keep the index approach instead of those flags.

> 
> 
> * Processing descriptors in and out of order
> 
> Device processing all descriptors in order can simply flip
> the DESC_HW bit as it is done with descriptors.
> 
> Device can process descriptors out of order, and write them out in order
> as they are used, overwriting descriptors that are there.
> 
> Device must not use a descriptor until DESC_HW is set.
> It is only required to look at the first descriptor
> submitted, but it is allowed to look anywhere
> in the ring. This might allow parallel processing.
> 
> Driver must not overwrite a descriptor until DESC_HW is clear.
> It is only required to look at the first descriptor
> submitted, but it is allowed to look anywhere
> in the ring. This might allow parallel processing.
> 
> 
> * Interrupt/event suppression
> 
> virtio 1.0 has two mechanisms for suppression but only
> one can be used at a time. Let's pack them together
> in a structure - one for interrupts, one for notifications:
> 
> 
> struct event {
> 	__le16 idx;
> 	__le16 flags;
> }
> 
> Flags can be used like in virtio 1.0, by storing a special
> value there:
> 
> #define VRING_F_EVENT_ENABLE  0x0
> 
> #define VRING_F_EVENT_DISABLE 0x1
> 
> or it can be used to switch to event idx:
> 
> #define VRING_F_EVENT_IDX     0x2
> 
> in that case, interrupt triggers when descriptor with
> a given index value is used.

Is there a reason why we would keep both ways and not just use the IDX one ?

Thanks,

- Pierre


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]