OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] virtio packed ring concerns


On Wed, 2020-01-29 at 04:53 -0500, Michael S. Tsirkin wrote:
> I second Stefan's comments. To add to that, I think the spec
> could be improved to avoid this kind of misunderstanding.
> See comments below:
> 
> On Tue, Jan 28, 2020 at 05:26:07PM +0000, Walker, Benjamin wrote:
> > Hi all - I'm a core maintainer for the SPDK project, which includes a vhost-
> > user 
> > target implementation that handles both virtio-blk and virtio-scsi. Recently
> > we
> > had some engineers go off and attempt to add support for packed virtqueues,
> > as
> > defined here: 
> > https://github.com/oasis-tcs/virtio-spec/blob/master/packed-ring.tex
> > 
> > While reviewing the patches they submitted I noticed two separate issues.
> > The
> > first issue is that the ring cannot function correctly under certain
> > circumstances by spec. The second issue is a performance issue. I'm going to
> > try
> > to keep both of these entirely separate.
> > 
> > Comments are very welcome here, and I'm hoping I'm just missing some
> > critical
> > piece of information that resolves the whole thing, but testing is so far
> > showing that I'm finding a real problem.
> > 
> > First, the functional issue. The problem scenario involves out of order
> > completions during ring wrap around. Here are the most relevant sections of
> > the
> > specification:
> > 
> > ~~~
> > Note: after reading driver descriptors and starting their processing in
> > order,
> > the device might complete their processing out of order.  Used device
> > descriptors are written in the order in which their processing is complete.
> > 
> > The counter maintained by the driver is called the Driver Ring Wrap Counter.
> > The
> > driver changes the value of this counter each time it makes available the
> > last
> > descriptor in the ring (after making the last descriptor available).
> > 
> > The counter maintained by the device is called the Device Ring Wrap
> > Counter.  The device changes the value of this counter
> > each time it uses the last descriptor in
> > the ring (after marking the last descriptor used).
> > 
> > To mark a descriptor as available, the driver sets the VIRTQ_DESC_F_AVAIL
> > bit in
> > Flags to match the internal Driver Ring Wrap Counter.  It also sets the
> > VIRTQ_DESC_F_USED bit to match the inverse value (i.e. to not match the
> > internal
> > Driver Ring Wrap Counter).
> > 
> > To mark a descriptor as used, the device sets the VIRTQ_DESC_F_USED bit in
> > Flags
> > to match the internal Device Ring Wrap Counter.  It also sets the
> > VIRTQ_DESC_F_AVAIL bit to match the \emph{same} value.
> > ~~~
> > 
> > Imagine a packed ring with 4 descriptors. At some earlier time let's say the
> > first 3 descriptors were submitted and completed, in order. Then, two more
> > descriptors were submitted (so they went into slot 3 and then 0).
> 
> And let's assume they have IDs 3 and 0, respectively.
> 
> > Each time
> > through the ring, the "ring wrap counter" toggles from 1 to 0 or vice versa.
> > The
> > avail and used flags are initialized to 1 at start up and the ring wrap
> > counter
> > is 0.
> > 
> > So when submitting into slot 3, the driver flips the avail bit from 1 to 0.
> > When
> > submitting into slot 0 after the ring-wrap, it flips the avail bit from 0 to
> > 1.
> > The spec says it also sets the used flag, but the used flag is already the
> > correct value. That's all solid so far.
> > 
> > The device is then polling on slot 3 waiting for new descriptors to be
> > submitted. Note here that the device must be maintaining its own copy of
> > what it
> > thinks is the driver's ring wrap counter value to detect descriptors being
> > made
> > available, which isn't called out in the specification at all but all actual
> > implementations do. Regardless, once it sees the avail flag in the
> > descriptor in
> > slot 3 flip, it begins processing the descriptor. It then moves on to poll
> > slot
> > 0 for incoming descriptors (and updates its expected avail ring wrap counter
> > value internally). In this case, it sees the bit flip for slot 0 and begins
> > processing the descriptor. Descriptor processing is asynchronous and may
> > complete out of order, at least for storage devices.
> > 
> > So let's say that the descriptors do, in fact, complete out of order, such
> > that
> > the descriptor in slot 0 completes first and then the descriptor in slot 3
> > completes second. The specification is very clear that this is allowed.
> > 
> > When completing the descriptor in slot 0, the device has not yet completed
> > the
> > last descriptor in the ring (they're out of order!). So the ring wrap
> > counter is
> > still 0. So it then proceeds to write both the avail flag and the used flag
> > to
> > 0. But the driver is expecting the used flag to get flipped to 1 to indicate
> > a
> > completion. This is broken.
> 
> Right but the spec says
> 	 Used device
> 	 descriptors are written in the order in which their processing is
> complete.
> 
> so what should happen, is this:
> 	 descriptor with ID 0 is written into slot 3.
> 	 descriptor with ID 3 is written into slot 0.
> 
> 
> The following might be helpful to describe this better:
> 
> the spec already says:
> 
> 	The Descriptor Ring is used in a circular manner: the driver writes
> descriptors into the ring in order. After
> 	reaching the end of the ring, the next descriptor is placed at the head
> of the ring. Once the ring is full of driver
> 	descriptors, the driver stops sending new requests and waits for the
> device to start processing descriptors
> 	and to write out some used descriptors before making new driver
> descriptors available.
> 
> and maybe we should add:
> 
> 	Similarly, device writes out used descriptors in a circular manner:
> 	starting at the head of the ring, and until it reaches the end of the
> 	ring. After reaching the end of the ring, the next used descriptor is
> 	again placed at the head of the ring.

Yes please! No where in the current wording does it say the device writes the
descriptors circularly. It says the driver writes them circularly and that the
device reads them circularly only.

> 
> 
> Also spec currently says:
> 
> 	The counter maintained by the device is called the Device Ring Wrap
> Counter. The device changes the
> 	value of this counter each time it uses the last descriptor in the ring
> (after marking the last descriptor used).
> 
> in fact this part may be confusing: it is not clear what does "uses the last
> descriptor"
> mean - you took it to mean "writes out descriptor with ID from the last
> descriptor", what spec means is "writes out a used descriptor over the
> last descriptor location".  The text also does not properly account for
> skipping descriptors.
> 
> What it should say is:
> 
> 	The counter maintained by the device is called the Device Ring Wrap
> Counter. The device changes the
> 	value of this counter each time it reaches the last descriptor
> 	in the ring when writing used descriptors
>         (after writing out or skipping over the last used descriptor).
> 
> Please let me know whether the changes suggested above would clarify
> things for you, or whether you have other suggestions. Thanks!
> 

I wholeheartedly agree that the new wording is much better and would have made
this much more obvious. Thanks for the assistance.

Thanks,
Ben


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]