OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] virtio packed ring concerns


I second Stefan's comments. To add to that, I think the spec
could be improved to avoid this kind of misunderstanding.
See comments below:

On Tue, Jan 28, 2020 at 05:26:07PM +0000, Walker, Benjamin wrote:
> Hi all - I'm a core maintainer for the SPDK project, which includes a vhost-user 
> target implementation that handles both virtio-blk and virtio-scsi. Recently we
> had some engineers go off and attempt to add support for packed virtqueues, as
> defined here: 
> https://github.com/oasis-tcs/virtio-spec/blob/master/packed-ring.tex
> 
> While reviewing the patches they submitted I noticed two separate issues. The
> first issue is that the ring cannot function correctly under certain
> circumstances by spec. The second issue is a performance issue. I'm going to try
> to keep both of these entirely separate.
> 
> Comments are very welcome here, and I'm hoping I'm just missing some critical
> piece of information that resolves the whole thing, but testing is so far
> showing that I'm finding a real problem.
> 
> First, the functional issue. The problem scenario involves out of order
> completions during ring wrap around. Here are the most relevant sections of the
> specification:
> 
> ~~~
> Note: after reading driver descriptors and starting their processing in order,
> the device might complete their processing out of order.  Used device
> descriptors are written in the order in which their processing is complete.
> 
> The counter maintained by the driver is called the Driver Ring Wrap Counter. The
> driver changes the value of this counter each time it makes available the last
> descriptor in the ring (after making the last descriptor available).
> 
> The counter maintained by the device is called the Device Ring Wrap
> Counter.  The device changes the value of this counter
> each time it uses the last descriptor in
> the ring (after marking the last descriptor used).
> 
> To mark a descriptor as available, the driver sets the VIRTQ_DESC_F_AVAIL bit in
> Flags to match the internal Driver Ring Wrap Counter.  It also sets the
> VIRTQ_DESC_F_USED bit to match the inverse value (i.e. to not match the internal
> Driver Ring Wrap Counter).
> 
> To mark a descriptor as used, the device sets the VIRTQ_DESC_F_USED bit in Flags
> to match the internal Device Ring Wrap Counter.  It also sets the
> VIRTQ_DESC_F_AVAIL bit to match the \emph{same} value.
> ~~~
> 
> Imagine a packed ring with 4 descriptors. At some earlier time let's say the
> first 3 descriptors were submitted and completed, in order. Then, two more
> descriptors were submitted (so they went into slot 3 and then 0).

And let's assume they have IDs 3 and 0, respectively.

> Each time
> through the ring, the "ring wrap counter" toggles from 1 to 0 or vice versa. The
> avail and used flags are initialized to 1 at start up and the ring wrap counter
> is 0.
> 
> So when submitting into slot 3, the driver flips the avail bit from 1 to 0. When
> submitting into slot 0 after the ring-wrap, it flips the avail bit from 0 to 1.
> The spec says it also sets the used flag, but the used flag is already the
> correct value. That's all solid so far.
> 
> The device is then polling on slot 3 waiting for new descriptors to be
> submitted. Note here that the device must be maintaining its own copy of what it
> thinks is the driver's ring wrap counter value to detect descriptors being made
> available, which isn't called out in the specification at all but all actual
> implementations do. Regardless, once it sees the avail flag in the descriptor in
> slot 3 flip, it begins processing the descriptor. It then moves on to poll slot
> 0 for incoming descriptors (and updates its expected avail ring wrap counter
> value internally). In this case, it sees the bit flip for slot 0 and begins
> processing the descriptor. Descriptor processing is asynchronous and may
> complete out of order, at least for storage devices.
> 
> So let's say that the descriptors do, in fact, complete out of order, such that
> the descriptor in slot 0 completes first and then the descriptor in slot 3
> completes second. The specification is very clear that this is allowed.
> 
> When completing the descriptor in slot 0, the device has not yet completed the
> last descriptor in the ring (they're out of order!). So the ring wrap counter is
> still 0. So it then proceeds to write both the avail flag and the used flag to
> 0. But the driver is expecting the used flag to get flipped to 1 to indicate a
> completion. This is broken.

Right but the spec says
	 Used device
	 descriptors are written in the order in which their processing is complete.

so what should happen, is this:
	 descriptor with ID 0 is written into slot 3.
	 descriptor with ID 3 is written into slot 0.


The following might be helpful to describe this better:

the spec already says:

	The Descriptor Ring is used in a circular manner: the driver writes descriptors into the ring in order. After
	reaching the end of the ring, the next descriptor is placed at the head of the ring. Once the ring is full of driver
	descriptors, the driver stops sending new requests and waits for the device to start processing descriptors
	and to write out some used descriptors before making new driver descriptors available.

and maybe we should add:

	Similarly, device writes out used descriptors in a circular manner:
	starting at the head of the ring, and until it reaches the end of the
	ring. After reaching the end of the ring, the next used descriptor is
	again placed at the head of the ring.


Also spec currently says:

	The counter maintained by the device is called the Device Ring Wrap Counter. The device changes the
	value of this counter each time it uses the last descriptor in the ring (after marking the last descriptor used).

in fact this part may be confusing: it is not clear what does "uses the last descriptor"
mean - you took it to mean "writes out descriptor with ID from the last
descriptor", what spec means is "writes out a used descriptor over the
last descriptor location".  The text also does not properly account for skipping descriptors.

What it should say is:

	The counter maintained by the device is called the Device Ring Wrap Counter. The device changes the
	value of this counter each time it reaches the last descriptor
	in the ring when writing used descriptors
        (after writing out or skipping over the last used descriptor).

Please let me know whether the changes suggested above would clarify
things for you, or whether you have other suggestions. Thanks!

-- 
MST



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]