[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-comment] virtio queue numbering and optional queues
On 22.08.23 15:40, Stefan Hajnoczi wrote:
On Mon, Aug 21, 2023 at 03:18:50PM -0700, Daniel Verkamp wrote:Hello virtio folks,Hi Daniel, I have CCed those involved in the free page hint and page reporting features. StefanI noticed a mismatch between the way the specification defines device-specific virtqueue indexes and the way device and driver implementers have interpreted the specification. As a practical example, consider the traditional memory balloon device [1]. The first two queues (indexes 0 and 1) are available as part of the baseline device, but the rest of the queues are tied to feature bits. Section 5.5.2, "Virtqueues", gives a list that appears to be a mapping from queue index to queue name/function, defining queue index 3 as free_page_vq and index 4 as reporting_vq, and declaring that "free_page_vq only exists if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set" and "reporting_vq only exists if VIRTIO_BALLOON_F_PAGE_REPORTING is set." This wording is a bit vague, but I assume "is set" means "is negotiated" (not just "advertised by the device").
Staring at QEMU: the queues are added when the virtio-balloon device is *created*. Queues are added based on feature configuration, if they are part of the device feature set. That should translate to "is advertised", not "is negotiated". The queue ordering is as follows: * inflate queue, baseline device * deflate queue, baseline device * driver memory statistics, VIRTIO_BALLOON_F_STATS_VQ * free page hinting, VIRTIO_BALLOON_F_FREE_PAGE_HINT * free page reporting, VIRTIO_BALLOON_F_REPORTING QEMU always supports the first 3, so they use number 0-2. The other two can be configured for the device. So the queue indices vary based on actual feature presence.
Also presumably "exists" means something like "may only be used by the driver if the feature bit is negotiated" and "should be ignored by the device if the feature bit is not negotiated", although it would be nice to have a proper definition in the spec somewhere. Section 5.5.3, "Feature bits", gives definitions of the feature bits, with similar descriptions of the relationship between the feature bits and virtqueue availability, although the wording is slightly different ("present" rather than "exists"). No dependency between feature bits is defined, so it seems like it should be valid for a device or driver to support or accept one of the higher-numbered features while not supporting a lower-numbered one.
Yes, that's my understanding.
Notably, there is no mention of queue index assignments changing based on negotiated features in either of these sections. Hence a reader can only assume that the queue index assignments are fixed (i.e. stats_vq will always be vq index 4 if F_STATS_VQ is negotiated, regardless of any other feature bits).
And that does not seem to be the case. At least QEMU assigns them sequentially, based on actually configured features for the device. If I read the kernel code correctly (drivers/virtio/virtio_balloon.c:init_vqs) it also behaves that way: if the device has a certain feature "virtio_has_feature", it gets the next index. Otherwise, the next index goes to another feature: /* * Inflateq and deflateq are used unconditionally. The names[] * will be NULL if the related feature is not enabled, which will * cause no allocation for the corresponding virtqueue in find_vqs. */ callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack; names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate"; callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack; names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate"; callbacks[VIRTIO_BALLOON_VQ_STATS] = NULL; names[VIRTIO_BALLOON_VQ_STATS] = NULL; callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; names[VIRTIO_BALLOON_VQ_REPORTING] = NULL; if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { names[VIRTIO_BALLOON_VQ_STATS] = "stats"; callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request; } if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq"; callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; } if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) { names[VIRTIO_BALLOON_VQ_REPORTING] = "reporting_vq"; callbacks[VIRTIO_BALLOON_VQ_REPORTING] = balloon_ack; } err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs, callbacks, names, NULL); if (err) return err;
Now consider a scenario where VIRTIO_BALLOON_F_STATS_VQ and VIRTIO_BALLOON_F_PAGE_REPORTING are negotiated but VIRTIO_BALLOON_F_FREE_PAGE_HINT is not (perhaps the device supports all of the defined features but the driver only wants to use reporting_vq, not free_page_vq). In this case, what queue index should be used by the driver when enabling reporting_vq? My reading of the specification is that the reporting_vq is always queue index 4, independent of whether VIRTIO_BALLOON_F_STATS_VQ or VIRTIO_BALLOON_F_FREE_PAGE_HINT are negotiated, but this contradicts existing device and driver implementations, which will use queue index 3 (the next one after stats_vq = 2) as reporting_vq in this case.
Then the specification really needs updating :)
The qemu virtio-ballon device [2] assigns the next-highest unused queue index when calling virtio_add_queue(), and in the scenario presented above, free_page_vq will not be added since F_STATS_VQ is not negotiated, so reporting_vq will be assigned queue index 3, rather than 4. (Additionally, qemu always adds the stats_vq regardless of negotiated features, but that's irrelevant in this case since we are assuming the STATS_VQ feature is negotiated.) The Linux virtio driver code originally seemed to use the correct (by my reading) indexes, but it was changed to match the layout used by qemu in a 2019 commit ("virtio_pci: use queue idx instead of array idx to set up the vq") [3] - in other words, it will now also expect queue index 3 to be reporting_vq in the scenario laid out above.
Note that at the time of this commit, there was no support for "free page reporting". callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack; names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate"; callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack; names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate"; names[VIRTIO_BALLOON_VQ_STATS] = NULL; names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { names[VIRTIO_BALLOON_VQ_STATS] = "stats"; callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request; } if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq"; callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; } And as QEMU always sets VIRTIO_BALLOON_F_STATS_VQ, that one always gets id=2. Consequently, VIRTIO_BALLOON_F_FREE_PAGE_HINT, if around, gets id=3. As we didn't support VIRTIO_BALLOON_F_REPORTING, it doesn't matter which id it gets. But as you note, once we have different implementations and more feature variability, it's a mess. A device that implements VIRTIO_BALLOON_F_FREE_PAGE_HINT but not VIRTIO_BALLOON_F_STATS_VQ might not work correctly with either old or new QEMU. Maybe it needs to be documented that any device that implements either VIRTIO_BALLOON_F_FREE_PAGE_HINT or VIRTIO_BALLOON_F_REPORTING *must* also implement VIRTIO_BALLOON_F_STATS_VQ, so old+new Linux drivers would continue working.
I'm not sure how to resolve the mismatch between the specification and actual implementation behavior. The simplest change would probably be to rewrite the specification to drop the explicit queue indexes in section 5.5.2 and add some wording about how queues are numbered based on negotiated feature bits (this would need to be applied to other device
Yes.
types that have specified queue indexes as well). However, this would also technically be an incompatible change of the specification. On the other hand, changing the device and driver implementations to match the specification would be even more challenging, since it would be an incompatible change in actual practice, not just a change of the spec to match consensus implementation behavior.
Changing drivers/devices is pretty much impossible. So we should document the queue assignment better, and maybe the implication of requiring VIRTIO_BALLOON_F_STATS_VQ when any new features are implemented. Does that make sense?
Perhaps drivers could add a quirk to detect old versions of the qemu device and use the old behavior, while enabling the correct behavior only for other device vendors and newer qemu device revisions, and the qemu device could add an opt-in feature to enable the correct behavior that users would need to enable only when they know they have a sufficiently new driver with the fix. Or maybe there could be a new feature bit that would opt into following the spec-defined queue indexes (VIRTIO_F_VERSION_2?) and some new wording to require devices to use the old behavior when that bit is not negotiated, but that also feels less than ideal to me. Any thoughts on how to proceed with this situation? Is my reading of the specification just wrong?
I think you raised an important point. We should try documenting reality in the specification. -- Cheers, David / dhildenb
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]