Subject: Re: [virtio-comment] Re: [RFC PATCH 4/5] virtqueue: constraints for virtqueue state




On 9/8/2023 2:23 PM, Si-Wei Liu wrote:


On 9/7/2023 2:34 AM, Zhu, Lingshan wrote:


On 9/7/2023 4:09 PM, Eugenio Perez Martin wrote:
On Tue, Sep 5, 2023 at 11:08 AM Zhu, Lingshan <lingshan.zhu@intel.com> wrote:


On 8/21/2023 5:26 PM, Eugenio Perez Martin wrote:
On Fri, Aug 18, 2023 at 11:44 AM Zhu, Lingshan <lingshan.zhu@intel.com> wrote:

On 8/17/2023 11:19 PM, Eugenio Perez Martin wrote:
On Tue, Aug 15, 2023 at 1:30 PM Zhu, Lingshan <lingshan.zhu@intel.com> wrote:
On 8/15/2023 8:34 AM, Jason Wang wrote:
On Mon, Aug 14, 2023 at 7:29 PM Zhu Lingshan <lingshan.zhu@intel.com> wrote:
This commit specifies the constraints of the virtqueue state,
and the actions should be taken by the device when SUSPEND
and DRIVER_OK is set

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 content.tex | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/content.tex b/content.tex
index 43bd5de..f6ac581 100644
--- a/content.tex
+++ b/content.tex
@@ -587,6 +587,37 @@ \subsection{\field{Used State} Field}

 See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.

+\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
+
+If VIRTIO_F_QUEUE_STATE has been negotiated, the driver MUST set SUSPEND in \field{device status}
+first before getting or setting Virtqueue State of any virtqueues.
I don't get why this is a must. It could be useful for debugging.
To avoid race conditions with the device and to make the device
implementation easier.
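
The race that this ordering avoids can be illustrated with a toy model. This is only a sketch: the SUSPEND bit position, the ToyDevice class and all of its methods are made up for illustration and are not taken from the spec or from any driver. A running device keeps moving its index, so a snapshot is only meaningful once the device presents SUSPEND.

```python
# Toy model: the driver sets SUSPEND before reading virtqueue state.
# Only DRIVER_OK = 4 comes from the virtio spec; the SUSPEND bit value
# and every class/function name below are illustrative assumptions.
DRIVER_OK = 0x04
SUSPEND = 0x10  # hypothetical bit position


class ToyDevice:
    def __init__(self):
        self.status = DRIVER_OK
        self.last_avail_idx = 0

    def process_one_buffer(self):
        # A running device keeps consuming buffers, so the index moves.
        if not self.status & SUSPEND:
            self.last_avail_idx = (self.last_avail_idx + 1) & 0xFFFF

    def write_status(self, value):
        self.status = value


def driver_get_avail_state(dev):
    """Set SUSPEND, confirm the device presents it, then read the state."""
    dev.write_status(dev.status | SUSPEND)
    if not dev.status & SUSPEND:
        raise RuntimeError("device did not present SUSPEND")
    return dev.last_avail_idx


dev = ToyDevice()
dev.process_one_buffer()
snapshot = driver_get_avail_state(dev)
dev.process_one_buffer()  # no effect: the device is suspended
```

After the last call the snapshot still matches the device, which is the property the MUST is meant to guarantee.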
+
+If VIRTIO_F_QUEUE_STATE has been negotiaged but VIRTIO_RING_F_PACKED not been negotiated,
typo
yes
+the driver MUST NOT access \field{Used State} of any virtqueues, it should use the
+used index in the used ring.
+
+\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
+
+If VIRTIO_F_QUEUE_STATE has been negotiated but SUSPEND is not set in \field{device status},
+the device MUST ignore any accesses against Virtqueue State of any virtqueues.
Btw, do we need to clarify the behavior of ring reset after suspending?
I think that once suspended, the device should ignore requests to reset a queue.
Actually shadow virtqueue could benefit from the ability to change vq
properties (addresses) while the device is suspended, and then just
resume it. I've been told that ring reset is overkill for that.
If ring reset is overkill, is SUSPEND even more overkill?
It depends on the cost of recreating the vq in the device, I think. But
it has more to do with *what* is changed in the vq, as it seems some
parameters (vq size) have more impact than others, like the vq address.
The way the device is stopped does not matter, but ring reset already
offers the possibility of changing all of the parameters.

Adding Si-Wei and Dragos here, as they pointed it out in the
virtio-networking upstream meeting.

But probably it is better to address it on top, with another feature flag.
I think if we want to change the vq properties, there must be a
mechanism to stop the queue and then resume it.

How about allowing the driver to set queue_enable = 0 to stop the queue
and queue_enable = 1 to resume it and force it to reinitialize?

Yes, I think that is better suited. But maybe it is better to add this
on top, so we keep this series small.
Hi Eugenio,

I had second thoughts while implementing the queue_enable = 0 approach
above: it offers no advantage over queue_reset.

1) queue_reset already stops a queue, and the vq properties can be
reconfigured between queue_reset and queue_enable.

2) Once the driver sees SUSPEND presented by the device, it assumes that
the device state and vq state are stable, so at that point the driver can
read reliable device configuration. A vq reset should therefore be
ignored once SUSPEND is present, and if we implement queue stop, it
should be ignored under SUSPEND too.
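
Point 2) can be sketched as a toy device model that drops queue_reset writes while suspended. The class names, the return-value convention and the SUSPEND bit position are assumptions made for illustration only; nothing here is taken from the patch or from an existing device.

```python
# Toy model of point 2): queue_reset writes are ignored once SUSPEND is set.
# All names and the SUSPEND bit value are illustrative assumptions.
SUSPEND = 0x10  # hypothetical bit position


class ToyQueue:
    def __init__(self, size, last_avail_idx):
        self.size = size
        self.last_avail_idx = last_avail_idx
        self.enabled = True


class ToyDevice:
    def __init__(self):
        self.status = 0
        self.queues = [ToyQueue(256, 17)]

    def write_queue_reset(self, index):
        """Return True if the reset was carried out, False if ignored."""
        if self.status & SUSPEND:
            return False  # suspended: the queue state must stay stable
        queue = self.queues[index]
        queue.enabled = False
        queue.last_avail_idx = 0
        return True


dev = ToyDevice()
dev.status |= SUSPEND
ignored = not dev.write_queue_reset(0)
state_while_suspended = dev.queues[0].last_avail_idx

dev.status &= ~SUSPEND
accepted = dev.write_queue_reset(0)
```

While suspended the queue keeps its index of 17; the same write takes effect only after SUSPEND is cleared.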

The relation between SUSPEND and ring_reset needs to be described in
this series, yes. This is a good start, but I'm not sure if this one
meets all the requirements for SW assisted live migration.

We can always add new feature flags to define a different interaction
in the future, for example for devices that support changing vq
attributes while suspended. To give credit where it is due, this idea
was proposed by Si-Wei in a recent virtio-networking meeting.
If so, we don't even need a new feature bit. We can just allow
resetting vqs after the device presents SUSPEND.
For the single case of feature interaction with queue_reset this looks fine, but queue_reset is perhaps not the only feature that needs to interact with SUSPEND. On the other hand, I suspect it's not easy to converge on everything at once for the moment. To avoid the lure of hijacking this thread for other things, I feel it'd be easier to define a pristine SUSPEND method starting with the most restrictive mandates, describing every possible means of prohibiting *any* change to the config space of a device in suspension. This not only keeps (backward) compatibility on the table, consistent with the assumptions of the various SUSPEND implementations available today, but also makes it possible to customize different flavors of interaction guarded by different feature flags in the future.

For instance, today queue_reset may work best on software device implementations, where one can introduce a specific SUSPEND_RING_RESET_ALLOWED feature flag to unlock/override part of the restriction from the pristine SUSPEND feature when both are negotiated and used together. In the future, if this part needs to be revisited, e.g. because a hardware device implementation of queue_reset cannot meet a desired performance (downtime) goal, a new feature might have to be introduced to define another, hardware-biased means of interaction with a suspended device.
Hi Siwei

OK, I got it: there can be a new feature bit for resetting a queue after SUSPEND, and other interactions can follow the same pattern, which is more flexible.
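
The layering agreed on here, a strict base SUSPEND plus opt-in relaxations, might look roughly like the following. SUSPEND_RING_RESET_ALLOWED is only the example name used earlier in this thread; the bit numbers, the table and the function name are assumptions for illustration and do not exist in the spec.

```python
# Sketch of layering: strict base SUSPEND, relaxed only by extra features.
# Bit numbers and names are hypothetical, chosen for this example.
F_SUSPEND = 1 << 0
F_SUSPEND_RING_RESET_ALLOWED = 1 << 1

# Operation -> feature bit that unlocks it while the device is suspended.
# Future interactions would add rows here, each behind its own bit.
RELAXATIONS = {
    "queue_reset": F_SUSPEND_RING_RESET_ALLOWED,
}


def allowed_while_suspended(operation, negotiated):
    """Base rule: nothing is allowed unless a negotiated bit unlocks it."""
    unlock_bit = RELAXATIONS.get(operation)
    return unlock_bit is not None and bool(negotiated & unlock_bit)


base_only = allowed_while_suspended("queue_reset", F_SUSPEND)
with_flag = allowed_while_suspended(
    "queue_reset", F_SUSPEND | F_SUSPEND_RING_RESET_ALLOWED
)
```

With only the base feature the reset is refused; negotiating the extra bit unlocks that one operation and nothing else.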


The device presenting SUSPEND indicates that the device config space
is stable at that moment and ready for the driver to fetch field data from it.

Then the driver is allowed to reset, reconfigure and re-enable the vqs.
Maybe not for this case, but for completeness I find a very relevant question: as your patch defines SUSPEND in the context of live migration, how do you envision resuming/restarting the device immediately in place on the source host (say migration is cancelled after all devices are suspended, or migration fails at the last minute for some reason)? Reset the device and recover everything from scratch? Or do queue_reset then queue_enable on every virtqueue while keeping the other device states (those already populated through the ctrl vq) around? Or suppose we have a symmetric RESUME feature that keeps every device state, including the queue state, in place. Which option would a hardware vendor pick if the user/customer wants the least downtime? Does the hardware's choice matter much for software device implementations?

As can be seen from these options, there's perhaps no single best solution across software and hardware devices, or even across different hardware vendors. So instead of ruling out future extensions tailored to other implementations, be it hardware or software, I feel it's not the best thing for now to hard-wire SUSPEND to queue_reset or RESUME. Device reset is the base case that every device has to implement, and I feel it might be the only failsafe method to get the device out of the suspended state with pristine SUSPEND.
In case of a failed or cancelled live migration, the driver can certainly reset and then re-configure the device to resume it.

In this series, we also say:
If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND and resume operation upon DRIVER_OK.
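
The two ways out of suspension mentioned here, a full device reset and a resume on DRIVER_OK, can be contrasted in a toy model. The SUSPEND bit position and the ToyDevice class are illustrative assumptions, and treating "upon DRIVER_OK" as a status write with DRIVER_OK set is one possible reading of the quoted sentence, not something the series is known to specify in this form.

```python
# Toy model of two ways to leave the suspended state.
DRIVER_OK = 0x04  # value from the virtio spec
SUSPEND = 0x10    # hypothetical bit position


class ToyDevice:
    def __init__(self):
        self.status = DRIVER_OK | SUSPEND  # starts out suspended
        self.last_avail_idx = 42           # state built up before suspend

    @property
    def running(self):
        return bool(self.status & DRIVER_OK) and not self.status & SUSPEND

    def write_status(self, value):
        if value == 0:
            # Device reset: the path every device has to implement.
            # All state is lost and must be rebuilt by the driver.
            self.status = 0
            self.last_avail_idx = 0
            return
        if self.status & SUSPEND and value & DRIVER_OK:
            # Assumed reading of the quoted sentence: a DRIVER_OK write
            # clears SUSPEND and keeps the queue state in place.
            value &= ~SUSPEND
        self.status = value


resumed = ToyDevice()
resumed.write_status(DRIVER_OK)  # resume in place

reset = ToyDevice()
reset.write_status(0)            # reset, then the driver starts over
```

The resumed device keeps its index of 42 and runs again at once, while the reset device is back at zero and still needs full re-initialization, which is the downtime difference raised above.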



The only requirement is that the driver is responsible for maintaining
the integrity and validity of the config space fields, because
the device treats the config space as read-only at that moment (SUSPEND-ed).
The driver should be responsible for its actions and perform proper
synchronization, e.g. re-reading fields.
It looks fine, though as stated above, please leave it to a different feature flag with another patch to define the queue_reset interaction with SUSPEND.
Sure, we will introduce a new feature bit for resetting vq.

Thanks for your advice
Zhu Lingshan

Thanks,
-Siwei


Does this work for you?

Thanks

3) The device should only accept a queue reset when !SUSPEND, and
the driver can flush the queue buffers before resetting it to avoid
losing buffers.
We will add a tracker for in-flight descriptors later.

Any thoughts?

Thanks
Thanks!

Thanks
Zhu Lingshan
+
+When VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED is not,
+the device MUST ignore any accesses against \field{Used State}.
+
+If VIRTIO_F_QUEUE_STATE has been negotiaged, the device MUST reset
+the Virtqueue State of every virtqueue upon a reset.
We need to define the meaning of "reset"; this is important for packed virtqueues.
I will remove this as Stefan suggested.
+
+If VIRTIO_F_QUEUE_STATE and VIRTIO_RING_F_PACKED have been negotiaged, when SUSPEND is set,
+the device MUST record the Virtqueue State of every enabled virtqueue
+in \field{Available State} and \field{Used State} respectively,
+and correspondingly restore the Virtqueue State of every enabled virtqueue
+from \field{Avaiable State} and \field{Used State} when DRIVER_OK is set.
We can just let the device report those states in any case then we don't need to care about those details, or did you see any blockers?
Agreed, I will add the definition of used_state for split vqs in the
next version.

Thanks
Thanks

+
+If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED has been not, when SUSPEND is set,
+the device MUST record the available state of every enabled virtqueue in \field{Available State},
+and restore the available state of every enabled virtqueue from \field{Avaiable State}
+when DRIVER_OK is set.
+
 \input{admin.tex}

 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
--
2.35.3
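
The record/restore rules in the patch above can be sketched as a toy model. The class layout, the method names and the use of plain integers for the state fields are assumptions for illustration; the patch does not prescribe any of them. The model follows the patch as posted, where Used State is only handled for packed virtqueues.

```python
# Toy model of the record-on-SUSPEND / restore-on-DRIVER_OK rules.
# All class, field and method names are illustrative assumptions.
class ToyQueue:
    def __init__(self):
        self.enabled = True
        self.avail_idx = 0           # live index, internal to the device
        self.used_idx = 0
        self.available_state = None  # driver-visible Available State
        self.used_state = None       # driver-visible Used State


class ToyDevice:
    def __init__(self, packed):
        self.packed = packed
        self.queues = [ToyQueue(), ToyQueue()]

    def on_suspend(self):
        # Record the state of every enabled virtqueue.
        for q in self.queues:
            if not q.enabled:
                continue
            q.available_state = q.avail_idx
            if self.packed:
                q.used_state = q.used_idx

    def on_driver_ok(self):
        # Restore from whatever the driver left in the state fields.
        for q in self.queues:
            if not q.enabled or q.available_state is None:
                continue
            q.avail_idx = q.available_state
            if self.packed and q.used_state is not None:
                q.used_idx = q.used_state


# Migration-style use with a split virtqueue device.
src = ToyDevice(packed=False)
src.queues[0].avail_idx = 7
src.queues[0].used_idx = 5
src.queues[1].enabled = False
src.on_suspend()

dst = ToyDevice(packed=False)
dst.queues[0].available_state = src.queues[0].available_state
dst.on_driver_ok()
```

In the split case only the available index travels through the state field; the used index would be taken from the used ring, as the driver-side text of the patch says.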

This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC.


