virtio-comment message

Subject: Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND

From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Date: Wed, 22 Nov 2023 09:41:10 +0800



On 11/17/2023 7:04 PM, Michael S. Tsirkin wrote:

On Fri, Nov 17, 2023 at 06:13:50PM +0800, Zhu, Lingshan wrote:

On 11/16/2023 8:09 PM, Michael S. Tsirkin wrote:

On Thu, Nov 16, 2023 at 06:09:38PM +0800, Zhu, Lingshan wrote:

On 11/16/2023 1:35 AM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 13, 2023 2:53 PM

On 11/10/2023 2:31 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 10, 2023 11:52 AM

On 11/9/2023 6:15 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, November 9, 2023 3:28 PM

On 11/9/2023 1:46 AM, Michael S. Tsirkin wrote:

On Tue, Nov 07, 2023 at 05:27:23PM +0800, Zhu, Lingshan wrote:

On 11/6/2023 5:49 PM, Michael S. Tsirkin wrote:

On Fri, Nov 03, 2023 at 06:34:34PM +0800, Zhu Lingshan wrote:

When SUSPEND is set, device states and virtqueue states should
be stabilized, therefore the driver should not reset vqs when
SUSPEND is set in device status.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
content.tex | 3 +++
1 file changed, 3 insertions(+)

diff --git a/content.tex b/content.tex
index bcc9d4b..060b5c2 100644
--- a/content.tex
+++ b/content.tex
@@ -444,6 +444,9 @@ \subsubsection{Virtqueue Reset}\label{sec:Basic Facilities of a Virtio Device /
 The device MUST reset any state of a virtqueue to the default state,
 including the available state and the used state.
+If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in
+\field{device status}, the driver SHOULD NOT reset any virtqueues.
+
 \drivernormative{\paragraph}{Virtqueue Reset}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
 After the driver tells the device to reset a queue, the driver MUST verify that
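
A minimal sketch of the driver-side rule this hunk proposes, written as a toy Python model rather than real driver code. The `Driver` class, its method names, and the numeric value chosen for `SUSPEND` are assumptions for illustration only; the patch itself only says the driver SHOULD NOT reset virtqueues while SUSPEND is set and VIRTIO_F_SUSPEND is negotiated.

```python
# Device status bits as defined by the virtio specification.
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8

# ASSUMPTION: this bit value is hypothetical; the quoted hunk does not give it.
SUSPEND = 16


class Driver:
    """Toy driver model; class and method names are illustrative only."""

    def __init__(self, suspend_negotiated):
        # Whether VIRTIO_F_SUSPEND was negotiated with the device.
        self.suspend_negotiated = suspend_negotiated
        self.device_status = ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK
        self.reset_queues = []

    def set_suspend(self):
        # The bit only has meaning if the feature was negotiated.
        if self.suspend_negotiated:
            self.device_status |= SUSPEND

    def may_reset_vq(self):
        # The proposed rule: no virtqueue reset while SUSPEND is set.
        suspended = bool(self.device_status & SUSPEND)
        return not (self.suspend_negotiated and suspended)

    def reset_vq(self, index):
        # Returns True if the reset was issued, False if it was skipped.
        if not self.may_reset_vq():
            return False
        self.reset_queues.append(index)
        return True
```

In this model a driver that negotiated the feature can reset queues before calling `set_suspend()` and is refused afterwards, while a driver that did not negotiate it is unaffected.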

Seems somewhat arbitrary and breaks the claim that the feature
is orthogonal and can have uses besides migration.

When suspended, the device is frozen.
The driver is aware of this process and so should not reset the vqs, I think.

Again that is only true because you want to use it for migration.
But then you can't claim it's a generic facility.

I don't get it. The device status is a basic facility.

We need to SUSPEND the device by setting SUSPEND bit, to stabilize
the device states for migration.

Is the PCI's PM time not enough to suspend the device?
For large device I could imagine it could be short.

As you see, PCI PM, so this is a layer violation, virtio should be
self contained,

If you think it is a layer violation, then the suspend bit for sure is not needed. The PCI PM interface should suspend/resume the device on D0<->D3 state transitions.
That doesn't make sense logically: because it is a layer violation, you want to make it worse? For example, virtio resets a device by writing 0 to device status, not through PCI.

All this layer violation talk is just abstract to me.
Your argument contradicts your fellow author and yourself.

I don't see how. We keep telling you virtio should be self-contained, and
suspending via PCI PM is a layer violation. This is a fact, right?

Not really. Look at the charter - when available we should use platform
capabilities because it makes it easier to write drivers.

I think that is transport specific implementation, for example pci common cfg.

I don't want to make it worse.
If you think it's a layer violation, just depend on the PCI PM, no need to include a new suspend bit.

Again, virtio should be self-contained, without layer violations. For example, we
reset virtio devices by writing 0 to device status, not by PCI FLR.

There are some advantages to doing it like this, e.g. one does not need
to save and restore config space. What are the advantages of suspending via this
bit?

Suspending a device through the device status is the same as how we enable a virtio
device.

Doing this by PCI is clearly a layer violation, and does not work for other
transports.

and what about MMIO and CCW?

They have largely lacked the richness of the PCI transport. So those transports need to evolve.
I am not sure the CCW and MMIO maintainers want to hear this.

Otherwise, PCI offers rich transport facilities compared to MMIO; hence, it will continue to see wider use.
You know this SUSPEND bit works fine on all transports, right? Because
device_status is transport independent.

I want to emphasize that I am not against the suspend bit as long as it is guest driver controlled without interfering with the device migration flow (like the rest of the state).

When migrating a device, it is the host that suspends the device. The reason is
that the live migration process should be transparent to
the guest, so we should suspend the guest first, then suspend the device (by the
host).

The practical reason for putting the suspend functionality under guest control is that resuming/suspending a large device can take time.
So let it be in guest driver control. No need to muddy it with the device migration flow.

The time cost is reasonable in O(N) no matter how you suspend/resume the
device.

Very much depends. Big O notation can be misleading. If you have to
repeat an operation 1000 times that's 1000 * N and suddenly you are
going from milliseconds to seconds.

I mean enabling 100 queues costs more time than enabling 1 vq no matter
how we enable them. That is O(N).

Depends on what "that" is. Number of VM exits does not have to be O(N),
you can pass these 100 queues in memory.

For batching, yes. But I don't see this as a problem because we have enabled
vqs this way for many years, so far so good.
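
The difference being argued here (total work is O(N) either way, but the number of VM exits need not be) can be sketched with a toy counter. This is only an illustration under stated assumptions: `ExitCounter`, `enable_per_queue`, and `enable_batched` are hypothetical names, the per-queue path is simplified to one trapped write per queue, and the batched path is an imagined interface, not an existing virtio mechanism.

```python
class ExitCounter:
    """Counts trapped register writes, each standing in for one VM exit."""

    def __init__(self):
        self.vm_exits = 0

    def trapped_write(self):
        self.vm_exits += 1


def enable_per_queue(counter, num_queues):
    """Enable queues one by one: one trapped write per queue (simplified)."""
    enabled = []
    for q in range(num_queues):
        counter.trapped_write()
        enabled.append(q)
    return enabled


def enable_batched(counter, num_queues):
    """Hypothetical batching: describe all queues in memory, notify once."""
    # Building the table is still O(N) work, but happens without any exit.
    queue_table = list(range(num_queues))
    # A single trapped write tells the device where the table is.
    counter.trapped_write()
    return queue_table
```

With 100 queues the per-queue path records 100 exits and the batched path records 1, while both end up enabling the same set of queues.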



                         This should be a basic facility.

                     Other transports can also offer this, like PCI.

                 Do you want to work on these transports? Implementing the new features as in
                 PCI?

             Not presently, as PCI has more features than the other two.
             What I read about ccw is: " S/390 based virtual machines support neither PCI nor MMIO".

             And I also read, "The IBM System/390 is a discontinued mainframe product family implementing".

             So I don't know who needs to extend ccw.
             And if one needs to, those maintainers will extend it to match the PCI standard.

         So these features are not even planned, so don't depend on them.

     But again, can one suspend a ccw device? If you are adding this feature and
     claiming it's supported for all transports you had better find out
     what it does.

I am not an expert on CCW. Does anything block us from suspending a CCW device with this bit?

I don't think CCW supports suspend at all.

I think it is not a transport feature but a device feature:
the device can always suspend itself, e.g. stop processing data
and stop responding until a specific signal.

This seems only controlled by the device itself.

And? What is the point of suspending only the device if the rest of the system
is still going?

That is an orchestration issue, totally up to the administrators.
Normally when suspending the device, the guest is very likely
to be suspended already.

In that case, if a device suspend facility is available, it will be
used by the guest driver itself; the hypervisor wouldn't know about it when those
registers are not trapped.

So we need two ways to suspend.
One is guest visible and guest controlled.
The second is hypervisor controlled, to fulfill the device migration needs.

The guest can even reset the device.

So if you can, please take a look at whether the proposed admin command for
freeze/stop mode can be used in the emulated register case or not.

It helps to have the suspend bit in guest control as well, with or without
emulation mode.
Parav, please believe I have read your series. I didn't comment there
because I want to avoid further conflicts/debating; we have done enough of these.

I believe the series posted in v3 can support vdpa use case as well.
So I will progress to post v4.

As explained before, freeze/stop the device by PCI is a layer violation.

I am afraid we have a different vision.
I don't see any layer violation.
Suspend is enough in the PCI PM.
Our vision is more aligned with the rest of the hypervisor knobs that own the
migration framework.
I think I have explained: virtio builds on other transports and it should be
self-contained, so far so good.

Virtio without any transport binding is just blank paper discussion.

virtio is built on some transports, but not bound to any.

Binding is an OS specific thing, but e.g. under Linux transport drivers bind to
devices then virtio drivers bind to virtio bus. No binding -> nothing
works.

I think it is better for general facilities not to work only on a specific transport.

But platform facilities are even better: we don't need to work on them at
all.

Yes, so I also agree to track dirty pages by the platform: on-CPU dirty page
tracking facilities serve all transports, not only PCI.

                         And device status can be passed through (without emulation, just mapped
                         to the guest) or trapped (trap and emulate by the hypervisor,
                         for example set_status in vDPA).

                     When it is pass-through, it is controlled by the guest, so for example, if the
                     guest resets the device, the hypervisor has lost control of the migration context etc.

                     Hence, the hypervisor needs a channel which is not guest owned.

                     The same channel can work when trap+emulation is done.

                 It is the guest that owns the device; it can reset the device, and once reset, the device
                 context is cleared.

             The hypervisor does not have the ability to read/write the device context. It lost the channel as the hypervisor is not involved in trap+emulation.
             So it is not helpful in one use case.

             Admin commands can work even with trap+emulation mode.

             What is missing, that should be added?

         As explained above, for live migration the guest should be suspended
         first. At this point,
         the host owns the device, and it has access to the device.

     Where do you say this in the spec patch?

VM live migration is not in this spec.

Then it should be.

If we suspend the device first, then the guest may detect IO errors.

That's bad. So you need to tell driver what not to do so as not to get
errors.

I think the process should be suspending the guest first, then the host
owns the device, so it can suspend the device and collect the necessary data
for live migration.


                                 This can also be used for debugging I think.

                             As Michael listed, a dedicated debug interface is usually more
                             useful instead of in-band.

                         Re-using another facility without extra effort is not a bad thing anyway.

                     I just don't see how a suspend bit is some debug feature.
                     Almost everything in that regard is a debug feature to me.

                 suspend then check the device states?

             You already suspended the device, so device state is already changed.
             All debug information is changed, so not useful now.

         When suspended, the device should keep and stabilize its device states,
         at least in my series it should behave like this.

     That's vague. What does it mean exactly and what happens if
     some external event causes state change?

it is suspended, somehow like powered-down, so it should not
respond to the events until resume.

"somehow" is too vague for the spec.

Yeah, in the spec, we have a section to describe what the device should do when SUSPEND is set.
