virtio-comment message

Subject: Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND

From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Zhu, Lingshan" <lingshan.zhu@intel.com>
Date: Fri, 17 Nov 2023 06:04:27 -0500

On Fri, Nov 17, 2023 at 06:13:50PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 11/16/2023 8:09 PM, Michael S. Tsirkin wrote:
> 
>     On Thu, Nov 16, 2023 at 06:09:38PM +0800, Zhu, Lingshan wrote:
> 
> 
>         On 11/16/2023 1:35 AM, Parav Pandit wrote:
> 
>                 From: Zhu, Lingshan <lingshan.zhu@intel.com>
>                 Sent: Monday, November 13, 2023 2:53 PM
> 
>                 On 11/10/2023 2:31 PM, Parav Pandit wrote:
> 
>                         From: Zhu, Lingshan <lingshan.zhu@intel.com>
>                         Sent: Friday, November 10, 2023 11:52 AM
> 
>                         On 11/9/2023 6:15 PM, Parav Pandit wrote:
> 
>                                 From: Zhu, Lingshan <lingshan.zhu@intel.com>
>                                 Sent: Thursday, November 9, 2023 3:28 PM
> 
>                                 On 11/9/2023 1:46 AM, Michael S. Tsirkin wrote:
> 
>                                     On Tue, Nov 07, 2023 at 05:27:23PM +0800, Zhu, Lingshan wrote:
> 
>                                         On 11/6/2023 5:49 PM, Michael S. Tsirkin wrote:
> 
>                                         On Fri, Nov 03, 2023 at 06:34:34PM +0800, Zhu Lingshan wrote:
> 
>                                         When SUSPEND is set, device states and virtqueue states should
>                                         be stabilized, therefore the driver should not reset vqs when
>                                         SUSPEND is set in device status.
> 
>                                         Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>                                         ---
>                                               content.tex | 3 +++
>                                               1 file changed, 3 insertions(+)
> 
>                                         diff --git a/content.tex b/content.tex index bcc9d4b..060b5c2
>                                         100644
>                                         --- a/content.tex
>                                         +++ b/content.tex
>                                         @@ -444,6 +444,9 @@ \subsubsection{Virtqueue
>                                         Reset}\label{sec:Basic
> 
>                                 Facilities of a Virtio Device /
> 
>                                               The device MUST reset any state of a virtqueue to the default
> 
>                 state,
> 
>                                               including the available state and the used state.
>                                         +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in
>                                         +\field{device status}, the driver SHOULD NOT reset any virtqueues.
>                                         +
>                                               \drivernormative{\paragraph}{Virtqueue Reset}{Basic
>                                         Facilities of a
> 
>                                 Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
> 
>                                               After the driver tells the device to reset a queue, the
>                                         driver MUST verify that
> 
>                                         Seems somewhat arbitrary and breaks the claim that the feature
>                                         is orthogonal and can have uses besides migration.
> 
>                                         when suspended, the device is frozen.
>                                         The driver is aware of this process and so should not reset the vqs I
> 
>                 think.
> 
>                                     Again that is only true because you want to use it for migration.
>                                     But then you can't claim it's a generic facility.
> 
>                                 I don't get it. The device status is a basic facility.
> 
>                                 We need to SUSPEND the device by setting SUSPEND bit, to stabilize
>                                 the device states for migration.
> 
>                             Is the PCI's PM time not enough to suspend the device?
>                             For large device I could imagine it could be short.
> 
>                         As you see, PCI PM, so this is a layer violation, virtio should be
>                         self contained,
> 
>                     If you think it is a layer violation, then the suspend bit for sure is not needed. PCI
> 
>                 PM interface should suspend/resume the device on D0<->D3 state transitions.
>                 Doesn't make sense logically, because it is layer violation, so you want it to be
>                 worse? For example, virtio writes 0 to device status to reset a device, not by PCI.
> 
>             All these layer violation thing is just abstract to me.
>             Your argument contradicts with your fellow author and yourself.
> 
>         I don't see how, we keep telling you virtio should be self contained, and
>         suspend by PCI PM is a
>         layer violation, this is a fact, right?
> 
>     Not really. Look at the charter - when available we should use platform
>     capabilities because it makes it easier to write drivers.
> 
> I think that is transport specific implementation, for example pci common cfg.
> 
> 
> 
> 
>             I don't want to make it worse.
>             If you think it's a layer violation, just depend on the PCI PM, no need to include a new suspend bit.
> 
>         Again, virtio should be self-contained, not a layer violation, for example, we
>         reset virtio devices
>         by writing 0 to device status, not by PCI FLR.
> 
>     There are some advantages to doing it like this, e.g. one does not need
>     to save and restore config space. What are the advantages of suspending via this
>     bit?
> 
> Suspending a device via the device status is the same as how we enable a virtio
> device.
> 
> Doing this by PCI is clearly a layer violation, and does not work for other
> transports.
> 
> 
> 
>                         and what about MMIO and CCW?
> 
>                     They have largely lacked the richness of PCI transport. So those transport
> 
>                 needs to evolve.
>                 I am not sure CCW and MMIO maintainers want to hear this.
> 
>                     Otherwise, PCI offers rich transport facilities compared to MMIO, hence, it will
> 
>                 continue wider use.
>                 you know this SUSPEND bit work fine on all transport, right? Because
>                 device_status is transport independent.
> 
>             I want to emphasize that I am not against the suspend bit as long as it is guest driver controlled without interfering the device migration flow (like rest of the state).
> 
>         When migrating a device, it is the host who suspends the device. The reason is
>         the live migration process should be transparent to
>         the guest, so we should suspend the guest first, then suspend the device(by
>         host).
> 
>             The practical reason for suspending functionality under guest control is, that resuming/suspending the large device can take time.
>             So let it be in guest driver control. No need to muddy with device migration flow.
> 
>         The time cost is reasonable in O(N) no matter how you suspend/resume the
>         device.
> 
>     Very much depends. Big O notation can be misleading. If you have to
>     repeat an operation 1000 times that's 1000 * N and suddenly you are
>     going from milliseconds to seconds.
> 
> I mean enabling 100 queues costs more time than enabling 1 vq no matter
> how we enable it. that is O(N)

Depends on what "that" is. Number of VM exits does not have to be O(N),
you can pass these 100 queues in memory.
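
To illustrate the distinction, here is a minimal sketch in C of a toy model in which every trapped register write counts as one VM exit. Everything in it (struct queue_cfg, struct toy_device, enable_one, enable_table, the 256-entry queue size) is hypothetical and chosen for illustration only; none of it comes from the virtio spec or from any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 128

/* Hypothetical per-queue settings; not the layout of any real transport. */
struct queue_cfg {
    uint16_t index;
    uint16_t size;
};

/* Toy device model: each trapped register write counts as one VM exit. */
struct toy_device {
    unsigned vm_exits;
    uint16_t size[MAX_QUEUES];
    uint8_t enabled[MAX_QUEUES];
};

/* Register-style path: one trapped write (one exit) per queue. */
static void enable_one(struct toy_device *dev, struct queue_cfg cfg)
{
    assert(cfg.index < MAX_QUEUES);
    dev->vm_exits++;
    dev->size[cfg.index] = cfg.size;
    dev->enabled[cfg.index] = 1;
}

/* Memory-style path: one trapped write, then the device reads the table. */
static void enable_table(struct toy_device *dev,
                         const struct queue_cfg *table, size_t n)
{
    dev->vm_exits++;
    for (size_t i = 0; i < n; i++) {
        assert(table[i].index < MAX_QUEUES);
        dev->size[table[i].index] = table[i].size;
        dev->enabled[table[i].index] = 1;
    }
}

/* Exits needed to enable n queues one register write at a time. */
static unsigned exits_per_queue(size_t n)
{
    struct toy_device dev = {0};
    for (size_t i = 0; i < n; i++)
        enable_one(&dev, (struct queue_cfg){ (uint16_t)i, 256 });
    return dev.vm_exits;
}

/* Exits needed to enable n queues through one in-memory table. */
static unsigned exits_batched(size_t n)
{
    struct toy_device dev = {0};
    struct queue_cfg table[MAX_QUEUES];
    assert(n <= MAX_QUEUES);
    for (size_t i = 0; i < n; i++)
        table[i] = (struct queue_cfg){ (uint16_t)i, 256 };
    enable_table(&dev, table, n);
    return dev.vm_exits;
}
```

In this model the device-side work is O(N) on both paths; only the number of exits differs, growing with the number of queues on the register path and staying constant on the table path.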


> 
> 
> 
>                         This should be a basic facility.
> 
>                     Other transport can also offer like PCI.
> 
>                 Do you want to work for these transport? Implementing the new features as
>                 PCI?
> 
>             Not presently, as PCI has more features than the other two.
>             What I read about ccw is: " S/390 based virtual machines support neither PCI nor MMIO".
> 
>             And I also read, "The IBM System/390 is a discontinued mainframe product family implementing".
> 
>             So I don't know who needs to extend ccw.
>             And if one needs, those maintainers will extend it to match to PCI standard.
> 
>         So these features are even not planned, so don't depend on them.
> 
>     But again can one suspend ccw device? If you are adding this feature and
>     claiming it's supported for all transports you better find out
>     what does it do.
> 
> I am not an expert on CCW; does anything block us from suspending a CCW device by this bit?

I don't think CCW supports suspend at all.

> This seems only controlled by the device itself.
> 

And? What is the point of suspending only the device if the rest of the
system is still going?

> 
> 
>                             In that case if there is suspend the device available, it will be
>                             used by the
> 
>                         guest driver itself, hypervisor wouldn't know about it when those
>                         registers are not trapped.
> 
>                             So we need two ways to suspend.
>                             One is guest visible, and guest controlled.
>                             Second is hypervisor control to fulfill the device migration needs.
> 
>                         The guest can even reset the device.
> 
>                             So if you can please take a look if the proposed admin command to
> 
>                         freeze/stop mode can be used in the emulated register case or not.
> 
>                             It helps to have the suspend bit in guest control as well
>                             with/without
> 
>                         emulation mode.
>                         Parav, please believe I have read your series, I didn't comment there
>                         because I want to avoid further conflicts/debating, we have done these
> 
>                 enough.
> 
>                     I believe the series posted in v3 can support vdpa use case as well.
>                     So I will progress to post v4.
> 
> 
>                         As explained before, freeze/stop the device by PCI is a layer violation.
> 
>                     I am afraid, we have different vision.
>                     I don't see any layer violation.
>                     Suspend is enough in the PCI PM.
>                     Our vision is more aligned with rest of the hypervisor knobs that owns the
> 
>                 migration framework.
>                 I think I have explained, virtio builds on other transports and it should be
>                 self-contained, so far so good.
> 
>             Virtio without any transport binding is just blank paper discussion.
> 
>         virtio is built on some transports, but not bind to any.
> 
>     Binding is an OS specific thing, but e.g. under Linux transport drivers bind to
>     devices then virtio drivers bind to virtio bus. No binding -> nothing
>     works.
> 
> I think general facilities are better not only work on a specific transport
> 

But platform facilities are even better: we don't need to work on them at
all.


> 
>                         And device status can be pass-through(without emulation, just map it
>                         to
>                         guest) to the guest or trapped(trap and emulate by the hypervisor,
>                         for example set_status in vDPA).
> 
>                     When it is pass-through, it is controlled by the guest, so for example, if the
> 
>                 guest resets the device, hypervisor has lost the control of migration context etc.
> 
>                     Hence, hypervisor needs a channel which is not guest owned.
> 
>                     Same channel can work when trap+emulation is done.
> 
>                 It is the guest owns the device, it can reset the device, once reset, the device
>                 context are cleared.
> 
>             Hypervisor do not have the ability to read/write the device context. It lost the channel as hypervisor is not involved in trap+emulation.
>             So it is not helpful in one use case.
> 
>             Admin commands can work even with trap+emulation mode.
> 
>             What is missing, that should be added?
> 
>         as explained above, when live migration, the guest should be suspended
>         first, at this point,
>         the host owns the device, it has access to the device.
> 
>     Where do you say this in the spec patch?
> 
> VM live migration is not in this spec.

Then it should be.

> If we suspend the device first, then the guest may detect IO errors.
> 

That's bad. So you need to tell driver what not to do so as not to get
errors.

> 
> 
>                                 This can also be used for debugging I think.
> 
>                             As Michael listed, a dedicated debug interface is usually more
>                             useful instead
> 
>                         of in-band.
>                         re-using another facility without extra efforts is not a bad thing anyway.
> 
>                     I just don't see how a suspend bit is some debug feature.
>                     Almost everything with that regard is a debug feature to me.
> 
>                 suspend then check the device states?
> 
>             You already suspended the device, so device state is already changed.
>             All debug information is changed, so not useful now.
> 
>         When suspended, the device should keep and stabilize its device states,
>         at least in my series it should behave like this.
> 
>     That's vague. What does it mean exactly and what happens if
>     some external event causes state change?
> 
> it is suspended, somehow like powered-down, so it should not
> respond to the events until resume.

"somehow" is too vague for the spec.

-- 
MST
