virtio-comment message

Subject: Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND

From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Zhu, Lingshan" <lingshan.zhu@intel.com>
Date: Wed, 22 Nov 2023 02:30:53 -0500

On Wed, Nov 22, 2023 at 09:41:10AM +0800, Zhu, Lingshan wrote:
> > > I mean enable 100 queues cost more time then enable 1 vq no matter
> > > how we enable it. that is O(N)
> > Depends on what "that" is. Number of VM exits does not have to be O(N),
> > you can pass these 100 queues in memory.
> For batching, yest. But I don't see this as a problem because we enable
> vqs by this way for many years, so far so good.

Well boot time is less contrained than migration time
because it often involves a ton of IO to load guest software.



> > 
> > 
> > > 
> > > 
> > >                          This should be a basic facility.
> > > 
> > >                      Other transport can also offer like PCI.
> > > 
> > >                  Do you want to work for these transport? Implementing the new features as
> > >                  PCI?
> > > 
> > >              Not presently as PCI as more features than rest of the two.
> > >              What I read about ccw is: " S/390 based virtual machines support neither PCI nor MMIO".
> > > 
> > >              And I also read, "The IBM System/390 is a discontinued mainframe product family implementing".
> > > 
> > >              So I donât know who needs to extend ccw.
> > >              And if one needs, those maintainers will extend it to match to PCI standard.
> > > 
> > >          So these features are even not planned, so don't depend on them.
> > > 
> > >      But again can one suspend ccw device? If you are adding this feature and
> > >      claiming it's supported for all transports you better find out
> > >      what does it do.
> > > 
> > > I am not an expert on CCW, anything block we suspend a CCW device by this bit?
> > I don't think CCW supports suspend at all.
> I think it is not a transport feature but a device feature,
> the device can always suspend it self, like don't process data
> and stop responding until a specific signal.

If guest keeps going but device just stop then guest has
a decent chance to crash.

> > 
> > > This seems only controlled by the device itself.
> > > 
> > And? What it the point of suspending only the device if rest of system
> > is still going?
> That is an orchestration issue, totally up to the administrators.
> Normally when suspending the device, the guest are very likely
> to be suspended already.

If you need an orchestration system to suspend your laptop or your
phone things are very bad indeed.

> > 
> > > 
> > >                              In that case if there is suspend the device available, it will be
> > >                              used by the
> > > 
> > >                          guest driver itself, hypervisor wouldnât know about it when those
> > >                          registers are not trapped.
> > > 
> > >                              So we need two ways to suspend.
> > >                              One is guest visible, and guest controlled.
> > >                              Second is hypervisor control to fulfill the device migration needs.
> > > 
> > >                          The guest can eve reset the device.
> > > 
> > >                              So if you can please take a look if the proposed admin command to
> > > 
> > >                          freeze/stop mode can be used in the emulated register case or not.
> > > 
> > >                              It helps to have the suspend bit in guest control as well
> > >                              with/without
> > > 
> > >                          emulation mode.
> > >                          Parav, please believe I have read your series, I didn't comment there
> > >                          because I want to avoid further conflicts/debating, we have done these
> > > 
> > >                  enough.
> > > 
> > >                      I believe the series posted in v3 can support vdpa use case as well.
> > >                      So I will progress to post v4.
> > > 
> > > 
> > >                          As explained before, freeze/stop the device by PCI is a layer violation.
> > > 
> > >                      I am afraid, we have different vision.
> > >                      I donât see any layer violation.
> > >                      Suspend is enough in the PCI PM.
> > >                      Our vision is more aligned with rest of the hypervisor knobs that owns the
> > > 
> > >                  migration framework.
> > >                  I think I have explained, virito builds on other transport and it should be self-
> > >                  contained, so far so good.
> > > 
> > >              Virtio without any transport binding is just blank paper discussion.
> > > 
> > >          virtio is built on some transports, but not bind to any.
> > > 
> > >      Binding is an OS specific thing, but e.g. under Linux transport drivers bind to
> > >      devices then virtio drivers bind to virtio bus. No binding -> nothing
> > >      works.
> > > 
> > > I think general facilities are better not only work on a specific transport
> > > 
> > But platform facilities are even better we don't need to work on them at
> > all.
> Yes, so I also agree to track dirty pages by the platform, on-CPU dirty page
> tracking facilities serving all transport, not only PCI.

If available. Both you and Parav should stop just repeating your
preference and start showing some actual info on which systems
support this platform tracking, how recent they have to be,
are all of server/desktop/mobile covered, etc.

> > 
> > 
> > >                          And device status can be pass-through(without emulation, just map it
> > >                          to
> > >                          guest) to the guest or trapped(trap and emulate by the hypervisor,
> > >                          for example set_status in vDPA).
> > > 
> > >                      When it is pass-through, it is controlled by the guest, so for example, if the
> > > 
> > >                  guest resets the device, hypervisor has lost the control of migration context etc.
> > > 
> > >                      Hence, hypervisor needs a channel which is not guest owned.
> > > 
> > >                      Same channel can work when trap+emulation is done.
> > > 
> > >                  It is the guest owns the device, it can reset the device, once reset, the device
> > >                  context are cleared.
> > > 
> > >              Hypervisor do not have the ability to read/write the device context. It lost the channel as hypervisor is not involved in trap+emulation.
> > >              So it is not helpful in one use case.
> > > 
> > >              Admin commands can work even with trap+emulation mode.
> > > 
> > >              What is missing, that should be added?
> > > 
> > >          as explained above, when live migration, the guest should be suspended
> > >          first, at this point,
> > >          the host owns the device, it has access to the device.
> > > 
> > >      Where do you say this in the spec patch?
> > > 
> > > VM live migration is not in this spec.
> > Then it should be.
> > 
> > > If we suspend the device first, then the guest may detect IO errors.
> > > 
> > That's bad. So you need to tell driver what not to do so as not to get
> > errors.
> I think the process should be suspending the guest first, then the host
> owns the device, so it can suspend the guest and collect the necessary data
> for live migration.

So you need to say what is and what is not allowed.
Because in your model, device has no idea whether it's
host or guest accessing it and so you can not just say
"don't access device at all". And this by the way is
a big plus in Parav's approach - since host uses a
distinct channel for state retrieval it is possible
to simply say "don't access device" and we are done.

> > 
> > > 
> > >                                  This can also be used for debugging I think.
> > > 
> > >                              As Michael listed, a dedicated debug interface is usually more
> > >                              useful instead
> > > 
> > >                          of in-band.
> > >                          re-using another facility without extra efforts is not a bad thing anyway.
> > > 
> > >                      I just donât see how a suspend bit some debug feature.
> > >                      Almost everything with that regard is a debug feature to me.
> > > 
> > >                  suspend then check the device states?
> > > 
> > >              You already suspended the device, so device state is already changed.
> > >              All debug information is changed, so not useful now.
> > > 
> > >          When suspended, the device should keep and stabilize its device states,
> > >          at least in my series it should behave like this.
> > > 
> > >      That's vague. What does it mean exactly and what happens if
> > >      some external event causes state change?
> > > 
> > > it is suspended, somehow like powered-down, so it should not
> > > respond to the events until resume.
> > "somehow" is too vague for the spec.
> Yeah, in spec, we have a section to describe what the device should do when
> SUSPEND.
> > 

maybe I missed it. generally I think the whole SUSPEND feature
should be in one patch, no real reason to split it up.
queue state thing is a different matter.

-- 
MST

References:
- RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: Parav Pandit <parav@nvidia.com>
- Re: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- RE: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: "Michael S. Tsirkin" <mst@redhat.com>
- Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: "Michael S. Tsirkin" <mst@redhat.com>
- Re: [virtio-comment] RE: [PATCH V2 3/6] virtio: dont reset vqs when SUSPEND
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>