
Subject: RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 12:48 PM
> 
> On 9/11/2023 3:07 PM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 11, 2023 12:28 PM
> >>> I don't see in his proposal how all the features and functionality
> >>> supported is
> >> achieved.
> >> I will include in-flight descriptor tracker and dirty-page tracking in
> >> V2, anything else missed?
> >> It can migrate the device itself. Why don't you think so? Can you
> >> name some issues we can work on for improvements?
> > I would like to see a proposal similar to [1] that can work without mediation,
> in case you want to combine two use cases under one.
> > Else, I don't see a need to merge the two things.
> >
> > Dirty page tracking, peer to peer, downtime, no-mediation, FLRs are all covered
> in [1] for passthrough cases.
> We are introducing basic facilities, feel free to re-use them in the admin vq
> solution.
Basic facilities are added in [1] for passthrough devices.
You can leverage them in your v2 to support p2p devices, dirty page tracking, passthrough, shorter downtime and more.

> >
> >> If you want to implement LM by admin vq, the facilities in my series
> >> can be re-used. E.g., forward your suspend to the SUSPEND bit.
> > Just VQ suspend is not enough...
> This series contains: device SUSPEND and a queue state accessor.
> MST required in-flight descriptor tracking, which will be included in the next
> version.
For passthrough, more than that is needed.
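
To make the flow under discussion concrete, below is a minimal C sketch of the suspend-then-read-queue-state sequence; the SUSPEND status bit value and the queue_select/queue_state field names are assumptions made for this illustration, not definitions taken from the VIRTIO_F_QUEUE_STATE patch.

#include <stdint.h>

/* Illustrative sketch only: the SUSPEND bit value and the queue_select/
 * queue_state fields are assumptions for this example, not spec text. */
#define VIRTIO_STATUS_SUSPEND 0x40 /* hypothetical SUSPEND status bit */

struct common_cfg {                /* simplified stand-in for the PCI common config */
    volatile uint8_t  device_status;
    volatile uint16_t queue_select;
    volatile uint16_t queue_state; /* hypothetical queue state accessor */
};

/* Suspend the device, then snapshot per-queue state for migration. */
static void snapshot_queues(struct common_cfg *cfg, uint16_t num_queues,
                            uint16_t *state_out)
{
    cfg->device_status |= VIRTIO_STATUS_SUSPEND;
    while (!(cfg->device_status & VIRTIO_STATUS_SUSPEND))
        ;                          /* wait until the device reports it is suspended */

    for (uint16_t i = 0; i < num_queues; i++) {
        cfg->queue_select = i;
        state_out[i] = cfg->queue_state;
    }
}
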
> >
> >>>
> >>>>> Admin queue of the member device is migrated like any other queue
> >>>>> using
> >>>> above [1].
> >>>>>> 2) won't work in the nested environment, or we need complicated
> >>>>>> SR-IOV emulation in order to work
> >>>>>>
> >>>>>>> Poking at the device from the driver to migrate it is not going
> >>>>>>> to work if the driver lives within guest.
> >>>>>> This is by design to allow live migration to work in the nested layer.
> >>>>>> And it's the way we've used for CPU and MMU. Is anything in virtio
> >>>>>> different here?
> >>>>> Nested and non-nested use cases likely cannot be addressed by
> >>>>> a single
> >>>> solution/interface.
> >>>>
> >>>> I think Ling Shan's proposal addressed them both.
> >>>>
> >>> I don't see how all the above points are covered.
> >> Why?
> >>
> >>
> >> And how do you migrate nested VMs by admin vq?
> >>
> > Hypervisor = level 1.
> > VM = level 2.
> > Nested VM = level 3.
> > The level-2 VM takes care of migrating the level-3 composed device using its sw
> composition, or maybe using some kind of mediation that you proposed.
> So, nested VM is not aware of the admin vq or does not have access to admin
> vq, right?
Right. It is not aware.

> >

> >> How many admin vqs and the bandwidth are reserved for migrate all VMs?
> >>
> > It does not matter because the number of AQs is configurable; the device and
> driver can decide how many to use.
> > I am not sure which BW you are talking about.
> > There are many BWs in place that one can regulate: at the network level, PCI level,
> VM level, etc.
> It matters because of QoS, and the downtime must converge.
QoS is such a broad term that it is hard to debate unless you get to a specific point.
> 
> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the number
> in HW implementation and how does the driver get informed?
Usually just one AQ is enough, as proposal [1] is built around inherent downtime reduction.
You can ask a similar question for RSS: how does a hw device know how many RSS queues are needed? :)
The device exposes the number of supported AQs that the driver is free to use.

Most sane sysadmins do not migrate 1000 VMs at the same time, for obvious reasons.
But when such requirements arise, a device may support it.
Just like a net device can support from 1 to 32K txqueues at the spec level.
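
As a rough illustration of "device exposes, driver chooses", a minimal sketch follows; the admin_queue_index/admin_queue_num field names and the helper are assumptions made for this example, not spec text.

#include <stdint.h>

/* Illustrative sketch only: these fields stand in for however the device
 * advertises its supported AQ range; treat the names as assumptions. */
struct aq_caps {
    uint16_t admin_queue_index; /* index of the first admin VQ */
    uint16_t admin_queue_num;   /* number of admin VQs the device supports */
};

/* The driver picks how many AQs to use, bounded by what the device offers.
 * One AQ is usually enough; more can be created when migrating many member
 * devices in parallel. */
static uint16_t choose_num_aqs(const struct aq_caps *caps, uint16_t wanted)
{
    return wanted < caps->admin_queue_num ? wanted : caps->admin_queue_num;
}

The point is only that the AQ count is a device-advertised, driver-chosen resource, much like RSS queues.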

> >
> >> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> > I am not sure why the migration reason has any influence on the design.
> Because this design is for live migration.
> >
> > The CSPs that we had discussed care more about performance, and hence
> prefer passthrough instead of mediation, and don't seem to be doing any
> nesting.
> > CPU doesn't have support for 3 levels of page table nesting either.
> > I agree that there could be other users who care for nested functionality.
> >
> > Any ways, nesting and non-nesting are two different requirements.
> The LM facility should serve both,
I don't see how the PCI spec lets you do it.
The PCI community already handed this over to the SR-PCIM interface, outside of the PCI spec domain.
Hence, it is done over the admin queue for passthrough devices.

If you can explain how your proposal addresses passthrough support without mediation and also does DMA, I am very interested to learn that.

> And it does not serve bare-metal live migration either.
Bare-metal migration seems a distant theory, as one needs a side CPU and memory accessor apart from the device accessor.
But if that somehow exists, there will be a similar admin device to migrate it; maybe TDISP will own this whole piece one day.



