

Subject: RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:04 PM
> 
> 
> On 9/12/2023 1:58 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 9:37 AM
> >>
> >> On 9/11/2023 6:21 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Monday, September 11, 2023 3:03 PM
> >>>>
> >>>> So implement the AQ on the "admin" VF? That requires the HW to reserve
> >>>> dedicated resources for every VF?
> >>>> So expensive, overkill?
> >>>>
> >>>> And a VF may be managed by the PF and its admin "vf"?
> >>> Yes.
> >> It's a bit chaotic: as you can see, if the nested (L2 guest) VF can be
> >> managed by both the L1 guest VF and the host PF, that means two owners of
> >> the L2 VF.
> > This is nesting.
> > When you do M-level nesting, does any CPU in the world handle its own page
> > tables in isolation from the next level and also perform equally well?
> Not exactly. In nesting, the L1 guest is the host/infrastructure emulator for
> L2, so L2 is expected to have nothing to do with the host; wouldn't something
> like an L2 VF managed by both the L1 VF and the host PF lead to operational
> and security issues?
> >
> >>>>> If UDP packets are dropped, even the application can fail if it does not retry.
> >>>> UDP is not reliable, and performance overhead does not mean failure.
> >>> It largely depends on the application.
> >>> I have seen iperf over UDP fail on a packet drop and never recover.
> >>> A retransmission over UDP can fail.
> >> That depends on the workload; if it chooses UDP, it is aware of the
> >> possibility of losing packets. But anyway, LM is expected to complete
> >> successfully within the due time.
> > And LM also depends on the workload. :)
> Exactly! That's the point: how to meet the requirements!
> > It is pointless to discuss performance characteristics as an argument for
> > using the AQ or not.
> How do we meet the QoS requirements during LM?
By following [1], where a large part of the device context transfer and dirty page tracking is done while the VM is running.
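
For illustration only (the helper names below are hypothetical wrappers, not
the spec or driver API of [1]): the idea is that dirty page tracking and the
bulk of the device context are read over the admin queue while the member
device is still running, so only a small residual transfer falls inside the
downtime window.

    /* Sketch of the pre-copy flow, assuming hypothetical admin-command
     * wrappers; see proposal [1] for the actual command set. */
    void migrate_member_device(struct member_dev *vf)
    {
            admin_cmd_dirty_track_start(vf);           /* VM still running */

            while (!converged(vf)) {
                    admin_cmd_read_device_context(vf); /* bulk of the context */
                    admin_cmd_read_dirty_bitmap(vf);   /* pages dirtied meanwhile */
                    send_to_destination(vf);
            }

            admin_cmd_suspend(vf);                     /* downtime starts here */
            admin_cmd_read_device_context(vf);         /* small residual context */
            admin_cmd_read_dirty_bitmap(vf);           /* last round of dirty pages */
            send_to_destination(vf);                   /* downtime ends */
    }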

> > No, the board designer does not need to.
> > As explained already, if a board wants to support a single AQ command at a
> > time, sure.
> Same as above, the QoS question. For example, how do we avoid the situation
> where half of the VMs can be migrated and the others time out?
Why would this happen?
A timeout is not related to the AQ, should it happen.
Timeouts can happen with config registers too, and it is even harder for board designers to serve 384 parallel PCI register reads within the read timeout.

I am still not able to follow your point in asking these unrelated QoS questions.
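
As a back-of-the-envelope illustration only (the context size, per-read latency
and serial polling below are assumptions, not numbers from [1] or this series):

    4 KB device context / 4 B per register read  =  1024 reads per device
    1024 reads x ~1 us per register read         ~  1 ms per device
    384 devices polled serially                  ~  0.4 s

which is why bulk transfer through registers scales poorly against a downtime
budget, while queued commands move the same data by DMA with many commands in
flight.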

> >
> >>> An admin command can even fail with the EAGAIN error code when the device
> >>> is out of resources, and software can retry the command.
> >> As demonstrated, this series is as reliable as the config space
> >> functionality, so maybe there are fewer possibilities of failure?
> > Huh. Config space has a far higher failure rate on the PCI transport due to
> > the inherent nature of PCI timeouts, reads, and polling.
> > For any bulk data transfer, a virtqueue is the spec-defined approach.
> > This was debated for more than a year; you can check some 2021 emails.
> >
> > You can see from the patches that the data transfer of [1] done over
> > registers is snail slow.
> Do you often observe virtio PCI config space accesses fail? Or does the admin
> vq need to transfer data through PCI?
Admin commands need to transfer bulk data in parallel across thousands of VFs without baking registers into PCI.
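
As a minimal sketch of what "in parallel" means here (the helpers and the
command name are hypothetical; the actual command layout and opcodes are
defined in the admin virtqueue chapter and in [1]): many commands, one per
group member id, can sit in flight on a single admin VQ, with the bulk data
carried in DMA buffers rather than registers.

    /* Post one device-context-read command per VF on one admin VQ, then reap
     * completions; avq_post()/avq_reap_completions() are hypothetical helpers. */
    void read_context_all_vfs(struct admin_vq *avq, uint64_t first_vf, int num_vfs)
    {
            for (uint64_t id = first_vf; id < first_vf + num_vfs; id++) {
                    /* no wait per command: the device executes them in parallel */
                    avq_post(avq, ADMIN_CMD_DEV_CTX_READ, id, ctx_buf(id));
            }

            avq_kick(avq);

            for (int done = 0; done < num_vfs; ) {
                    done += avq_reap_completions(avq); /* harvest as they finish */
            }
    }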

> >
> >>> The key part is that all of this happens outside of the VM's downtime.
> >>> The majority of the work in proposal [1] is done while the VM is _live_.
> >>> Hence, the resource consumption or reservation is significantly less.
> >> It still depends on the volume of VMs and devices; the orchestration layer
> >> needs to migrate the last round of dirty pages and state even after the VM
> >> has been suspended.
> > That has nothing to do with the admin virtqueue.
> > And the migration layer already does this, and it is used by multiple devices.
> Same as above, QoS.
> >
> >>>
> >>>>>> Naming a number or an algorithm for the ratio of devices /
> >>>>>> num_of_AQs is beyond this topic, but I made my point clear.
> >>>>> Sure. It is beyond.
> >>>>> And it is not a concern either.
> >>>> It is, the user expects the LM process to succeed rather than fail.
> >>> I still fail to understand why the LM process would fail.
> >>> The migration process is slow, but the downtime in [1] is not.
> >> If I recall correctly, the downtime budget is around 300 ms, so don't let
> >> the bandwidth or the number of admin vqs become a bottleneck that may
> >> introduce more possibilities of failure.
> >>>>>> can depth = 1K introduce significant latency?
> >>>>> AQ command execution is not done serially. There is enough text in the
> >>>>> AQ chapter, as I recall.
> >>>> Then it requires more HW resources; I don't see the difference.
> >>> Difference compared to what, multiple AQs?
> >>> If so, sure.
> >>> A device that prefers to do only one AQ command at a time can, sure, work
> >>> with fewer resources and do one at a time.
> >> I think we are discussing the same issue as above, the "resource for the
> >> worst case" problem.
> > Frankly, I am not seeing any issue.
> > The AQ is just another virtqueue, a basic construct in the spec used by 30+
> > device types.
> As explained above, when migrating a VM, the process has to converge and the
> total downtime has a deadline; I remember it is less than 300 ms. That is the
> QoS requirement.
And admin commands can easily serve that, as the majority of the work in proposal [1] is done while the VM is running and the member device is in an active state.


