
Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE




On 9/12/2023 2:47 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, September 12, 2023 12:04 PM


On 9/12/2023 1:58 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, September 12, 2023 9:37 AM

On 9/11/2023 6:21 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, September 11, 2023 3:03 PM

So implement AQ on the "admin" VF? This requires the HW to reserve
dedicated resources for every VF?
So expensive, overkill?

And a VF may be managed by the PF and its admin "VF"?
Yes.
It's a bit chaotic: as you can see, if the nested (L2 guest) VF can be
managed by both the L1 guest VF and the host PF, that means two owners of the
L2 VF.
This is nesting.
When you do M-level nesting, does any CPU in the world handle its own page
tables in isolation from the next level and also perform equally well?
Not exactly. In nesting, the L1 guest is the host/infrastructure emulator for L2, so L2
is expected to do nothing with the host directly; wouldn't something like an L2 VF
managed by both the L1 VF and the host PF lead to operational and security issues?
If UDP packets are dropped, even the application can fail if it does not retry.
UDP is not reliable, and a performance overhead does not mean failure.
It largely depends on the application.
I have seen iperf over UDP fail on packet drops and never recover.
A retransmission over UDP can fail.
That depends on the workload; if it chooses UDP, it is aware of the
possibility of losing packets. But anyway, LM is expected to
complete successfully within the due time.
And LM also depends on the workload. :)
Exactly! That's the point: how to meet the requirements!
It is pointless to discuss performance characteristics as an argument for using the AQ or
not.
How do we meet the QoS requirement during LM?
By following [1], where a large part of the device context transfer and dirty page tracking is done while the VM is running.
The last round of dirty pages and device states still needs to be migrated when the VM is frozen. That can still be large if a big number of VMs is taken into account, and that is where the ~300ms due time rules.
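To make the split between the live phase and the freeze phase concrete, here is a minimal sketch of the pre-copy flow being discussed; every type and helper below is a hypothetical placeholder for illustration, not a spec-defined or driver interface:

  #include <stddef.h>

  struct vm;          /* hypothetical VM handle */
  struct target;      /* hypothetical migration destination */

  extern void   start_dirty_tracking(struct vm *vm);
  extern void   copy_all_memory(struct vm *vm, struct target *dst);
  extern size_t copy_dirty_pages(struct vm *vm, struct target *dst);
  extern void   copy_device_state(struct vm *vm, struct target *dst);
  extern void   freeze_vm(struct vm *vm);
  extern void   resume_on_target(struct target *dst);

  #define CONVERGENCE_THRESHOLD 64    /* pages; arbitrary example value */

  void migrate_vm(struct vm *vm, struct target *dst)
  {
      size_t dirty;

      start_dirty_tracking(vm);       /* dirty page logging, VM still live */
      copy_all_memory(vm, dst);       /* first full pass, VM still live */

      do {                            /* iterative passes, VM still live */
          dirty = copy_dirty_pages(vm, dst);
      } while (dirty > CONVERGENCE_THRESHOLD);

      freeze_vm(vm);                  /* the ~300ms downtime window starts */
      copy_dirty_pages(vm, dst);      /* last round of dirty pages */
      copy_device_state(vm, dst);     /* device context at freeze */
      resume_on_target(dst);          /* downtime window ends */
  }

The point of contention above is only the last three calls before resume: everything after freeze_vm() has to fit in the downtime budget.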

No, the board designer does not need to.
As explained already, if a board wants to support a single AQ command at a time,
sure.
Same as above, the QoS question. For example, how do we avoid the situation where
half of the VMs can be migrated and the others time out?
Why would this happen?
A timeout is not related to the AQ if that happens.
Explained above.
Timeouts can happen on config registers too. And it can be even harder for board designers to handle 384 PCI reads in parallel within the PCI timeout.
When the VM is frozen, the virtio functionality, for example virtio-net transactions, is suspended as well,
so there are no TLPs for networking traffic buffers.

The on-device Live Migration facility can use the full PCI device bandwidth for migration.

That is the difference from the admin vq.

I am still not able to follow your point in asking about unrelated QoS questions.
Explained above: it has to meet the due-time requirement, and many VMs can be migrated simultaneously;
in that situation, they have to race for the admin vq resources/bandwidth.

An admin command can even fail with the EAGAIN error code when the device is out of
resources, and software can retry the command.
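A minimal sketch of that retry pattern, assuming a hypothetical aq_submit_cmd() that returns -EAGAIN while the device is out of resources (not an actual driver or spec interface):

  #include <errno.h>
  #include <unistd.h>

  struct aq;          /* hypothetical admin queue handle */
  struct aq_cmd;      /* hypothetical admin command */

  /* Hypothetical: 0 on success, -EAGAIN when the device is temporarily out
   * of resources, another negative errno on a hard failure. */
  extern int aq_submit_cmd(struct aq *aq, struct aq_cmd *cmd);

  static int aq_submit_with_retry(struct aq *aq, struct aq_cmd *cmd, int max_tries)
  {
      for (int i = 0; i < max_tries; i++) {
          int ret = aq_submit_cmd(aq, cmd);
          if (ret != -EAGAIN)
              return ret;             /* success or a hard failure */
          usleep(1000);               /* back off briefly, then retry */
      }
      return -EAGAIN;                 /* device stayed out of resources */
  }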
As demonstrated, this series is as reliable as the config space
functionality, so maybe there are fewer possibilities to fail?
Huh. Config space has a far higher failure rate for the PCI transport due to the
inherent nature of PCI timeouts, reads, and polling.
For any bulk data transfer, a virtqueue is the spec-defined approach.
This was debated for more than a year; you can check some 2021 emails.

You can see from the patches that the data transfer done in [1] over registers is snail
slow.
Do you often observe virtio PCI config space failures? Or doesn't the admin vq also
need to transfer data through PCI?
Admin commands need to transfer bulk data for thousands of VFs in parallel without baking registers into PCI.
So you agree that the PCI config space is actually very unlikely to fail? It is reliable.

Please allow me to provide an extreme example: is one single admin vq limitless, so that it can serve the migration of hundreds to thousands of VMs? If not, then two, three, or what number?
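One hedged, back-of-envelope way to frame that question (the symbols are illustrative, not taken from the spec or from [1]): if N member devices are frozen in the same window, each still has S bytes of final dirty pages plus device context to move, and a shared admin vq sustains an effective B bytes/s for migration commands, then the downtime budget only holds when roughly

  N * S / B <= 300 ms

so the practical answer is not "limitless" but whatever N keeps that inequality true for the board's actual S and B.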

The key part is that all of this happens outside of the VM's downtime.
The majority of the work in proposal [1] is done when the VM is _live_.
Hence, the resource consumption or reservation is significantly less.
It still depends on the volume of VMs and devices; the orchestration
layer needs to migrate the last round of dirty pages and states even
after the VM has been suspended.
That has nothing to do with the admin virtqueue.
And the migration layer already does this, and it is used by multiple devices.
Same as above: QoS.
Naming a number or an algorithm for the ratio of devices /
num_of_AQs is beyond this topic, but I made my point clear.
Sure. It is beyond.
And it is not a concern either.
It is: the user expects the LM process to succeed rather than fail.
I still fail to understand why the LM process would fail.
The migration process is slow, but the downtime is not, in [1].
If I recall correctly, the downtime budget is around 300ms, so don't let the
bandwidth or the number of admin vqs become a bottleneck, which may
introduce more possibilities of failure.
Can a queue depth of 1K introduce significant latency?
AQ command execution is not done serially. There is enough text in the AQ
chapter, as I recall.
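As an aside, here is a minimal sketch of what non-serial execution can look like from the driver side, with hypothetical queue helpers (not driver or spec APIs): several commands are posted before any completion is reaped, and the device may complete them in any order:

  struct aq;          /* hypothetical admin queue handle */
  struct aq_cmd;      /* hypothetical admin command */

  extern int  aq_post(struct aq *aq, struct aq_cmd *cmd);  /* add to the queue */
  extern void aq_kick(struct aq *aq);                      /* notify the device */
  extern struct aq_cmd *aq_reap(struct aq *aq);            /* next completion */

  /* Post a batch, then reap completions: the outstanding commands can be
   * executed by the device in parallel and completed out of order, so queue
   * depth by itself does not serialize execution. */
  static void aq_run_batch(struct aq *aq, struct aq_cmd **cmds, int n)
  {
      for (int i = 0; i < n; i++)
          aq_post(aq, cmds[i]);
      aq_kick(aq);

      for (int done = 0; done < n; done++)
          aq_reap(aq);            /* may return in any order */
  }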
Then it requires more HW resources; I don't see the difference.
Difference compared to what, multiple AQs?
If so, sure.
A device that prefers to do only one AQ command at a time can, sure,
work with fewer resources and do one at a time.
I think we are discussing the same issue as above: the "resource for
the worst case" problem.
Frankly I am not seeing any issue.
The AQ is just another virtqueue, a basic construct in the spec used by 30+ device
types.
Explained above: when migrating a VM, the migration time has to converge,
and the total downtime has a due time; I remember it is less than 300ms. That is the
QoS requirement.
And admin commands can easily serve that, as the majority of the work is done while the VM is running and the member device is in the active state in proposal [1].
Explained above: it depends on the number of VMs being migrated.



