Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE




On 9/12/2023 3:40 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, September 12, 2023 12:58 PM
To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
cohuck@redhat.com; stefanha@redhat.com; virtio-comment@lists.oasis-
open.org; virtio-dev@lists.oasis-open.org
Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
VIRTIO_F_QUEUE_STATE



On 9/12/2023 2:47 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, September 12, 2023 12:04 PM


On 9/12/2023 1:58 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, September 12, 2023 9:37 AM

On 9/11/2023 6:21 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, September 11, 2023 3:03 PM
So implement the AQ on the "admin" VF? This requires the HW to reserve
dedicated resources for every VF? So expensive, overkill?

And a VF may be managed by the PF and its admin "vf"?
Yes.
It's a bit chaotic: as you can see, if the nested (L2 guest) VF can be
managed by both the L1 guest VF and the host PF, that means two owners
of the L2 VF.
This is the nesting.
When you do M-level nesting, does any CPU in the world handle its own
page tables in isolation from the next level and also perform equally well?
Not exactly. In nesting, the L1 guest is the host/infrastructure emulator
for L2, so L2 is expected to do nothing with the host; otherwise, something
like an L2 VF managed by both the L1 VF and the host PF can lead to
operational and security issues.
If UDP packets are dropped, even applications that do not retry can fail.
UDP is not reliable, and performance overhead does not mean failure.
It largely depends on the application.
I have seen iperf UDP fail on packet drops and never recover.
A retransmission over UDP can fail.
That depends on the workload: if it chooses UDP, it is aware of the
possibility of losing packets. But anyway, LM is expected to complete
successfully within the due time.
And LM also depends on the workload. :)
Exactly! That's the point, how to meet the requirements!
It is pointless to discuss performance characteristics as an argument for
using the AQ or not.
How to meet the QoS requirement during LM?
By following [1], where a large part of the device context transfer and
dirty page tracking is done while the VM is running.
You still need to migrate the last round of dirty pages and the device state
when the VM freezes. That can still be large if a big number of VMs is taken
into consideration, and that is where the ~300 ms due time rules.
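For illustration only, a minimal C sketch of this final stop-and-copy step
under the ~300 ms due time discussed here; struct vm and the transfer
helpers are hypothetical placeholders, not spec-defined or driver APIs.

/* Illustrative sketch only: all helpers below are hypothetical. */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define DUE_TIME_MS 300                       /* due time discussed above   */

struct vm;                                    /* opaque, hypothetical       */
void suspend_vm(struct vm *vm);               /* freeze vCPUs and vqs       */
void transfer_dirty_pages(struct vm *vm);     /* last round of dirty pages  */
void transfer_device_state(struct vm *vm);    /* remaining device context   */

static uint64_t now_ms(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Returns true if the final round fits in the due time budget. */
bool final_round(struct vm *vm)
{
        uint64_t start = now_ms();

        suspend_vm(vm);
        transfer_dirty_pages(vm);
        transfer_device_state(vm);

        return now_ms() - start <= DUE_TIME_MS;
}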
No, the board designer does not need to.
As explained already, if the board wants to support a single command of the
AQ, sure.
Same as above, the QoS question. For example, how to avoid the situation
where half of the VMs can be migrated and the others time out?
Why would this happen?
A timeout is not related to the AQ if that happens.
Explained above.
A timeout can happen with config registers too. And it can be even harder
for board designers to support PCI reads within a timeout when handling 384
reads in parallel.
When the VM freezes, the virtio functionality, for example virtio-net
transactions, is suspended as well, so there are no TLPs for networking
traffic buffers.
The config register mediated operations done by the host itself are TLPs
flowing for the several hundreds of VMs in the example you took.
In your example you took 1000 VMs freezing simultaneously, for which you
need to finish the config cycles in some 300 msec.
These are per-device operations; they directly access the device config
space and consume the dedicated device resources and bandwidth, like other
standard virtio operations.

The on-device Live Migration facility can use the full PCI device bandwidth for
migration.
So can admin commands.
However, the big difference is: registers do not scale with a large number
of VFs. Admin commands scale easily.
The admin vq requires fixed and dedicated resources to serve the VMs, so the
question still remains: does it scale to serve the migration of a big number
of devices? How many admin vqs do you need to serve 10 VMs, how many for
100, and so on? How does it scale?

If one admin vq can serve 100 VMs, can it migrate 1000 VMs in a reasonable
time? If not, how many exactly?
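For illustration only, a back-of-envelope C sketch of this question; every
number in it (per-VF state size, usable bandwidth) is a purely hypothetical
assumption, and it only checks raw transfer time, ignoring per-command
processing and device-side resources, which is what the question is about.

#include <stdio.h>

int main(void)
{
        /* All values are hypothetical assumptions, not measurements. */
        const double due_time_ms = 300.0;  /* due time discussed above        */
        const double state_kib   = 64.0;   /* assumed final per-VF state size */
        const double link_gbps   = 100.0;  /* assumed usable bandwidth        */
        const int    vms         = 1000;

        double total_mib = vms * state_kib / 1024.0;
        double xfer_ms   = total_mib * 8.0 / (link_gbps * 1000.0) * 1000.0;

        printf("final transfer for %d VMs: %.1f MiB, ~%.1f ms of a %.0f ms budget\n",
               vms, total_mib, xfer_ms, due_time_ms);
        return 0;
}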


And a register does not need to scale; it resides on the VF and only serves
that VF.

It does not reside on the PF to migrate the VFs.

I probably should not repeat what is already captured in the admin commands commit log and cover letter.

That is the difference with the admin vq.
I don't know what difference you are talking about.
PCI device bandwidth for migration is available with both admin commands and
config registers.
BW != timeout.
A VF's config space can use the device's dedicated resources, like the
bandwidth.

For the AQ, you still need to reserve resources, and how much?

I am still not able to follow your point in asking unrelated QoS questions.
Explained above: it has to meet the due time requirement, and many VMs can
be migrated simultaneously; in that situation, they have to race for the
admin vq resources/bandwidth.
An admin command can even fail with the EAGAIN error code when the device is
out of resources, and software can retry the command.
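For illustration only, a minimal C sketch of the retry-on-EAGAIN behavior
described above; submit_admin_cmd() is a hypothetical helper that posts one
command on the admin virtqueue and returns 0 or a negative errno, not an
existing driver API.

#include <errno.h>

struct admin_cmd;                              /* opaque, hypothetical      */
int submit_admin_cmd(struct admin_cmd *cmd);   /* 0 on success, -errno else */

int submit_with_retry(struct admin_cmd *cmd, int max_retries)
{
        int ret;

        do {
                ret = submit_admin_cmd(cmd);
        } while (ret == -EAGAIN && max_retries-- > 0);

        return ret;
}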
As demonstrated, this series is as reliable as the config space
functionality, so maybe there is less possibility of failure?
Huh. Config space has a far higher failure rate for the PCI transport due to
the inherent nature of PCI timeouts, reads, and polling.
For any bulk data transfer, the virtqueue is the spec-defined approach.
This was debated for more than a year; you can check some 2021 emails.

You can see from the patches that data transfer done in [1] over registers
is snail slow.
Do you often observe the virtio PCI config space failing? Or does the admin
vq need to transfer data through PCI?
Admin commands need to transfer bulk data across thousands of VFs in
parallel without baking registers into PCI.
So you agree that the PCI config space is actually very unlikely to fail? It
is reliable.

No, I do not agree. It can fail and is very hard for board designers.
AQs are a more reliable way to transport bulk data in a scalable manner for
tens of member devices.
Really? How often do you observe the virtio config space failing?

Please allow me to provide an extreme example: is one single admin vq
limitless, such that it can serve the migration of hundreds to thousands of
VMs?
It is left to the device implementation, just like RSS and multi-queue
support. Is one queue enough for links from 800 Gbps down to 10 Mbps?
The answer is: not in the scope of the specification; the spec provides the
framework to scale this way but does not impose it on the device.
Even without RSS or MQ support, the device can still work with performance
overhead rather than fail.

A live migration failure caused by insufficient bandwidth and resources is
totally different.

If not, two or
three or what number?
It really does not matter. It is the wrong point to discuss here.
The number of queues and command execution depends on the device
implementation.
A financial transaction application can time out when the device queuing
delay for the virtio-net RX queue is long.
And we don't put details about such things in the specification.
The spec takes the requirements and provides a driver-device interface to
implement and scale.

I still don't follow the motivation behind the question.
Is your question: How many admin queues are needed to migrate N member devices? If so, it is implementation specific.
It is similar to how such things depend on implementation for 30 virtio device types.

And if you are implying that, because it is implementation specific, the
administration queue should not be used and some configuration register
should be used instead, then you should propose a config register interface
to post virtqueue descriptors that way for the 30 device types!
If so, leave it undefined? A potential risk for device implementation?
Then why must it be the admin vq?


