virtio-comment message

Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE

From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
To: Parav Pandit <parav@nvidia.com>, Jason Wang <jasowang@redhat.com>
Date: Wed, 13 Sep 2023 12:20:48 +0800



On 9/13/2023 12:12 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, September 13, 2023 9:31 AM

On 9/12/2023 9:43 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, September 12, 2023 6:33 PM

On 9/12/2023 5:21 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, September 12, 2023 2:33 PM

Admin vq requires fixed and dedicated resources to serve the VMs. The
question still remains: does it scale to serve the migration of a large
number of devices? How many admin vqs do you need to serve 10 VMs, how
many for 100, and so on? How to scale?

Yes, it scales within the AQ and across multiple AQs.
Please consult your board designers to know such limits for your device.

If scaling requires multiple AQs, then how many should a vendor provide
for the worst case?

I am bored of the same repeated questions.

I said it scales, within the AQ. (and across AQs).
I have answered enough times, so I will stop on same repeated question.
Your repeated question is not helping anyone as it is not in the scope of virtio.

If you think it is, please get it written first for RSS and MQ in the net section and post it for review.
You missed the point of the question and I agree no need to discuss this
anymore.

Ok. thanks.

If one admin vq can serve 100 VMs, can it migrate 1000 VMs in
reasonable time?

If not, how many exactly?

Yes, it can serve both 100 and 1000 VMs in reasonable time.

I am not sure, the aq is limitless? Can serve thousands of VMs in a
reasonable time? Like in 300ms?

Yes.

really? limitless?

I answered yes to "Can serve thousands of VMs in a reasonable time? Like in 300ms?".
VQ depth defines the VQ's limit.

It still sounds limitless, and I will stop arguing this. As you can see, if a queue could REALLY be limitless, we would not even need multi-queue or RSS.

If you say that it requires multiple AQs, then how many should a vendor provide?

I didn't say multiple AQs must be used.
It is the same as NIC RQs.

don't you agree a single vq has its own performance limitations?

For LM I don't see the limitation.
The finite limit an AQ has is no different from a register write-and-poll with one entry at a time per device.

see above, and we are implementing per device facilities.

In this series, it says:
+When setting SUSPEND, the driver MUST re-read \field{device status} to
ensure the SUSPEND bit is set.
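
To make the quoted requirement concrete, here is a minimal sketch of the write-then-re-read pattern it describes, run against a toy in-memory device model. The `SUSPEND` bit value, the `ToyDevice` class, and the `suspend()` helper are all hypothetical names chosen for illustration; they are not taken from the patch or from any driver.

```python
SUSPEND = 0x10  # hypothetical bit value, for illustration only


class ToyDevice:
    """Toy model: the device latches SUSPEND only after a few status reads."""

    def __init__(self, reads_until_suspended=3):
        self.status = 0
        self._pending = False
        self._reads_left = reads_until_suspended

    def write_status(self, value):
        # The device accepts the request but does not reflect SUSPEND yet.
        if value & SUSPEND:
            self._pending = True
        self.status = value & ~SUSPEND

    def read_status(self):
        # Each read while a request is pending brings the device closer to done.
        if self._pending:
            self._reads_left -= 1
            if self._reads_left <= 0:
                self.status |= SUSPEND
                self._pending = False
        return self.status


def suspend(dev, max_reads=100):
    """Set SUSPEND, then re-read device status until the bit is observed.

    Returns the number of re-reads that were needed.
    """
    dev.write_status(dev.read_status() | SUSPEND)
    for attempt in range(1, max_reads + 1):
        if dev.read_status() & SUSPEND:
            return attempt
    raise TimeoutError("SUSPEND bit was not observed")
```

In this model each device is handled by one synchronous write followed by repeated reads, which is the "one entry at a time per device" behaviour being debated in the thread.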

And this is nothing to do with scale.

Hence, it brings to the register the same scale and QoS limitation that you claim may be present in the AQ.

And hence, I responded earlier that since most things are not done through the BAR, there is no need to do suspend/resume via the BAR either.
And hence the mode setting command of [1] is just fine.

The bar registers are almost "triggers"

On top of that, once the device is SUSPENDED, it cannot accept some other RESET_VQ command.
so as SiWei suggested, there will be a new feature bit introduced in V2
for vq reset.

VQ cannot be RESET after the device reset as you wrote.

It is device SUSPEND, not reset.

It does not reside on the PF to migrate the VFs.

Hence it does not scale and cannot do parallel operation within the VF, unless each register is replicated.
Why does it not scale? It is a per-device facility.

Because the device needs to answer per device through some large scale memory to fit in a response time.
Again, it is a per-device facility, and it is register based, serving
only the one device itself.
And we do not plan to log the dirty pages in bar.

Hence, there is no reason to wrap suspend resume on the BAR either.
The mode setting admin command is just fine.

They are device status bits.

Why do you need parallel operation against the LM facility?

Because your downtime was 300msec for 1000 VMs.

the LM facility in this series is per-device, it only serves itself.

And that single threading, and single threading per VQ reset via a single register, won't scale.

it is per-device facility, for example, on the VF, not the owner PF.

That doesn't make a lot of sense.

Using a register or a queue for bulk data transfer was a solved question when the virtio spec was born.

I don't see a point to discuss it.
Snippet from spec: "As a device can have zero or more virtqueues for bulk data transport"
Where do you see the series intends to transfer bulk data through registers?

A VF's config space can use the device's dedicated resources, such as bandwidth.

for AQ, still you need to reserve resource and how much?

It depends on your board; please consult your board designer to know, depending on the implementation.

   From spec point of view, it should not be same as any other virtqueue.

so the vendor own the risk to implement AQ LM? Why they have to?

No. I do not agree. It can fail and is very hard for board designers.
AQs are a more reliable way to transport bulk data in a scalable manner
for tens of member devices.
Really? How often do you observe virtio config space fail?

On Intel Icelake server we have seen it failing with 128 VFs.
And the device needs to do very weird things to support 1000+ VFs, forever expanding config space, which is not the topic of this discussion anyway.
That is your setup problem.

Please allow me to provide an extreme example, is one single admin
vq limitless, that can serve hundreds to thousands of VMs migration?

It is left to the device implementation. Just like RSS and multi queue support?

Is one Q enough for 800Gbps to 10Mbps link?
The answer is: not in the scope of the specification. The spec provides the
framework to scale this way, but does not impose it on the device.
Even if not support RSS or MQ, the device still can work with
performance overhead, not fail.

_work_ is subjective.
The financial transaction (application) failed. The packets worked.
LM commands were successful, but it was not timely.

Same same..

A live migration failure caused by insufficient bandwidth and resources is
totally different.

Very abstract point and unrelated to administration commands.

It is your design facing the problem.

If not, two or three or what number?

It really does not matter. It's the wrong point to discuss here.
The number of queues and command execution depend on the device implementation.

A financial transaction application can time out when a device
queuing delay for the virtio net rx queue is long.

And we don't put details about such things in the specification.
The spec takes the requirements and provides a driver-device interface to implement and scale.

I still don't follow the motivation behind the question.
Is your question: How many admin queues are needed to migrate N
member devices? If so, it is implementation specific.

It is similar to how such things depend on implementation for 30
virtio device types.

And if you are implying that, because it is implementation specific, the
administration queue should not be used and some configuration
register should be used instead,

then you should propose a config register interface to post
virtqueue descriptors that way for 30 device types!
if so, leave it as undefined? A potential risk for device implementation?
Then why must it be the admin vq?

Because administration commands and the admin vq do not force devices to
implement thousands of registers which must have a time-bound completion
guarantee.

A large part of the industry, including SIOV devices led by Intel and others, is
moving away from register access mode.

To summarize, administration commands and queue offer the following benefits.

1. Ability to do bulk data transfer between driver and device

2. Ability to parallelize the work within the driver and within the device,
within single or multiple virtqueues

3. Eliminates implementing PCI read/write MMIO registers which demand
a low latency response interval

4. Better utilizes the host CPU, as no one needs to poll on the device
register for completion

5. Ability to handle variability in command completion by the device and
ability to notify the driver
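
As a rough illustration of points 1 and 2 and of the earlier remark that "VQ depth defines the VQ's limit", the following toy model batches commands for several member devices into a fixed-depth queue. It is only a sketch: `ToyAdminQueue`, `suspend_members()`, and the `("suspend", member_id)` command tuple are hypothetical and do not correspond to the admin command format in the specification.

```python
from collections import deque


class ToyAdminQueue:
    """Hypothetical model of a command queue with a fixed depth."""

    def __init__(self, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self.pending = deque()
        self.completed = []

    def submit(self, cmd):
        # Refuse the command when the queue is full.
        if len(self.pending) >= self.depth:
            return False
        self.pending.append(cmd)
        return True

    def device_process(self):
        # The device completes everything currently queued in one pass,
        # standing in for a completion notification to the driver.
        done = list(self.pending)
        self.pending.clear()
        self.completed.extend(done)
        return done


def suspend_members(aq, member_ids):
    """Queue one command per member device, batching up to the queue depth.

    Returns the number of submit/complete rounds that were needed.
    """
    todo = deque(member_ids)
    rounds = 0
    while todo:
        while todo and aq.submit(("suspend", todo[0])):
            todo.popleft()
        aq.device_process()
        rounds += 1
    return rounds
```

With a depth of 4 and 10 member devices, this model needs 3 rounds; how long a round takes on real hardware is exactly the implementation-specific part the thread is arguing about, and the model says nothing about it.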

If this does not satisfy you, please refer to some of the past email discussions
from the time the administration virtqueue was introduced.
I think you mixed up the facility and the implementation in my series, please
read.

I don't know what you refer to. You asked "why AQ is must?" I answered
above what the AQ has to offer compared with some synchronous register.
Again, we are implementing facilities, V2 will include inflight
descriptors and dirty page tracking. That works for LM.

It can be named under anything, what matters is how/where it is used?
So "facility" and "implementation" in your above comment are just abstract word.
I answered you "Why AQ is must"?

see above and please feel free to reuse the basic facilities if you like in your AQ LM
