

Subject: Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE




On 11/16/2023 6:21 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, November 16, 2023 3:45 PM

On 11/16/2023 1:35 AM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 13, 2023 2:56 PM



On 11/10/2023 8:31 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 10, 2023 1:22 PM


On 11/9/2023 6:25 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, November 9, 2023 3:39 PM


On 11/9/2023 2:28 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, November 7, 2023 3:02 PM

On 11/6/2023 6:52 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 6, 2023 2:57 PM

On 11/6/2023 12:12 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 6, 2023 9:01 AM

On 11/3/2023 11:50 PM, Parav Pandit wrote:
From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
Sent: Friday, November 3, 2023 8:27 PM

On 11/3/2023 7:35 PM, Parav Pandit wrote:
From: Zhu Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 3, 2023 4:05 PM

This patch adds two new le16 fields to common
configuration structure to support VIRTIO_F_QUEUE_STATE
in PCI transport
layer.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
          transport-pci.tex | 18 ++++++++++++++++++
          1 file changed, 18 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index
a5c6719..3161519 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -325,6 +325,10 @@ \subsubsection{Common
configuration
structure
layout}\label{sec:Virtio Transport
                  /* About the administration virtqueue. */
                  le16 admin_queue_index;         /* read-only for
driver
*/
                  le16 admin_queue_num;         /* read-only for
driver
*/
+
+	/* Virtqueue state */
+        le16 queue_avail_state;         /* read-write */
+        le16 queue_used_state;          /* read-write */
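[Editorial sketch: to illustrate, here is a minimal example of how a driver could save this proposed per-queue state through the existing queue_select mechanism once the device is SUSPENDed. Only queue_select, queue_avail_state and queue_used_state come from the patch/spec; the u16 types, the extended virtio_pci_common_cfg definition, read16()/write16() accessors and struct vq_state are placeholders assumed for the sketch.]

        /* Sketch only: save the proposed per-queue state after SUSPEND.
         * queue_avail_state/queue_used_state are the fields added by this
         * patch; read16()/write16() and struct vq_state are hypothetical
         * placeholders, not part of the spec. */
        struct vq_state {
                u16 avail_state;   /* next avail index the device will read */
                u16 used_state;    /* next used index the device will write */
        };

        static void save_queue_states(struct virtio_pci_common_cfg *cfg,
                                      struct vq_state *st, u16 num_queues)
        {
                for (u16 i = 0; i < num_queues; i++) {
                        write16(&cfg->queue_select, i);
                        st[i].avail_state = read16(&cfg->queue_avail_state);
                        st[i].used_state  = read16(&cfg->queue_used_state);
                }
        }

[Restoring on the destination would be the mirror image: select each queue and write the two saved indices back before RESUME/DRIVER_OK.]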
This tiny interface for 128 virtio net queues through register reads/writes does not work effectively.
There are in-flight out-of-order descriptors for block too.
Hence toy registers like this do not work.
Do you know there is a queue_select? Why does this not work?
Do you know how the other queue-related fields work?
:)
Yes. If you notice, a critical spec bug fix related to queue_reset was done when it was introduced, so that live migration can _actually_ work.
When queue_select is done for 128 queues serially, it takes a lot of time to read through this slow register interface for this, plus in-flight descriptors and more.
Interesting, virtio has worked in this pattern for many years, right?
All these years 400Gbps and 800Gbps virtio was not present, and this number of queues was not in hw.
The registers are control path in config space; how does 400G or 800G affect them?
Because those are the ones that in practice require a large number of VQs.

You are asking for per-VQ register commands to modify things dynamically, one VQ at a time, serializing all the operations.
It does not scale well with a high queue count.
This is not dynamic; it only happens at SUSPEND and RESUME.
This is the same mechanism by which virtio initializes a virtqueue, and it has been working for many years; see the sketch below.
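[Editorial sketch of that existing init-time pattern: the per-queue programming through the common configuration structure. The field names are the ones already in the spec; write16()/write64() are placeholder MMIO accessors.]

        /* Sketch: existing init-time per-queue setup via the common cfg. */
        static void setup_queue(struct virtio_pci_common_cfg *cfg, u16 index,
                                u16 size, u64 desc_pa, u64 driver_pa, u64 device_pa)
        {
                write16(&cfg->queue_select, index);
                write16(&cfg->queue_size, size);
                write64(&cfg->queue_desc, desc_pa);
                write64(&cfg->queue_driver, driver_pa);
                write64(&cfg->queue_device, device_pa);
                write16(&cfg->queue_enable, 1);  /* done per queue before DRIVER_OK */
        }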
No. When the virtio driver initializes it for the first time, there is no active traffic that gets lost, because the interface is not yet up and not yet part of the network.
The resume must be fast enough, because the remote node is sending packets.
Hence it is different from the driver-init-time queue enable.
I am not sure any packets arrive before a link announce at the
destination
side.
I think they can, because there is no notification of the member device's link going down to the remote side.
The L4 and L5 protocols have no knowledge that the node they are interacting with is behind some layers of switches.
So keeping this time low is desired.
The NIC should broadcast itself first, so that other peers in the network know how to send a message to it (for example, its MAC, so traffic can be routed to it).
This is necessary; for example, VIRTIO_NET_F_GUEST_ANNOUNCE and similar mechanisms have worked in production for years (a sketch of that flow follows below).
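[Editorial sketch of the announce flow for reference: the constants are the existing VIRTIO_NET_F_GUEST_ANNOUNCE machinery from the spec; struct virtnet_dev and the helper functions are hypothetical driver-side placeholders.]

        /* Sketch: after migration the device sets VIRTIO_NET_S_ANNOUNCE in
         * the net config status; the driver sends gratuitous ARP/RARP and
         * then acks over the control virtqueue. */
        static void handle_config_change(struct virtnet_dev *vi)
        {
                if (read_status(vi) & VIRTIO_NET_S_ANNOUNCE) {
                        notify_peers(vi);              /* gratuitous ARP/RARP */
                        ctrl_cmd(vi, VIRTIO_NET_CTRL_ANNOUNCE,
                                 VIRTIO_NET_CTRL_ANNOUNCE_ACK);
                }
        }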

This is out of the topic anyway.
See the virtio common cfg; you will find the max number of vqs there, num_queues.
:)
Sure. Those values at a high queue count have an impact.
The driver needs to initialize them anyway.
That is before the traffic starts from the remote end.
See above; that needs a link announce, and this happens after re-initialization.
Device didn't support LM.
Many limitations existed all these years and TC is improving
and expanding
them.
So all these years do not matter.
Not sure what you are talking about; haven't we initialized the device and vqs in config space for years? What's wrong with this mechanism?
Are you questioning virtio-pci fundamentals?
Don't point to the inefficient past to establish a similarly inefficient future.
Interesting. You know this is a one-time thing, right?
And you are aware this has been there for years.
Like how to set a queue size and enable it?
Those are meant to be used before the DRIVER_OK stage, as they are init-time registers.
Not to keep abusing them.
don't you need to set queue_size at the destination side?
No.
But the src/dst does not matter.
Queue_size is to be set before DRIVER_OK like the rest of the registers, as all queues must be created before the driver_ok phase.
Queue_reset was a last-minute exception.
create a queue? Nvidia specific?

Huh. No.
Do a git log and see what happened with queue_reset.
You didn't answer the question: does the spec even define "create a vq"?
Enabled/created = tomato/tomato when discussing the spec in
non-normative
email conversation.
It's irrelevant.
Then let's not debate enable-a-vq vs. create-a-vq anymore.
All I am saying is: when we know the limitations of the transport, and when the industry is moving toward not introducing more and more on-die registers for the once-in-a-lifetime work of device migration, we should just use the optimal command and queue interface that is native to virtio (sketched below).
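[Editorial sketch for comparison: a per-member admin command carried over the owner's admin virtqueue instead of per-VQ registers. The header fields mirror the generic admin command layout in the spec (using the spec's le16/le64 notation, as in the patch above); the suspend opcode and its use for migration are hypothetical, since nothing is standardized for this yet.]

        struct virtio_admin_cmd_hdr {
                le16 opcode;
                le16 group_type;        /* e.g. the SR-IOV group */
                u8   reserved[12];
                le64 group_member_id;   /* which member (VF) is being migrated */
        };

        #define VIRTIO_ADMIN_CMD_DEV_SUSPEND  0x1000   /* hypothetical, not in spec */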
PCI config space has its own limitations, and admin vq has its
advantages, but that does not apply to all use cases.

There was recent work done emulating the SR-IOV capability and allowing a VM to enable SR-IOV in [1].
This is the option I mentioned few weeks ago.

So with admin commands and admin virtqueues, even the nested model will work using [1].
[1] https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html
We should take this into consideration once it is standardized in the spec, maybe not now; there can always be many workarounds to solve one problem.
Sure; until that point the admin commands suffice the need well.
And when the spec change in transport occurs (if needed), the current admin command and admin vq also fit very well, following [1] above.
We have pointed out lots of problems with the admin vq based live migration proposal; I won't repeat them here.
I don't see any.
Nested is already solved using the above.
I don't see how; do you mind working out the patches?
Once the base series is completed, nested cases can be addressed.
I won't be able to work on the patches for it until we finish the first-level virtualization.
As you know, nested is supported well in current virtio, so please don't break it.

A long time ago, you mentioned some QoS issue, which anyway exists in the device register method too.
Can you please list them, if there is anything other than QoS and nesting?

