

Subject: Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE




On 11/16/2023 6:21 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, November 16, 2023 3:45 PM

On 11/16/2023 1:35 AM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 13, 2023 2:56 PM



On 11/10/2023 8:31 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 10, 2023 1:22 PM


On 11/9/2023 6:25 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, November 9, 2023 3:39 PM


On 11/9/2023 2:28 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, November 7, 2023 3:02 PM

On 11/6/2023 6:52 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 6, 2023 2:57 PM

On 11/6/2023 12:12 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 6, 2023 9:01 AM

On 11/3/2023 11:50 PM, Parav Pandit wrote:
From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
Sent: Friday, November 3, 2023 8:27 PM

On 11/3/2023 7:35 PM, Parav Pandit wrote:
From: Zhu Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 3, 2023 4:05 PM

This patch adds two new le16 fields to common
configuration structure to support VIRTIO_F_QUEUE_STATE
in PCI transport
layer.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
          transport-pci.tex | 18 ++++++++++++++++++
          1 file changed, 18 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index
a5c6719..3161519 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -325,6 +325,10 @@ \subsubsection{Common
configuration
structure
layout}\label{sec:Virtio Transport
                  /* About the administration virtqueue. */
                  le16 admin_queue_index;         /* read-only for
driver
*/
                  le16 admin_queue_num;         /* read-only for
driver
*/
+
+	/* Virtqueue state */
+        le16 queue_avail_state;         /* read-write */
+        le16 queue_used_state;          /* read-write */
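[Editorial sketch: to illustrate, here is a minimal example of how a driver could save this proposed per-queue state through the existing queue_select mechanism once the device is SUSPENDed. Only queue_select, queue_avail_state and queue_used_state come from the patch/spec; the u16 types, the extended virtio_pci_common_cfg definition, read16()/write16() accessors and struct vq_state are placeholders assumed for the sketch.]

        /* Sketch only: save the proposed per-queue state after SUSPEND.
         * queue_avail_state/queue_used_state are the fields added by this
         * patch; read16()/write16() and struct vq_state are hypothetical
         * placeholders, not part of the spec. */
        struct vq_state {
                u16 avail_state;   /* next avail index the device will read */
                u16 used_state;    /* next used index the device will write */
        };

        static void save_queue_states(struct virtio_pci_common_cfg *cfg,
                                      struct vq_state *st, u16 num_queues)
        {
                for (u16 i = 0; i < num_queues; i++) {
                        write16(&cfg->queue_select, i);
                        st[i].avail_state = read16(&cfg->queue_avail_state);
                        st[i].used_state  = read16(&cfg->queue_used_state);
                }
        }

[Restoring on the destination would be the mirror image: select each queue and write the two saved indices back before RESUME/DRIVER_OK.]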
This tiny interface for 128 virtio net queues through register reads/writes does not work effectively.
There are in-flight out-of-order descriptors for block too.
Hence toy registers like this do not work.
Do you know there is a queue_select? Why does this not work?
Do you know how the other queue-related fields work?
:)
Yes. If you notice, a critical spec bug fix related to queue_reset was done when it was introduced, so that live migration can _actually_ work.
When queue_select is done for 128 queues serially, it takes a lot of time to read through this slow register interface for this, plus in-flight descriptors and more.
Interesting, virtio has worked in this pattern for many years, right?
All these years 400Gbps and 800Gbps virtio was not present, and this number of queues was not in hw.
The registers are control path in config space; how does 400G or 800G affect them?
Because those are the ones that in practice require a large number of VQs.

You are asking for per-VQ register commands to modify things dynamically, one VQ at a time, serializing all the operations.
It does not scale well with a high queue count.
This is not dynamic; it only happens at SUSPEND and RESUME.
This is the same mechanism by which virtio initializes a virtqueue, and it has been working for many years; see the sketch below.
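[Editorial sketch of that existing init-time pattern: the per-queue programming through the common configuration structure. The field names are the ones already in the spec; write16()/write64() are placeholder MMIO accessors.]

        /* Sketch: existing init-time per-queue setup via the common cfg. */
        static void setup_queue(struct virtio_pci_common_cfg *cfg, u16 index,
                                u16 size, u64 desc_pa, u64 driver_pa, u64 device_pa)
        {
                write16(&cfg->queue_select, index);
                write16(&cfg->queue_size, size);
                write64(&cfg->queue_desc, desc_pa);
                write64(&cfg->queue_driver, driver_pa);
                write64(&cfg->queue_device, device_pa);
                write16(&cfg->queue_enable, 1);  /* done per queue before DRIVER_OK */
        }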
No. When the virtio driver initializes it for the first time, there is no active traffic that gets lost, because the interface is not yet up and not yet part of the network.
The resume must be fast enough, because the remote node is sending packets.
Hence it is different from the driver-init-time queue enable.
I am not sure any packets arrive before a link announce at the
destination
side.
I think they can, because there is no notification of the member device's link going down to the remote side.
The L4 and L5 protocols have no knowledge that the node they are interacting with is behind some layers of switches.
So keeping this time low is desired.
The NIC should broadcast itself first, so that other peers in the network know how to send a message to it (for example, its MAC, so traffic can be routed to it).
This is necessary; for example, VIRTIO_NET_F_GUEST_ANNOUNCE and similar mechanisms have worked in production for years (a sketch of that flow follows below).
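[Editorial sketch of the announce flow for reference: the constants are the existing VIRTIO_NET_F_GUEST_ANNOUNCE machinery from the spec; struct virtnet_dev and the helper functions are hypothetical driver-side placeholders.]

        /* Sketch: after migration the device sets VIRTIO_NET_S_ANNOUNCE in
         * the net config status; the driver sends gratuitous ARP/RARP and
         * then acks over the control virtqueue. */
        static void handle_config_change(struct virtnet_dev *vi)
        {
                if (read_status(vi) & VIRTIO_NET_S_ANNOUNCE) {
                        notify_peers(vi);              /* gratuitous ARP/RARP */
                        ctrl_cmd(vi, VIRTIO_NET_CTRL_ANNOUNCE,
                                 VIRTIO_NET_CTRL_ANNOUNCE_ACK);
                }
        }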

This is out of the topic anyway.
See the virtio common cfg; you will find the max number of vqs there, num_queues.
:)
Sure. Those values at a high queue count have an impact.
The driver needs to initialize them anyway.
That is before the traffic starts from the remote end.
See above; that needs a link announce, and this happens after re-initialization.
Device didn't support LM.
Many limitations existed all these years and TC is improving
and expanding
them.
So all these years do not matter.
Not sure what you are talking about; haven't we initialized the device and vqs in config space for years? What's wrong with this mechanism?
Are you questioning virtio-pci fundamentals?
Don't point to the inefficient past to establish a similarly inefficient future.
Interesting. You know this is a one-time thing, right?
And you are aware this has been there for years.
Like how to set a queue size and enable it?
Those are meant to be used before the DRIVER_OK stage, as they are init-time registers.
Not to keep abusing them.
don't you need to set queue_size at the destination side?
No.
But the src/dst does not matter.
Queue_size is to be set before DRIVER_OK like the rest of the registers, as all queues must be created before the driver_ok phase.
Queue_reset was a last-minute exception.
create a queue? Nvidia specific?

Huh. No.
Do a git log and see what happened with queue_reset.
You didn't answer the question: does the spec even define "create a vq"?
Enabled/created = tomato/tomato when discussing the spec in
non-normative
email conversation.
It's irrelevant.
Then let's not debate enable-a-vq vs. create-a-vq anymore.
All I am saying is: when we know the limitations of the transport, and when the industry is moving toward not introducing more and more on-die registers for the once-in-a-lifetime work of device migration, we should just use the optimal command and queue interface that is native to virtio (sketched below).
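[Editorial sketch for comparison: a per-member admin command carried over the owner's admin virtqueue instead of per-VQ registers. The header fields mirror the generic admin command layout in the spec (using the spec's le16/le64 notation, as in the patch above); the suspend opcode and its use for migration are hypothetical, since nothing is standardized for this yet.]

        struct virtio_admin_cmd_hdr {
                le16 opcode;
                le16 group_type;        /* e.g. the SR-IOV group */
                u8   reserved[12];
                le64 group_member_id;   /* which member (VF) is being migrated */
        };

        #define VIRTIO_ADMIN_CMD_DEV_SUSPEND  0x1000   /* hypothetical, not in spec */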
PCI config space has its own limitations, and admin vq has its
advantages, but that does not apply to all use cases.

There was recent work done emulating the SR-IOV capability and allowing a VM to enable SR-IOV in [1].
This is the option I mentioned few weeks ago.

So with admin commands and admin virtqueues, even the nested model will work using [1].
[1] https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html
We should take this into consideration once it is standardized in the spec, maybe not now; there can always be many workarounds to solve one problem.
Sure; until that point the admin commands suffice the need well.
And when the spec change in transport occurs (if needed), the current admin command and admin vq also fit very well, following [1] above.
We have pointed out lots of problems with the admin vq based live migration proposal; I won't repeat them here.
I don't see any.
Nested is already solved using the above.
I don't see how; do you mind working out the patches?
Once the base series is completed, nested cases can be addressed.
I won't be able to work on the patches for it until we finish the first-level virtualization.
As you know, nested is supported well in current virtio, so please don't break it.

A long time ago, you mentioned some QoS issue, which anyway exists in the device register method too.
Can you please list them, if there is anything other than QoS and nesting?

