Subject: Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
On 11/16/2023 6:21 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, November 16, 2023 3:45 PM
On 11/16/2023 1:35 AM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 13, 2023 2:56 PM
On 11/10/2023 8:31 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 10, 2023 1:22 PM
On 11/9/2023 6:25 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, November 9, 2023 3:39 PM
On 11/9/2023 2:28 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Tuesday, November 7, 2023 3:02 PM
On 11/6/2023 6:52 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 6, 2023 2:57 PM
On 11/6/2023 12:12 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 6, 2023 9:01 AM
On 11/3/2023 11:50 PM, Parav Pandit wrote:

From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
Sent: Friday, November 3, 2023 8:27 PM
On 11/3/2023 7:35 PM, Parav Pandit wrote:

From: Zhu Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 3, 2023 4:05 PM

This patch adds two new le16 fields to common configuration structure
to support VIRTIO_F_QUEUE_STATE in PCI transport layer.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 transport-pci.tex | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..3161519 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
         /* About the administration virtqueue. */
         le16 admin_queue_index;         /* read-only for driver */
         le16 admin_queue_num;           /* read-only for driver */
+
+        /* Virtqueue state */
+        le16 queue_avail_state;         /* read-write */
+        le16 queue_used_state;          /* read-write */

Parav: This tiny interface for 128 virtio net queues through register read writes, does not work effectively. There are inflight out of order descriptors for block also. Hence toy registers like this do not work.

Lingshan: Do you know there is a queue_select? Why this does not work? Do you know how other queue related fields work?

Parav: :) Yes. If you notice queue_reset related critical spec bug fix was done when it was introduced so that live migration can _actually_ work. When queue_select is done for 128 queues serially, it take a lot of time to read those slow register interface for this + inflight descriptors + more.

Lingshan: interesting, virtio work in this pattern for many years, right?

Parav: All these years 400Gbps and 800Gbps virtio was not present, number of queues were not in hw.

Lingshan: The registers are control path in config space, how 400G or 800G affect??

Parav: Because those are the one in practice requires large number of VQs. You are asking per VQ register commands to modify things dynamically via this one vq at a time, serializing all the operations. It does not scale well with high q count.

Lingshan: This is not dynamically, it only happens when SUSPEND and RESUME. This is the same mechanism how virtio initialize a virtqueue, working for many years.

Parav: No. when virtio driver initializes it for the first time, there is no active traffic that gets lost. This is because the interface is not yet up and not part of the network yet. The resume must be fast enough, because the remote node is sending packets. Hence it is different from driver init time queue enable.

Lingshan: I am not sure any packets arrive before a link announce at the destination side.

Parav: I think it can. Because there is no notification of member device link down intimation to remote side. The L4 and L5 protocols have no knowledge that node which they are interacting is behind some layers of switches. So keeping this time low is desired.

Lingshan: The NIC should broadcast itself first, so that other peers in the network know (for example its mac to route it) how to send a message to it. This is necessary, for example VIRTIO_NET_F_GUEST_ANNOUNCE, similar mechanism work for in-marketing productions for years. This is out of the topic anyway.

Lingshan: See the virtio common cfg, you will find the max number of vqs is there, num_queues.

Parav: :) Sure. those values at high q count affects.

Lingshan: the driver need to initialize them anyway.

Parav: That is before the traffic starts from remote end.

Lingshan: see above, that needs a link announce and this is after re-initialization

Parav: Device didn't support LM. Many limitations existed all these years and TC is improving and expanding them. So all these years do not matter.

Lingshan: Not sure what are you talking about, haven't we initialize the device and vqs in config space for years?????? What's wrong with this mechanism? Are you questioning virtio-pci fundamentals???

Parav: Don't point to in-efficient past to establish similar in-efficient future.

Lingshan: interesting, you know this is a one-time thing, right? and you are aware of this has been there for years.

Lingshan: Like how to set a queue size and enable it?

Parav: Those are meant to be used before DRIVER_OK stage as they are init time registers. Not to keep abusing them..

Lingshan: don't you need to set queue_size at the destination side?

Parav: No. But the src/dst does not matter. Queue_size to be set before DRIVER_OK like rest of the registers, as all queues must be created before the driver_ok phase. Queue_reset was last moment exception.

Lingshan: create a queue? Nvidia specific?

Parav: Huh. No. Do git log and realize what happened with queue_reset.

Lingshan: You didn't answer the question, does the spec even has defined "create a vq"?

Parav: Enabled/created = tomato/tomato when discussing the spec in non-normative email conversation. It's irrelevant.

Lingshan: Then lets not debate on this enable a vq or create a vq anymore

Parav: All I am saying is, when we know the limitations of the transport and when industry is forwarding to not introduced more and more on-die register for once in lifetime work of device migration, we just use the optimal command and queue interface that is native to virtio.

Lingshan: PCI config space has its own limitations, and admin vq has its advantages, but that does not apply to all use cases.

Parav: There was a recent work done emulating the SR-IOV cap and allowing VM to enable SR-IOV in [1]. This is the option I mentioned few weeks ago. So with admin commands and admin virtqueues, even nested model will work using [1].

[1] https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html

Lingshan: We should take this into consideration once it is standardized in the spec, maybe not now, there can always be many workarounds to solve one problem.

Parav: Sure, until that point the admin commands are able to suffice the need well. And when the spec changes in transport occurs (if needed), current admin command and admin vq also fits very well that will follow above [1].

Lingshan: we have pointed lots of problems for admin vq based live migration proposal, I won't repeat them here

Parav: I don't see any. Nested is already solved using above.

Lingshan: I don't see how, do you mind to work out the patches?

Parav: Once the base series is completed, nested cases can be addressed. I wont be able to work on the patches for it until we finish for the first level virtualization.

Lingshan: As you know, nested is supported well in current virtio, so please don't break it.

Parav: Long time ago, you mentioned some QoS issue, which anyway exists in the device register method too. Can you please list them if anything other than QoS and nest?
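
For readers following the queue_select argument, here is a minimal sketch of the access pattern being debated: saving and restoring the two proposed fields one queue at a time through a select window. It is an illustration only. The `mock_device` struct, the helper names, and the access counter are hypothetical and stand in for real MMIO accesses to the common configuration structure; only `queue_select`, `queue_avail_state`, `queue_used_state` and the 128-queue figure come from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QUEUES 128  /* queue count used as the example in the thread */

/* Hypothetical in-memory stand-in for a device; not a real driver structure. */
struct mock_device {
    uint16_t queue_select;             /* selects which queue the window shows */
    uint16_t avail_state[NUM_QUEUES];  /* backs queue_avail_state per queue */
    uint16_t used_state[NUM_QUEUES];   /* backs queue_used_state per queue */
    unsigned long reg_accesses;        /* counts simulated register accesses */
};

struct vq_state {
    uint16_t avail;
    uint16_t used;
};

/* Each helper models exactly one register access. */
static void write_queue_select(struct mock_device *d, uint16_t q)
{
    assert(q < NUM_QUEUES);
    d->queue_select = q;
    d->reg_accesses++;
}

static uint16_t read_queue_avail_state(struct mock_device *d)
{
    d->reg_accesses++;
    return d->avail_state[d->queue_select];
}

static uint16_t read_queue_used_state(struct mock_device *d)
{
    d->reg_accesses++;
    return d->used_state[d->queue_select];
}

static void write_queue_avail_state(struct mock_device *d, uint16_t v)
{
    d->reg_accesses++;
    d->avail_state[d->queue_select] = v;
}

static void write_queue_used_state(struct mock_device *d, uint16_t v)
{
    d->reg_accesses++;
    d->used_state[d->queue_select] = v;
}

/* Source side: one select plus two reads per queue, strictly serial. */
static void save_all_queue_state(struct mock_device *d, struct vq_state *out)
{
    for (uint16_t q = 0; q < NUM_QUEUES; q++) {
        write_queue_select(d, q);
        out[q].avail = read_queue_avail_state(d);
        out[q].used = read_queue_used_state(d);
    }
}

/* Destination side: one select plus two writes per queue, strictly serial. */
static void restore_all_queue_state(struct mock_device *d,
                                    const struct vq_state *in)
{
    for (uint16_t q = 0; q < NUM_QUEUES; q++) {
        write_queue_select(d, q);
        write_queue_avail_state(d, in[q].avail);
        write_queue_used_state(d, in[q].used);
    }
}
```

In this sketch each direction costs three accesses per queue, so 384 for 128 queues. Whether that is acceptable for a one-time suspend/resume, or too slow while a remote peer is still sending, is exactly the point the two sides disagree on; the sketch does not measure real register latency.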