Subject: RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Thursday, November 16, 2023 3:45 PM
>
> On 11/16/2023 1:35 AM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, November 13, 2023 2:56 PM
> >>
> >> On 11/10/2023 8:31 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Friday, November 10, 2023 1:22 PM
> >>>>
> >>>> On 11/9/2023 6:25 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Thursday, November 9, 2023 3:39 PM
> >>>>>>
> >>>>>> On 11/9/2023 2:28 PM, Parav Pandit wrote:
> >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>>>> Sent: Tuesday, November 7, 2023 3:02 PM
> >>>>>>>>
> >>>>>>>> On 11/6/2023 6:52 PM, Parav Pandit wrote:
> >>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>> Sent: Monday, November 6, 2023 2:57 PM
> >>>>>>>>>>
> >>>>>>>>>> On 11/6/2023 12:12 PM, Parav Pandit wrote:
> >>>>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>>>> Sent: Monday, November 6, 2023 9:01 AM
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 11/3/2023 11:50 PM, Parav Pandit wrote:
> >>>>>>>>>>>>>> From: virtio-comment@lists.oasis-open.org
> >>>>>>>>>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
> >>>>>>>>>>>>>> Sent: Friday, November 3, 2023 8:27 PM
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
> >>>>>>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>>>>>>>> Sent: Friday, November 3, 2023 4:05 PM
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This patch adds two new le16 fields to the common
> >>>>>>>>>>>>>>>> configuration structure to support VIRTIO_F_QUEUE_STATE
> >>>>>>>>>>>>>>>> in the PCI transport layer.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>>>>  transport-pci.tex | 18 ++++++++++++++++++
> >>>>>>>>>>>>>>>>  1 file changed, 18 insertions(+)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> diff --git a/transport-pci.tex b/transport-pci.tex
> >>>>>>>>>>>>>>>> index a5c6719..3161519 100644
> >>>>>>>>>>>>>>>> --- a/transport-pci.tex
> >>>>>>>>>>>>>>>> +++ b/transport-pci.tex
> >>>>>>>>>>>>>>>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >>>>>>>>>>>>>>>>         /* About the administration virtqueue. */
> >>>>>>>>>>>>>>>>         le16 admin_queue_index;  /* read-only for driver */
> >>>>>>>>>>>>>>>>         le16 admin_queue_num;    /* read-only for driver */
> >>>>>>>>>>>>>>>> +
> >>>>>>>>>>>>>>>> +       /* Virtqueue state */
> >>>>>>>>>>>>>>>> +       le16 queue_avail_state;  /* read-write */
> >>>>>>>>>>>>>>>> +       le16 queue_used_state;   /* read-write */
> >>>>>>>>>>>>>>> This tiny interface for 128 virtio-net queues through register
> >>>>>>>>>>>>>>> reads and writes does not work effectively.
> >>>>>>>>>>>>>>> There are in-flight out-of-order descriptors for block as well.
> >>>>>>>>>>>>>>> Hence toy registers like this do not work.
> >>>>>>>>>>>>>> Do you know there is a queue_select? Why does this not work?
> >>>>>>>>>>>>>> Do you know how other queue-related fields work?
> >>>>>>>>>>>>> :)
> >>>>>>>>>>>>> Yes. If you notice, a critical spec bug fix related to queue_reset
> >>>>>>>>>>>>> was done when it was introduced so that live migration can
> >>>>>>>>>>>>> _actually_ work.
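For reference, a non-normative C rendering of the tail of the common configuration structure with the quoted hunk applied. The field names and read/write annotations come from the hunk above; the struct name, the le16 typedef, and the paraphrased meaning of the two state fields are illustrative assumptions only, with the precise semantics defined elsewhere in this series.

    #include <stdint.h>
    typedef uint16_t le16;   /* 16-bit little-endian field, as in the spec's notation */

    struct virtio_pci_common_cfg_tail {
            le16 queue_select;       /* read-write: selects which virtqueue the
                                        per-queue fields below refer to */
            /* ... existing per-queue fields (size, enable, addresses) elided ... */

            /* About the administration virtqueue. */
            le16 admin_queue_index;  /* read-only for driver */
            le16 admin_queue_num;    /* read-only for driver */

            /* Virtqueue state (this patch): presumably the selected queue's
               available/used progress, read out while the device is SUSPENDed
               and written back before RESUME */
            le16 queue_avail_state;  /* read-write */
            le16 queue_used_state;   /* read-write */
    };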
> >>>>>>>>>>>>> When queue_select is done for 128 queues serially, it takes a lot
> >>>>>>>>>>>>> of time to read those slow register interfaces for this + in-flight
> >>>>>>>>>>>>> descriptors + more.
> >>>>>>>>>>>> interesting, virtio has worked in this pattern for many years, right?
> >>>>>>>>>>> All these years 400Gbps and 800Gbps virtio was not present, and such
> >>>>>>>>>>> numbers of queues were not in hw.
> >>>>>>>>>> The registers are control path in config space; how do 400G or 800G affect them??
> >>>>>>>>> Because those are the ones that in practice require a large number of VQs.
> >>>>>>>>>
> >>>>>>>>> You are asking for per-VQ register commands to modify things dynamically,
> >>>>>>>>> one vq at a time, serializing all the operations.
> >>>>>>>>> It does not scale well with high q count.
> >>>>>>>> This is not dynamic, it only happens on SUSPEND and RESUME.
> >>>>>>>> This is the same mechanism by which virtio initializes a virtqueue,
> >>>>>>>> working for many years.
> >>>>>>> No. When the virtio driver initializes it for the first time, there is no
> >>>>>>> active traffic that gets lost.
> >>>>>>> This is because the interface is not yet up and not part of the network yet.
> >>>>>>> The resume must be fast enough, because the remote node is sending packets.
> >>>>>>> Hence it is different from driver-init-time queue enable.
> >>>>>> I am not sure any packets arrive before a link announce at the destination side.
> >>>>> I think it can.
> >>>>> Because there is no notification of member device link down to the remote side.
> >>>>> The L4 and L5 protocols have no knowledge that the node they are interacting
> >>>>> with is behind some layers of switches.
> >>>>> So keeping this time low is desired.
> >>>> The NIC should broadcast itself first, so that other peers in the network know
> >>>> how to send a message to it (for example its mac, to route to it).
> >>>>
> >>>> This is necessary; for example VIRTIO_NET_F_GUEST_ANNOUNCE, and similar
> >>>> mechanisms have worked in in-market products for years.
> >>>>
> >>>> This is out of the topic anyway.
> >>>>>>>>>> See the virtio common cfg, you will find the max number of vqs there,
> >>>>>>>>>> num_queues.
> >>>>>>>>> :)
> >>>>>>>>> Sure. Those values at high q count have an effect.
> >>>>>>>> the driver needs to initialize them anyway.
> >>>>>>> That is before the traffic starts from the remote end.
> >>>>>> see above, that needs a link announce and this is after re-initialization
> >>>>>>>>>>> Device didn't support LM.
> >>>>>>>>>>> Many limitations existed all these years and the TC is improving and
> >>>>>>>>>>> expanding them.
> >>>>>>>>>>> So all these years do not matter.
> >>>>>>>>>> Not sure what you are talking about, haven't we initialized the device
> >>>>>>>>>> and vqs in config space for years?????? What's wrong with this mechanism?
> >>>>>>>>>> Are you questioning virtio-pci fundamentals???
> >>>>>>>>> Don't point to an inefficient past to establish a similar inefficient future.
> >>>>>>>> interesting, you know this is a one-time thing, right?
> >>>>>>>> and you are aware this has been there for years.
> >>>>>>>>>>>>>> Like how to set a queue size and enable it?
> >>>>>>>>>>>>> Those are meant to be used before the DRIVER_OK stage as they are
> >>>>>>>>>>>>> init-time registers.
> >>>>>>>>>>>>> Not to keep abusing them..
> >>>>>>>>>>>> don't you need to set queue_size at the destination side?
> >>>>>>>>>>> No.
> >>>>>>>>>>> But the src/dst does not matter.
> >>>>>>>>>>> Queue_size is to be set before DRIVER_OK like the rest of the registers,
> >>>>>>>>>>> as all queues must be created before the driver_ok phase.
> >>>>>>>>>>> Queue_reset was a last-moment exception.
> >>>>>>>>>> create a queue? Nvidia specific?
> >>>>>>>>>>
> >>>>>>>>> Huh. No.
> >>>>>>>>> Do git log and realize what happened with queue_reset.
> >>>>>>>> You didn't answer the question: does the spec even define "create a vq"?
> >>>>>>> Enabled/created = tomato/tomato when discussing the spec in a non-normative
> >>>>>>> email conversation.
> >>>>>>> It's irrelevant.
> >>>>>> Then let's not debate "enable a vq" vs "create a vq" anymore.
> >>>>>>> All I am saying is, when we know the limitations of the transport, and when
> >>>>>>> the industry is moving toward not introducing more and more on-die registers
> >>>>>>> for the once-in-a-lifetime work of device migration, we just use the optimal
> >>>>>>> command and queue interface that is native to virtio.
> >>>>>> PCI config space has its own limitations, and the admin vq has its advantages,
> >>>>>> but that does not apply to all use cases.
> >>>>>>
> >>>>> There was recent work emulating the SR-IOV capability and allowing a VM to
> >>>>> enable SR-IOV in [1].
> >>>>> This is the option I mentioned a few weeks ago.
> >>>>>
> >>>>> So with admin commands and admin virtqueues, even the nested model will work
> >>>>> using [1].
> >>>>> [1] https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html
> >>>> We should take this into consideration once it is standardized in the spec,
> >>>> maybe not now; there can always be many workarounds to solve one problem.
> >>> Sure, until that point the admin commands are able to meet the need well.
> >>> And when spec changes in the transport occur (if needed), the current admin
> >>> commands and admin vq also fit very well, following [1] above.
> >> we have pointed out lots of problems with the admin vq based live migration
> >> proposal, I won't repeat them here
> > I don't see any.
> > Nested is already solved using the above.
> I don't see how, do you mind working out the patches?

Once the base series is completed, nested cases can be addressed.
I won't be able to work on the patches for it until we finish the first-level
virtualization.

> > Long time ago, you mentioned some QoS issue, which anyway exists in the device
> > register method too.
> > Can you please list them, if there is anything other than QoS and nesting?
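To make the disagreement above concrete, here is a rough, hypothetical sketch of how a hypervisor-side driver might save and restore virtqueue state through the proposed registers. The accessor helpers, register offsets, and the exact ordering around SUSPEND/RESUME and DRIVER_OK are illustrative assumptions, not taken from the spec or this series; the point is the per-queue, queue_select-serialized register access pattern being debated (for 128 queues, a few hundred selects and reads on save, and as many writes on restore).

    #include <stdint.h>

    /* Hypothetical MMIO accessors; a real driver would use its transport's
     * helpers (e.g. ioread16/iowrite16 in Linux) against the common cfg BAR. */
    extern uint16_t cfg_read16(uint32_t offset);
    extern void     cfg_write16(uint32_t offset, uint16_t val);

    /* Placeholder offsets, not the real common configuration layout. */
    enum {
            OFF_QUEUE_SELECT      = 0x16,
            OFF_QUEUE_AVAIL_STATE = 0x40,
            OFF_QUEUE_USED_STATE  = 0x42,
    };

    struct vq_state {
            uint16_t avail;
            uint16_t used;
    };

    /* Save: with the device SUSPENDed, walk every queue through queue_select
     * and read the two state fields.  This is the serial, one-queue-at-a-time
     * register pattern the thread is arguing about. */
    static void save_queue_state(struct vq_state *out, uint16_t num_queues)
    {
            for (uint16_t q = 0; q < num_queues; q++) {
                    cfg_write16(OFF_QUEUE_SELECT, q);
                    out[q].avail = cfg_read16(OFF_QUEUE_AVAIL_STATE);
                    out[q].used  = cfg_read16(OFF_QUEUE_USED_STATE);
            }
    }

    /* Restore: on the destination, write the saved state back through the same
     * selector, alongside the usual pre-DRIVER_OK queue setup (size, addresses,
     * enable), then let the device RESUME and announce itself. */
    static void restore_queue_state(const struct vq_state *in, uint16_t num_queues)
    {
            for (uint16_t q = 0; q < num_queues; q++) {
                    cfg_write16(OFF_QUEUE_SELECT, q);
                    cfg_write16(OFF_QUEUE_AVAIL_STATE, in[q].avail);
                    cfg_write16(OFF_QUEUE_USED_STATE, in[q].used);
            }
    }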