Subject: RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, November 21, 2023 10:01 AM
>
> On Fri, Nov 17, 2023 at 6:06 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Friday, November 17, 2023 3:32 PM
> > > To: Parav Pandit <parav@nvidia.com>; jasowang@redhat.com;
> > > mst@redhat.com; eperezma@redhat.com; cohuck@redhat.com;
> > > stefanha@redhat.com
> > > Cc: virtio-comment@lists.oasis-open.org
> > > Subject: Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci:
> > > implement VIRTIO_F_QUEUE_STATE
> > >
> > > On 11/16/2023 6:21 PM, Parav Pandit wrote:
> > > >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >> Sent: Thursday, November 16, 2023 3:45 PM
> > > >>
> > > >> On 11/16/2023 1:35 AM, Parav Pandit wrote:
> > > >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>> Sent: Monday, November 13, 2023 2:56 PM
> > > >>>>
> > > >>>> On 11/10/2023 8:31 PM, Parav Pandit wrote:
> > > >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>> Sent: Friday, November 10, 2023 1:22 PM
> > > >>>>>>
> > > >>>>>> On 11/9/2023 6:25 PM, Parav Pandit wrote:
> > > >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>> Sent: Thursday, November 9, 2023 3:39 PM
> > > >>>>>>>>
> > > >>>>>>>> On 11/9/2023 2:28 PM, Parav Pandit wrote:
> > > >>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>> Sent: Tuesday, November 7, 2023 3:02 PM
> > > >>>>>>>>>>
> > > >>>>>>>>>> On 11/6/2023 6:52 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>> Sent: Monday, November 6, 2023 2:57 PM
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On 11/6/2023 12:12 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>>>> Sent: Monday, November 6, 2023 9:01 AM
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On 11/3/2023 11:50 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>>>>>> From: virtio-comment@lists.oasis-open.org
> > > >>>>>>>>>>>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of
> > > >>>>>>>>>>>>>>>> Zhu, Lingshan
> > > >>>>>>>>>>>>>>>> Sent: Friday, November 3, 2023 8:27 PM
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>>>>>>>> Sent: Friday, November 3, 2023 4:05 PM
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> This patch adds two new le16 fields to common
> > > >>>>>>>>>>>>>>>>>> configuration structure to support
> > > >>>>>>>>>>>>>>>>>> VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>>>>>>>> ---
> > > >>>>>>>>>>>>>>>>>>  transport-pci.tex | 18 ++++++++++++++++++
> > > >>>>>>>>>>>>>>>>>>  1 file changed, 18 insertions(+)
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> diff --git a/transport-pci.tex b/transport-pci.tex
> > > >>>>>>>>>>>>>>>>>> index a5c6719..3161519 100644
> > > >>>>>>>>>>>>>>>>>> --- a/transport-pci.tex
> > > >>>>>>>>>>>>>>>>>> +++ b/transport-pci.tex
> > > >>>>>>>>>>>>>>>>>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> > > >>>>>>>>>>>>>>>>>>         /* About the administration virtqueue. */
> > > >>>>>>>>>>>>>>>>>>         le16 admin_queue_index;  /* read-only for driver */
> > > >>>>>>>>>>>>>>>>>>         le16 admin_queue_num;    /* read-only for driver */
> > > >>>>>>>>>>>>>>>>>> +
> > > >>>>>>>>>>>>>>>>>> +       /* Virtqueue state */
> > > >>>>>>>>>>>>>>>>>> +       le16 queue_avail_state;  /* read-write */
> > > >>>>>>>>>>>>>>>>>> +       le16 queue_used_state;   /* read-write */
> > > >>>>>>>>>>>>>>>>> This tiny interface for 128 virtio net queues through
> > > >>>>>>>>>>>>>>>>> register reads and writes does not work effectively.
> > > >>>>>>>>>>>>>>>>> There are inflight out-of-order descriptors for block also.
> > > >>>>>>>>>>>>>>>>> Hence toy registers like this do not work.
> > > >>>>>>>>>>>>>>>> Do you know there is a queue_select? Why does this not work?
> > > >>>>>>>>>>>>>>>> Do you know how other queue related fields work?
> > > >>>>>>>>>>>>>>> :)
> > > >>>>>>>>>>>>>>> Yes. If you notice, a critical spec bug fix related to queue_reset
> > > >>>>>>>>>>>>>>> was done when it was introduced so that live migration can
> > > >>>>>>>>>>>>>>> _actually_ work.
> > > >>>>>>>>>>>>>>> When queue_select is done for 128 queues serially, it takes a lot
> > > >>>>>>>>>>>>>>> of time to read this slow register interface, plus the inflight
> > > >>>>>>>>>>>>>>> descriptors, plus more.
> > > >>>>>>>>>>>>>> interesting, virtio has worked in this pattern for many years, right?
> > > >>>>>>>>>>>>> All these years 400Gbps and 800Gbps virtio was not present, and the
> > > >>>>>>>>>>>>> number of queues was not in hw.
> > > >>>>>>>>>>>> The registers are control path in config space, how do 400G or 800G
> > > >>>>>>>>>>>> affect them??
> > > >>>>>>>>>>> Because those are the ones that in practice require a large number of VQs.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> You are asking for per-VQ register commands to modify things dynamically
> > > >>>>>>>>>>> via this, one vq at a time, serializing all the operations.
> > > >>>>>>>>>>> It does not scale well with high q count.
> > > >>>>>>>>>> This is not dynamic, it only happens on SUSPEND and RESUME.
> > > >>>>>>>>>> This is the same mechanism by which virtio initializes a virtqueue,
> > > >>>>>>>>>> working for many years.
> > > >>>>>>>>> No. When the virtio driver initializes it for the first time, there is
> > > >>>>>>>>> no active traffic that gets lost.
> > > >>>>>>>>> This is because the interface is not yet up and not part of the network yet.
> > > >>>>>>>>> The resume must be fast enough, because the remote node is sending packets.
> > > >>>>>>>>> Hence it is different from driver init time queue enable.
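
For reference, a rough C sketch of the per-queue save loop being discussed,
assuming the field names from the patch; the accessor functions and the
register offsets below are hypothetical placeholders, not spec-defined values:

    #include <stdint.h>

    /* Hypothetical 16-bit accessors into the device's common configuration
     * structure; a real driver would go through its PCI transport mapping. */
    extern uint16_t common_cfg_read16(uint16_t offset);
    extern void common_cfg_write16(uint16_t offset, uint16_t val);

    /* Illustrative offsets only; queue_select exists today, the two queue
     * state fields are the ones this patch proposes. */
    #define OFF_QUEUE_SELECT       0x16
    #define OFF_QUEUE_AVAIL_STATE  0x3a   /* placeholder */
    #define OFF_QUEUE_USED_STATE   0x3c   /* placeholder */

    struct vq_state {
        uint16_t avail_state;
        uint16_t used_state;
    };

    /* After the device is SUSPENDed, walk every virtqueue: one queue_select
     * write plus two register reads per queue.  With 128 queues this is the
     * serial register traffic being debated; restoring on the destination
     * mirrors it with writes. */
    void save_all_vq_state(struct vq_state *st, uint16_t num_queues)
    {
        for (uint16_t i = 0; i < num_queues; i++) {
            common_cfg_write16(OFF_QUEUE_SELECT, i);
            st[i].avail_state = common_cfg_read16(OFF_QUEUE_AVAIL_STATE);
            st[i].used_state = common_cfg_read16(OFF_QUEUE_USED_STATE);
        }
    }
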
> > > >>>>>>>> I am not sure any packets arrive before a link announce at the
> > > >>>>>>>> destination side.
> > > >>>>>>> I think it can.
> > > >>>>>>> Because there is no notification of the member device link down to
> > > >>>>>>> the remote side.
> > > >>>>>>> The L4 and L5 protocols have no knowledge that the node they are
> > > >>>>>>> interacting with is behind some layers of switches.
> > > >>>>>>> So keeping this time low is desired.
> > > >>>>>> The NIC should broadcast itself first, so that other peers in the
> > > >>>>>> network know how to send a message to it (for example its mac, to
> > > >>>>>> route to it).
> > > >>>>>> This is necessary, for example VIRTIO_NET_F_GUEST_ANNOUNCE; a similar
> > > >>>>>> mechanism has worked in in-market products for years.
> > > >>>>>>
> > > >>>>>> This is out of the topic anyway.
> > > >>>>>>>>>>>> See the virtio common cfg, you will find the max number of vqs
> > > >>>>>>>>>>>> there, num_queues.
> > > >>>>>>>>>>> :)
> > > >>>>>>>>>>> Sure. Those values at high q count affect it.
> > > >>>>>>>>>> the driver needs to initialize them anyway.
> > > >>>>>>>>> That is before the traffic starts from the remote end.
> > > >>>>>>>> see above, that needs a link announce and this is after re-initialization
> > > >>>>>>>>>>>>> The device didn't support LM.
> > > >>>>>>>>>>>>> Many limitations existed all these years and the TC is improving
> > > >>>>>>>>>>>>> and expanding them.
> > > >>>>>>>>>>>>> So all these years do not matter.
> > > >>>>>>>>>>>> Not sure what you are talking about, haven't we initialized the
> > > >>>>>>>>>>>> device and vqs through config space for years?????? What's wrong
> > > >>>>>>>>>>>> with this mechanism?
> > > >>>>>>>>>>>> Are you questioning virtio-pci fundamentals???
> > > >>>>>>>>>>> Don't point to an inefficient past to establish a similar
> > > >>>>>>>>>>> inefficient future.
> > > >>>>>>>>>> interesting, you know this is a one-time thing, right?
> > > >>>>>>>>>> and you are aware this has been there for years.
> > > >>>>>>>>>>>>>>>> Like how to set a queue size and enable it?
> > > >>>>>>>>>>>>>>> Those are meant to be used before the DRIVER_OK stage as they
> > > >>>>>>>>>>>>>>> are init time registers.
> > > >>>>>>>>>>>>>>> Not to keep abusing them..
> > > >>>>>>>>>>>>>> don't you need to set queue_size at the destination side?
> > > >>>>>>>>>>>>> No.
> > > >>>>>>>>>>>>> But the src/dst does not matter.
> > > >>>>>>>>>>>>> Queue_size is to be set before DRIVER_OK like the rest of the
> > > >>>>>>>>>>>>> registers, as all queues must be created before the driver_ok phase.
> > > >>>>>>>>>>>>> Queue_reset was a last-moment exception.
> > > >>>>>>>>>>>> create a queue? Nvidia specific?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>> Huh. No.
> > > >>>>>>>>>>> Do a git log and see what happened with queue_reset.
> > > >>>>>>>>>> You didn't answer the question, does the spec even define "create a vq"?
> > > >>>>>>>>> Enabled/created = tomato/tomato when discussing the spec in a
> > > >>>>>>>>> non-normative email conversation.
> > > >>>>>>>>> It's irrelevant.
> > > >>>>>>>> Then let's not debate "enable a vq" vs "create a vq" anymore.
> > > >>>>>>>>> All I am saying is, when we know the limitations of the transport, and
> > > >>>>>>>>> when the industry is moving away from introducing more and more on-die
> > > >>>>>>>>> registers for once-in-a-lifetime device migration work, we just use the
> > > >>>>>>>>> optimal command and queue interface that is native to virtio.
> > > >>>>>>>> PCI config space has its own limitations, and the admin vq has its
> > > >>>>>>>> advantages, but that does not apply to all use cases.
> > > >>>>>>>>
> > > >>>>>>> There was recent work done emulating the SR-IOV cap and allowing a VM
> > > >>>>>>> to enable SR-IOV in [1].
> > > >>>>>>> This is the option I mentioned a few weeks ago.
> > > >>>>>>>
> > > >>>>>>> So with admin commands and admin virtqueues, even the nested model
> > > >>>>>>> will work using [1].
> > > >>>>>>>
> > > >>>>>>> [1] https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html
> > > >>>>>> We should take this into consideration once it is standardized in the
> > > >>>>>> spec, maybe not now; there can always be many workarounds to solve one
> > > >>>>>> problem.
> > > >>>>> Sure, until that point the admin commands are able to suffice the need well.
> > > >>>>> And when the spec change in the transport occurs (if needed), the current
> > > >>>>> admin command and admin vq also fit very well, following [1] above.
> > > >>>> we have pointed out lots of problems with the admin vq based live
> > > >>>> migration proposal, I won't repeat them here
> > > >>> I don't see any.
> > > >>> Nested is already solved using the above.
> > > >> I don't see how, do you mind working out the patches?
> > > > Once the base series is completed, nested cases can be addressed.
> > > > I won't be able to work on the patches for it until we finish the first
> > > > level virtualization.
> > > As you know, nested is supported well in current virtio, so please don't
> > > break it.
> >
> > And the same comment repeats. :)
> > Expect the same response...
> > Sorry, no, the virtio specification does not support device migration today.
> > Nothing is broken by adding new features.
> >
> > Above [1] has the right proposal that Jason's paper pointed out. Please use it.
>
> I was involved in the design of [1]. And I don't see a connection to the
> discussion here:
>
> 1) It is based on vDPA in L0
> 2) It doesn't address the nesting issue; it requires a proper design in the
> virtio spec to support migration in the nesting layer.

Nothing prevents [1] from being done without vdpa.
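
For completeness, a rough sketch of the destination-side restore order argued
about above, reusing the hypothetical accessors and types from the earlier
sketch; how DRIVER_OK and the SUSPEND/RESUME handshake are actually sequenced
is whatever this series ends up defining, so the helpers below are only
illustrative assumptions:

    /* Hypothetical helpers, not defined by the spec or this series. */
    extern void program_vq_init_fields(uint16_t q); /* size, addresses, enable */
    extern void set_driver_ok(void);
    extern void resume_device(void);

    /* Destination side: per-queue init fields are programmed as during normal
     * initialization, the saved avail/used indices are written back through
     * the proposed fields, and only then is the device resumed, keeping the
     * gap seen by the remote end short. */
    void restore_all_vq_state(const struct vq_state *st, uint16_t num_queues)
    {
        for (uint16_t i = 0; i < num_queues; i++) {
            common_cfg_write16(OFF_QUEUE_SELECT, i);
            program_vq_init_fields(i);
            common_cfg_write16(OFF_QUEUE_AVAIL_STATE, st[i].avail_state);
            common_cfg_write16(OFF_QUEUE_USED_STATE, st[i].used_state);
        }
        set_driver_ok();
        resume_device();
    }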