Subject: RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, November 13, 2023 9:17 AM
>
> On Fri, Nov 10, 2023 at 8:31 PM Parav Pandit <parav@nvidia.com> wrote:
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Friday, November 10, 2023 12:23 PM
> > >
> > > On 11/9/2023 6:02 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 09, 2023 at 06:00:27PM +0800, Zhu, Lingshan wrote:
> > > >> On 11/9/2023 1:44 AM, Michael S. Tsirkin wrote:
> > > >>> On Tue, Nov 07, 2023 at 05:31:38PM +0800, Zhu, Lingshan wrote:
> > > >>>> On 11/6/2023 6:52 PM, Parav Pandit wrote:
> > > >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>> Sent: Monday, November 6, 2023 2:57 PM
> > > >>>>>>
> > > >>>>>> On 11/6/2023 12:12 PM, Parav Pandit wrote:
> > > >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>> Sent: Monday, November 6, 2023 9:01 AM
> > > >>>>>>>>
> > > >>>>>>>> On 11/3/2023 11:50 PM, Parav Pandit wrote:
> > > >>>>>>>>>> From: virtio-comment@lists.oasis-open.org
> > > >>>>>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
> > > >>>>>>>>>> Sent: Friday, November 3, 2023 8:27 PM
> > > >>>>>>>>>>
> > > >>>>>>>>>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>> Sent: Friday, November 3, 2023 4:05 PM
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> This patch adds two new le16 fields to the common
> > > >>>>>>>>>>>> configuration structure to support VIRTIO_F_QUEUE_STATE
> > > >>>>>>>>>>>> in the PCI transport layer.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>> ---
> > > >>>>>>>>>>>>  transport-pci.tex | 18 ++++++++++++++++++
> > > >>>>>>>>>>>>  1 file changed, 18 insertions(+)
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> diff --git a/transport-pci.tex b/transport-pci.tex
> > > >>>>>>>>>>>> index a5c6719..3161519 100644
> > > >>>>>>>>>>>> --- a/transport-pci.tex
> > > >>>>>>>>>>>> +++ b/transport-pci.tex
> > > >>>>>>>>>>>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure
> > > >>>>>>>>>>>> layout}\label{sec:Virtio Transport
> > > >>>>>>>>>>>>         /* About the administration virtqueue. */
> > > >>>>>>>>>>>>         le16 admin_queue_index;  /* read-only for driver */
> > > >>>>>>>>>>>>         le16 admin_queue_num;    /* read-only for driver */
> > > >>>>>>>>>>>> +
> > > >>>>>>>>>>>> +       /* Virtqueue state */
> > > >>>>>>>>>>>> +       le16 queue_avail_state;  /* read-write */
> > > >>>>>>>>>>>> +       le16 queue_used_state;   /* read-write */
> > > >>>>>>>>>>> This tiny interface for 128 virtio net queues through
> > > >>>>>>>>>>> register reads/writes does not work effectively.
> > > >>>>>>>>>>> There are in-flight out-of-order descriptors for block too.
> > > >>>>>>>>>>> Hence toy registers like this do not work.
> > > >>>>>>>>>> Do you know there is a queue_select? Why does this not work?
> > > >>>>>>>>>> Do you know how other queue-related fields work?
> > > >>>>>>>>> :)
> > > >>>>>>>>> Yes. If you notice, a critical queue_reset spec bug fix was done
> > > >>>>>>>>> when it was introduced so that live migration can _actually_ work.
> > > >>>>>>>>> When queue_select is done for 128 queues serially, it takes a lot
> > > >>>>>>>>> of time to read this through the slow register interface + the
> > > >>>>>>>>> inflight descriptors + more.
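To make the serialization cost being debated concrete, here is a minimal C
sketch of how a hypervisor might save per-queue state through the proposed
registers. queue_select is the existing virtio 1.x common configuration field;
queue_avail_state/queue_used_state are the two fields this patch adds. The
offsets of the new fields and the mmio_read16()/mmio_write16() helpers are
assumptions made for illustration only; they are not from the patch or the spec.

    /*
     * Sketch only, not from the patch or the spec. The offsets of the
     * two new fields and the MMIO helpers are illustrative assumptions.
     */
    #include <stdint.h>

    #define COMMON_CFG_QUEUE_SELECT       0x16 /* queue_select (virtio 1.x) */
    #define COMMON_CFG_QUEUE_AVAIL_STATE  0x3a /* assumed offset of new field */
    #define COMMON_CFG_QUEUE_USED_STATE   0x3c /* assumed offset of new field */

    uint16_t mmio_read16(void *base, unsigned int off);            /* assumed */
    void mmio_write16(void *base, unsigned int off, uint16_t val); /* assumed */

    struct vq_state {
        uint16_t avail_idx; /* next available index the device would read */
        uint16_t used_idx;  /* next used index the device would write */
    };

    /*
     * One select write plus two state reads per queue, all serialized
     * through the same register window: with a trapping hypervisor this
     * is at least 3 VMEXITs per queue, roughly 384 for a 128-queue net
     * device, which is the scaling concern raised in this thread.
     */
    void save_all_vq_state(void *cfg, struct vq_state *out, int num_queues)
    {
        for (int i = 0; i < num_queues; i++) {
            mmio_write16(cfg, COMMON_CFG_QUEUE_SELECT, (uint16_t)i);
            out[i].avail_idx = mmio_read16(cfg, COMMON_CFG_QUEUE_AVAIL_STATE);
            out[i].used_idx  = mmio_read16(cfg, COMMON_CFG_QUEUE_USED_STATE);
        }
    }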
> > > >>>>>>>> interesting, virtio has worked in this pattern for many years, right?
> > > >>>>>>> All these years 400Gbps and 800Gbps virtio was not present, and the
> > > >>>>>>> number of queues was not in hw.
> > > >>>>>> The registers are control path in config space; how do 400G or 800G
> > > >>>>>> affect them??
> > > >>>>> Because those are the ones that in practice require a large number of VQs.
> > > >>>>>
> > > >>>>> You are asking for per-VQ register commands to modify things
> > > >>>>> dynamically, one vq at a time, serializing all the operations.
> > > >>>>> It does not scale well with a high q count.
> > > >>>> This is not dynamic; it only happens on SUSPEND and RESUME.
> > > >>>> This is the same mechanism by which virtio initializes a virtqueue,
> > > >>>> and it has worked for many years.
> > > >>> I wish we just had a transport vq already. That's the way to solve
> > > >>> this, not fighting individual bits.
> > > >> Yeah, I agree, transport is a queued task (sent out V4 months ago...),
> > > >> one by one... hard and tough work...
> > > > Frankly I think that should take precedence, then Parav will not get
> > > > annoyed each time we add a couple of registers.
> > > I agree, things can happen and we are already here..
> > Unfortunately the transport vq is of not much help, for the below fundamental
> > reasons:
> >
> > 1.1 It involves many VMEXITs for accessing runtime config space through slow
> > registers.
> > 1.2 Alternatively, the hypervisor ends up polling many thousands of registers,
> > wasting cpu resources.
> > 2. It does not help the future CC use case, where the hypervisor must not be
> > involved in dynamic config.
> > 3. Complex devices like vnet have already stopped using the ever-growing
> > config space and use the cvq; a large part of the work in 1.3 has shown that
> > already.
> > 4. PFs also cannot grow registers infinitely; they too need fewer on-die
> > registers.
>
> You miss the point here. Nothing makes the transport vq different from what
> you proposed here (especially the device context part).
>
It does. The device context on read only reports the delta; a transport via MMIO
or VQ reads/writes whatever is asked.
The second functional difference: when the transport VQ is present on the member
device itself, it further makes the communication secure from the hypervisor.
Third: it eliminates issues 1 to 4 above.

> >
> > And who knows, backward-compatible SIOV devices may offer the same bar as VFs.
>
> Another self-contradiction, isn't it? You claim the bar is not scalable, but
> you still want to offer a bar for SIOV?
>
> >
> > The virtio spec has already outlined this efficient concept, and TC members
> > are already following it.
>
> Another shifting concept. That admin commands/virtqueues make sense doesn't
> mean you can simply layer everything on top.
>
Not at all. Using the CVQ for non-init work has been an existing concept since
2014. No need to prove it again.

> What's more, the spec can't be 100% correct; that's why there are fixes or
> even reverts.
In this aspect it is correct, and all the recent work is rightly following it too.

> >
> > SIOV in non-backward-compatible mode will anyway need a new interface, and a
> > vq is inherently already there, fulfilling the needs.
>
> What interface did you mean here? How much does it differ from the transport
> virtqueue?
There is no point in deviating the discussion here. It is meaningless to invent
SIOV when we are still making VF migration work.
What I know is: non-backward-compat mode does not need the giant PCI registers
that we have today.
>
> Again, if you keep raising unrelated topics like CC or TDISP, the discussion
> won't converge. And you are contradicting yourself: you still haven't explained
> why your proposal can work in those cases.
>
The fundamental points are:
1. It does not build on the volatile concept, listed in TDISP, of involving the
hypervisor in driver-to-member-device communication.
2. It provides a unified interface for PF, VF and SIOV.

> Thanks
>
> > > >>>>>> See the virtio common cfg; you will find the max number of vqs
> > > >>>>>> there, num_queues.
> > > >>>>> :)
> > > >>>>> Sure. Those values affect things at a high q count.
> > > >>>> the driver needs to initialize them anyway.
> > > >>>>>>> Devices didn't support LM.
> > > >>>>>>> Many limitations existed all these years, and the TC is improving
> > > >>>>>>> and expanding them.
> > > >>>>>>> So all these years do not matter.
> > > >>>>>> Not sure what you are talking about; haven't we initialized the
> > > >>>>>> device and vqs in config space for years?????? What's wrong with
> > > >>>>>> this mechanism?
> > > >>>>>> Are you questioning virtio-pci fundamentals???
> > > >>>>> Don't point to an inefficient past to establish a similarly
> > > >>>>> inefficient future.
> > > >>>> interesting, you know this is a one-time thing, right?
> > > >>>> and you are aware this has been there for years.
> > > >>>>>>>>>> Like how to set a queue size and enable it?
> > > >>>>>>>>> Those are meant to be used before the DRIVER_OK stage, as they
> > > >>>>>>>>> are init-time registers.
> > > >>>>>>>>> Not to keep abusing them..
> > > >>>>>>>> don't you need to set queue_size at the destination side?
> > > >>>>>>> No.
> > > >>>>>>> But src/dst does not matter.
> > > >>>>>>> Queue_size is to be set before DRIVER_OK like the rest of the
> > > >>>>>>> registers, as all queues must be created before the driver_ok phase.
> > > >>>>>>> Queue_reset was a last-moment exception.
> > > >>>>>> create a queue? Nvidia specific?
> > > >>>>>>
> > > >>>>> Huh. No.
> > > >>>>> Do git log and realize what happened with queue_reset.
> > > >>>> You didn't answer the question: does the spec even define
> > > >>>> "create a vq"?
> > > >>>>>> For standard virtio, you need to read the number of enabled vqs at
> > > >>>>>> the source side, then enable them at the dst, so queue_size matters,
> > > >>>>>> not "create".
> > > >>>>> All that happens in the pre-copy phase.
> > > >>>> Yes, and how is your answer related to this discussion?
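For reference, a minimal sketch, reusing the assumed MMIO helpers from the
earlier sketch, of the destination-side flow debated above. queue_select,
queue_size and queue_enable are existing virtio 1.x common configuration
fields; the point in the thread is that they are programmed per queue before
DRIVER_OK, as init-time registers, not as runtime migration state.

    /* Sketch only; offsets follow the virtio 1.x common config layout. */
    #define COMMON_CFG_QUEUE_SIZE    0x18 /* queue_size (virtio 1.x) */
    #define COMMON_CFG_QUEUE_ENABLE  0x1c /* queue_enable (virtio 1.x) */

    void setup_dst_queues(void *cfg, const uint16_t *src_sizes, int num_queues)
    {
        for (int i = 0; i < num_queues; i++) {
            mmio_write16(cfg, COMMON_CFG_QUEUE_SELECT, (uint16_t)i);
            mmio_write16(cfg, COMMON_CFG_QUEUE_SIZE, src_sizes[i]); /* read at the source */
            mmio_write16(cfg, COMMON_CFG_QUEUE_ENABLE, 1);
        }
        /* device_status is then set to include DRIVER_OK. */
    }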