Subject: Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE


On Fri, Nov 10, 2023 at 8:31 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > Sent: Friday, November 10, 2023 12:23 PM
> >
> >
> > On 11/9/2023 6:02 PM, Michael S. Tsirkin wrote:
> > > On Thu, Nov 09, 2023 at 06:00:27PM +0800, Zhu, Lingshan wrote:
> > >>
> > >> On 11/9/2023 1:44 AM, Michael S. Tsirkin wrote:
> > >>> On Tue, Nov 07, 2023 at 05:31:38PM +0800, Zhu, Lingshan wrote:
> > >>>> On 11/6/2023 6:52 PM, Parav Pandit wrote:
> > >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > >>>>>> Sent: Monday, November 6, 2023 2:57 PM
> > >>>>>>
> > >>>>>> On 11/6/2023 12:12 PM, Parav Pandit wrote:
> > >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > >>>>>>>> Sent: Monday, November 6, 2023 9:01 AM
> > >>>>>>>>
> > >>>>>>>> On 11/3/2023 11:50 PM, Parav Pandit wrote:
> > >>>>>>>>>> From: virtio-comment@lists.oasis-open.org
> > >>>>>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu,
> > >>>>>>>>>> Lingshan
> > >>>>>>>>>> Sent: Friday, November 3, 2023 8:27 PM
> > >>>>>>>>>>
> > >>>>>>>>>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
> > >>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@intel.com>
> > >>>>>>>>>>>> Sent: Friday, November 3, 2023 4:05 PM
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> This patch adds two new le16 fields to the common
> > >>>>>>>>>>>> configuration structure to support VIRTIO_F_QUEUE_STATE
> > >>>>>>>>>>>> in the PCI transport layer.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > >>>>>>>>>>>> ---
> > >>>>>>>>>>>>        transport-pci.tex | 18 ++++++++++++++++++
> > >>>>>>>>>>>>        1 file changed, 18 insertions(+)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> diff --git a/transport-pci.tex b/transport-pci.tex
> > >>>>>>>>>>>> index a5c6719..3161519 100644
> > >>>>>>>>>>>> --- a/transport-pci.tex
> > >>>>>>>>>>>> +++ b/transport-pci.tex
> > >>>>>>>>>>>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> > >>>>>>>>>>>>                /* About the administration virtqueue. */
> > >>>>>>>>>>>>                le16 admin_queue_index;         /* read-only for driver */
> > >>>>>>>>>>>>                le16 admin_queue_num;           /* read-only for driver */
> > >>>>>>>>>>>> +
> > >>>>>>>>>>>> +        /* Virtqueue state */
> > >>>>>>>>>>>> +        le16 queue_avail_state;         /* read-write */
> > >>>>>>>>>>>> +        le16 queue_used_state;          /* read-write */
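For reference, here is the tail of the common configuration structure as this hunk would leave it, rendered as a self-contained C sketch; le16 is modeled as uint16_t, and the struct name virtio_pci_common_cfg_tail is illustrative rather than taken from the spec:

#include <stdint.h>

typedef uint16_t le16; /* little-endian 16-bit field, as in the spec */

struct virtio_pci_common_cfg_tail {
        /* About the administration virtqueue. */
        le16 admin_queue_index;   /* read-only for driver */
        le16 admin_queue_num;     /* read-only for driver */

        /* Virtqueue state; selected via queue_select like the other
         * per-queue fields. */
        le16 queue_avail_state;   /* read-write */
        le16 queue_used_state;    /* read-write */
};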
> > >>>>>>>>>>> This tiny interface for 128 virtio-net queues through
> > >>>>>>>>>>> register reads/writes does not work effectively.
> > >>>>>>>>>>> There are in-flight out-of-order descriptors for block too.
> > >>>>>>>>>>> Hence toy registers like this do not work.
> > >>>>>>>>>> Do you know there is a queue_select? Why does this not work?
> > >>>>>>>>>> Do you know how other queue-related fields work?
> > >>>>>>>>> :)
> > >>>>>>>>> Yes. If you notice, a critical queue_reset-related spec bug
> > >>>>>>>>> fix was done when it was introduced so that live migration
> > >>>>>>>>> can _actually_ work.
> > >>>>>>>>> When queue_select is done for 128 queues serially, it takes
> > >>>>>>>>> a lot of time to read those slow register interfaces, plus
> > >>>>>>>>> inflight descriptors and more.
> > >>>>>>>> Interesting; virtio has worked in this pattern for many years, right?
> > >>>>>>> All these years, 400Gbps and 800Gbps virtio was not present,
> > >>>>>>> and this number of queues was not in hw.
> > >>>>>> The registers are control path in config space; how do 400G
> > >>>>>> or 800G affect them?
> > >>>>> Because those are the ones that in practice require a large
> > >>>>> number of VQs.
> > >>>>>
> > >>>>> You are asking for per-VQ register commands to modify things
> > >>>>> dynamically, one vq at a time, serializing all the operations.
> > >>>>> It does not scale well with high q count.
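To make that cost concrete, here is a hedged sketch of the serialized save loop being described; struct common_cfg_view and its field placement are illustrative, and each field access stands in for a slow MMIO register access (typically a trap when performed from a guest):

#include <stdint.h>

/* Abbreviated view of the common config; only the fields used below
 * are shown, and field placement is illustrative. */
struct common_cfg_view {
        volatile uint16_t queue_select;      /* read-write */
        volatile uint16_t queue_avail_state; /* read-write, per this patch */
        volatile uint16_t queue_used_state;  /* read-write, per this patch */
};

struct vq_state {
        uint16_t avail;
        uint16_t used;
};

/* Serial save of per-queue state through queue_select: roughly three
 * slow register accesses per queue, one queue at a time, for all
 * num_queues (e.g. 128) queues. */
static void save_all_queue_states(struct common_cfg_view *cfg,
                                  uint16_t num_queues,
                                  struct vq_state *out)
{
        for (uint16_t i = 0; i < num_queues; i++) {
                cfg->queue_select = i;                 /* select vq i */
                out[i].avail = cfg->queue_avail_state; /* slow read */
                out[i].used  = cfg->queue_used_state;  /* slow read */
        }
}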
> > >>>> This is not dynamic; it only happens at SUSPEND and RESUME.
> > >>>> This is the same mechanism by which virtio initializes a
> > >>>> virtqueue, and it has worked for many years.
> > >>> I wish we just had a transport vq already. That's the way to
> > >>> solve this, not fighting over individual bits.
> > >> Yeah, I agree; the transport vq is a queued task (V4 was sent out
> > >> months ago...), one by one... hard and tough work...
> > > Frankly I think that should take precedence; then Parav will not
> > > get annoyed each time we add a couple of registers.
> > I agree, things can happen and we are already here.
> Unfortunately, a transport vq is not of much help, for the fundamental reasons below.
>
> 1.1. It involves many VMEXITs for accessing runtime config space through slow registers.
> 1.2. Alternatively, the hypervisor ends up polling many thousands of registers, wasting CPU resources.
> 2. It does not help the future CC use case, where the hypervisor must not be involved in dynamic config.
> 3. Complex devices like vnet have already stopped using the ever-growing config space in favor of the cvq; a large part of the work in 1.3 has shown that already.
> 4. PFs also cannot grow registers infinitely; they too need fewer on-die registers.

You miss the point here. Nothing makes the transport vq different from
what you proposed here (especially the device context part).

>
> And who knows, backward-compatible SIOV devices may offer the same BAR as VFs.

Another self-contradiction, isn't it? You claim the BAR is not
scalable, but you still want to offer a BAR for SIOV?

>
> The virtio spec has already outlined this efficient concept, and TC members are already following it.

Another shifting concept. That admin commands/virtqueues make sense
doesn't mean you can simply layer everything on top.

What's more, the spec can't be 100% correct; that's why there are
fixes or even reverts.

>
> SIOV in non-backward-compatible mode needs a new interface anyway, and the vq is inherently already there, fulfilling the needs.

What interface did you mean here? How much does it differ from the
transport virtqueue?

Again, if you keep raising unrelated topics like CC or TDISP, the
discussion won't converge. And you are contradicting yourself: you
still haven't explained why your proposal can work in those cases.
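For contrast with the register loop sketched earlier, a queued transport would batch the whole save into a single command and a single DMA completion rather than several register accesses per queue. The layout below is purely illustrative; no such command is defined in the spec at the time of this thread:

#include <stdint.h>

/* One reported entry per queue. */
struct vq_state_entry {
        uint16_t queue_index;
        uint16_t avail_state;
        uint16_t used_state;
};

/* Hypothetical "get queue states" command: the driver queues one
 * descriptor describing a range of vqs, and the device DMA-writes
 * one vq_state_entry per queue into the reply buffer. */
struct get_vq_states_cmd {
        uint16_t first_queue;  /* first vq to report */
        uint16_t num_queues;   /* number of vqs to report */
        uint64_t reply_addr;   /* guest-physical address of the reply buffer */
};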

Thanks



>
> > >
> > >>>>>> See the virtio common cfg; you will find the max number of
> > >>>>>> vqs there: num_queues.
> > >>>>> :)
> > >>>>> Sure. Those values have an effect at high q count.
> > >>>> The driver needs to initialize them anyway.
> > >>>>>>> The device didn't support LM.
> > >>>>>>> Many limitations existed all these years, and the TC is
> > >>>>>>> improving and expanding them.
> > >>>>>>> So all these years do not matter.
> > >>>>>> Not sure what you are talking about; haven't we initialized
> > >>>>>> the device and vqs in config space for years? What's wrong
> > >>>>>> with this mechanism?
> > >>>>>> Are you questioning virtio-pci fundamentals?
> > >>>>> Don't point to an inefficient past to establish a similarly inefficient future.
> > >>>> Interesting; you know this is a one-time thing, right?
> > >>>> And you are aware this has been there for years.
> > >>>>>>>>>> Like how to set a queue size and enable it?
> > >>>>>>>>> Those are meant to be used before the DRIVER_OK stage, as
> > >>>>>>>>> they are init-time registers.
> > >>>>>>>>> Not to keep abusing them.
> > >>>>>>>> don't you need to set queue_size at the destination side?
> > >>>>>>> No.
> > >>>>>>> But src/dst does not matter.
> > >>>>>>> Queue_size is to be set before DRIVER_OK like the rest of the
> > >>>>>>> registers, as all queues must be created before the driver_ok
> > >>>>>>> phase.
> > >>>>>>> Queue_reset was a last-moment exception.
> > >>>>>> create a queue? Nvidia specific?
> > >>>>>>
> > >>>>> Huh. No.
> > >>>>> Do a git log and see what happened with queue_reset.
> > >>>> You didn't answer the question: has the spec even defined
> > >>>> "creating a vq"?
> > >>>>>> For standard virtio, you need to read the number of enabled
> > >>>>>> vqs at the source side, then enable them at the dst, so
> > >>>>>> queue_size matters; there is no "create".
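As a reminder of the standard flow referenced here, a sketch of destination-side vq bring-up before DRIVER_OK follows; the struct view and helper are illustrative, while the select/size/address/enable sequence is the usual virtio-pci one:

#include <stdint.h>

/* State captured from the source for one vq. */
struct vq_cfg {
        uint16_t size;    /* queue size */
        uint64_t desc;    /* descriptor area address */
        uint64_t driver;  /* driver (avail) area address */
        uint64_t device;  /* device (used) area address */
};

/* Illustrative common-config view with just the per-queue setup fields. */
struct common_cfg_setup_view {
        volatile uint16_t queue_select;
        volatile uint16_t queue_size;
        volatile uint64_t queue_desc;
        volatile uint64_t queue_driver;
        volatile uint64_t queue_device;
        volatile uint16_t queue_enable;
};

/* Standard per-queue bring-up at the destination, done for every vq
 * before DRIVER_OK is set: select, size, addresses, enable. */
static void setup_vq(struct common_cfg_setup_view *cfg,
                     uint16_t index, const struct vq_cfg *vq)
{
        cfg->queue_select = index;
        cfg->queue_size   = vq->size;
        cfg->queue_desc   = vq->desc;
        cfg->queue_driver = vq->driver;
        cfg->queue_device = vq->device;
        cfg->queue_enable = 1;
}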
> > >>>>> All that happens in the pre-copy phase.
> > >>>> Yes, and how is your answer related to this discussion?
>


