Subject: RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE



> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, November 13, 2023 9:17 AM
> 
> On Fri, Nov 10, 2023 at 8:31 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Friday, November 10, 2023 12:23 PM
> > >
> > >
> > > On 11/9/2023 6:02 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 09, 2023 at 06:00:27PM +0800, Zhu, Lingshan wrote:
> > > >>
> > > >> On 11/9/2023 1:44 AM, Michael S. Tsirkin wrote:
> > > >>> On Tue, Nov 07, 2023 at 05:31:38PM +0800, Zhu, Lingshan wrote:
> > > >>>> On 11/6/2023 6:52 PM, Parav Pandit wrote:
> > > >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>> Sent: Monday, November 6, 2023 2:57 PM
> > > >>>>>>
> > > >>>>>> On 11/6/2023 12:12 PM, Parav Pandit wrote:
> > > >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>> Sent: Monday, November 6, 2023 9:01 AM
> > > >>>>>>>>
> > > >>>>>>>> On 11/3/2023 11:50 PM, Parav Pandit wrote:
> > > >>>>>>>>>> From: virtio-comment@lists.oasis-open.org
> > > >>>>>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu,
> > > >>>>>>>>>> Lingshan
> > > >>>>>>>>>> Sent: Friday, November 3, 2023 8:27 PM
> > > >>>>>>>>>>
> > > >>>>>>>>>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>> Sent: Friday, November 3, 2023 4:05 PM
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> This patch adds two new le16 fields to the common configuration structure to support VIRTIO_F_QUEUE_STATE in the PCI transport layer.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>> ---
> > > >>>>>>>>>>>>        transport-pci.tex | 18 ++++++++++++++++++
> > > >>>>>>>>>>>>        1 file changed, 18 insertions(+)
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> diff --git a/transport-pci.tex b/transport-pci.tex
> > > >>>>>>>>>>>> index
> > > >>>>>>>>>>>> a5c6719..3161519 100644
> > > >>>>>>>>>>>> --- a/transport-pci.tex
> > > >>>>>>>>>>>> +++ b/transport-pci.tex
> > > >>>>>>>>>>>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> > > >>>>>>>>>>>>                /* About the administration virtqueue. */
> > > >>>>>>>>>>>>                le16 admin_queue_index;         /* read-only for driver */
> > > >>>>>>>>>>>>                le16 admin_queue_num;         /* read-only for driver */
> > > >>>>>>>>>>>> +
> > > >>>>>>>>>>>> +        /* Virtqueue state */
> > > >>>>>>>>>>>> +        le16 queue_avail_state;         /* read-write */
> > > >>>>>>>>>>>> +        le16 queue_used_state;          /* read-write */
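
For reference, the hunk above, unwrapped, ends the common configuration structure roughly as in the sketch below. This is a reader's sketch in the spec's le16 notation, not normative spec text; the note about queue_select anticipates the discussion that follows.

/* Sketch: tail of the common configuration structure with the proposed
 * fields applied.  le16 denotes a 16-bit little-endian field.  The two
 * new fields would apply to the queue currently addressed by
 * queue_select, like the other per-queue registers. */
struct virtio_pci_common_cfg {
        /* ... existing fields, including le16 queue_select ... */

        /* About the administration virtqueue. */
        le16 admin_queue_index;         /* read-only for driver */
        le16 admin_queue_num;           /* read-only for driver */

        /* Virtqueue state (new) */
        le16 queue_avail_state;         /* read-write */
        le16 queue_used_state;          /* read-write */
};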
> > > >>>>>>>>>>> This tiny interface of register reads/writes for 128 virtio-net queues does not work effectively.
> > > >>>>>>>>>>> There are in-flight out-of-order descriptors for block as well.
> > > >>>>>>>>>>> Hence toy registers like this do not work.
> > > >>>>>>>>>> Do you know there is a queue_select? Why does this not work?
> > > >>>>>>>>>> Do you know how other queue-related fields work?
> > > >>>>>>>>> :)
> > > >>>>>>>>> Yes. If you notice, a queue_reset-related critical spec bug fix was done when it was introduced so that live migration can _actually_ work.
> > > >>>>>>>>> When queue_select is done for 128 queues serially, it takes a lot of time to read through that slow register interface for this + inflight descriptors + more.
> > > >>>>>>>> Interesting; virtio has worked in this pattern for many years, right?
> > > >>>>>>> All these years, 400Gbps and 800Gbps virtio was not present, and such numbers of queues were not in hw.
> > > >>>>>> The registers are control path in config space; how do 400G or 800G affect that?
> > > >>>>> Because those are the ones that in practice require a large number of VQs.
> > > >>>>>
> > > >>>>> You are asking for per-VQ register commands to modify things dynamically, one vq at a time, serializing all the operations.
> > > >>>>> It does not scale well with a high q count.
> > > >>>> This is not dynamic; it only happens at SUSPEND and RESUME.
> > > >>>> This is the same mechanism by which virtio initializes a virtqueue, and it has worked for many years.
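
To make this disagreement concrete, the save path being debated looks roughly like the loop below. This is a sketch only: cfg_write16()/cfg_read16() and the offset macros are placeholders rather than spec or driver definitions, and the offsets of the two proposed fields are illustrative.

#include <stdint.h>

/* Placeholder MMIO accessors into the common configuration structure. */
extern void     cfg_write16(uint32_t offset, uint16_t val);
extern uint16_t cfg_read16(uint32_t offset);

#define Q_SELECT       0x16  /* existing queue_select field */
#define Q_AVAIL_STATE  0x40  /* proposed queue_avail_state (illustrative offset) */
#define Q_USED_STATE   0x42  /* proposed queue_used_state (illustrative offset) */

struct vq_state {
        uint16_t avail_state;
        uint16_t used_state;
};

/* Save the avail/used state of every virtqueue, one queue at a time:
 * each queue's registers are only valid after it has been selected. */
void save_all_vq_states(uint16_t num_queues, struct vq_state *out)
{
        for (uint16_t i = 0; i < num_queues; i++) {
                cfg_write16(Q_SELECT, i);
                out[i].avail_state = cfg_read16(Q_AVAIL_STATE);
                out[i].used_state  = cfg_read16(Q_USED_STATE);
        }
}

The serial queue_select/read pattern is the cost Parav objects to at high queue counts, and the one-time SUSPEND/RESUME cost Lingshan considers acceptable.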
> > > >>> I wish we just had a transport vq already. That's the way to solve this, not fighting over individual bits.
> > > >> Yeah, I agree, the transport vq is a queued task (V4 was sent out months ago...), one by one... hard and tough work...
> > > > Frankly I think that should take precedence; then Parav will not get annoyed each time we add a couple of registers.
> > > I agree, things can happen and we are already here..
> > Unfortunately the transport vq is of not much help, for the below fundamental reasons.
> >
> > 1.1 It involves many VMEXITs to access runtime config space through slow registers.
> > 1.2 Alternatively, the hypervisor ends up polling many thousands of registers, wasting CPU resources.
> > 2. It does not help the future CC use case, where the hypervisor must not be involved in dynamic config.
> > 3. A complex device like vnet has already stopped using the ever-growing config space in favor of the cvq; a large part of the 1.3 work has shown that already.
> > 4. PFs also cannot grow registers infinitely; they too need fewer on-die registers.
> 
> You miss the point here. Nothing makes transport vq different from what you
> proposed here (especially the device context part).
> 
It does.
The device context, on read, reports only the delta. A transport via MMIO or VQ reads/writes whatever is asked.

The second functional difference is:
the transport VQ, when present on the member device itself, further secures the communication from the hypervisor.
Third: it eliminates issues 1 to 4 above.

> >
> > And who knows, backward-compatible SIOV devices may offer the same BAR as VFs.
> 
> Another self-contradiction, isn't it? You claim the BAR is not scalable, but you still want to offer a BAR for SIOV?
> 
> >
> > The virtio spec has already outlined this efficient concept, and TC members are already following it.
> 
> Another shifting concept. That admin commands/virtqueues make sense doesn't mean you can simply layer everything on top.
>
Not at all. Using the CVQ for non-init work has been an existing concept since 2014.
No need to prove it again.

> What's more, the spec can't be 100% correct; that's why there are fixes or even reverts.
In this aspect it is correct and all the recent work is rightly following it too.

> 
> >
> > SIOV in non-backward-compatible mode anyway needs a new interface, and a vq is inherently already there, fulfilling the needs.
> 
> What interface did you mean here? How much does it differ from the transport
> virtqueue?
No point in sidetracking the discussion here. It is meaningless to invent SIOV when we are still making VF migration work.
What I know is that non-backward-compat mode does not need the giant PCI register set we have today.

> 
> Again, if you keep raising unrelated topics like CC or TDISP, the discussion won't converge. And you are self-contradicting: you still haven't explained why your proposal can work in those cases.
>
The fundamental points are:
1. It does not build on the volatile concept of involving the hypervisor, as listed in TDISP, for driver-to-member-device communication.
2. It provides a unified interface for PF, VF, and SIOV.

 
> Thanks
> 
> 
> 
> >
> > > >
> > > >>>>>> See the virtio common cfg, you will find the max number of
> > > >>>>>> vqs is there, num_queues.
> > > >>>>> :)
> > > >>>>> Sure. Those values matter at a high q count.
> > > >>>> The driver needs to initialize them anyway.
> > > >>>>>>> Devices didn't support LM.
> > > >>>>>>> Many limitations existed all these years, and the TC is improving and expanding them.
> > > >>>>>>> So all these years do not matter.
> > > >>>>>> Not sure what you are talking about; haven't we initialized the device and vqs through config space for years? What's wrong with this mechanism?
> > > >>>>>> Are you questioning virtio-pci fundamentals?
> > > >>>>> Don't point to an inefficient past to establish a similarly inefficient future.
> > > >>>> Interesting; you know this is a one-time thing, right?
> > > >>>> And you are aware this has been there for years.
> > > >>>>>>>>>> Like how to set a queue size and enable it?
> > > >>>>>>>>> Those are meant to be used before the DRIVER_OK stage, as they are init-time registers.
> > > >>>>>>>>> Not to keep abusing them.
> > > >>>>>>>> don't you need to set queue_size at the destination side?
> > > >>>>>>> No.
> > > >>>>>>> But src/dst does not matter.
> > > >>>>>>> Queue_size is to be set before DRIVER_OK like the rest of the registers, as all queues must be created before the driver_ok phase.
> > > >>>>>>> Queue_reset was a last-minute exception.
> > > >>>>>> create a queue? Nvidia specific?
> > > >>>>>>
> > > >>>>> Huh. No.
> > > >>>>> Do git log and realize what happened with queue_reset.
> > > >>>> You didn't answer the question: does the spec even define "creating a vq"?
> > > >>>>>> For standard virtio, you need to read the number of enabled vqs at the source side and then enable them at the destination; so queue_size matters, not "creating".
> > > >>>>> All that happens in the pre-copy phase.
> > > >>>> Yes, and how is your answer related to this discussion?
> >
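
Tying the queue_size exchange above together: under the proposal, the destination side would recreate each virtqueue with the usual pre-DRIVER_OK sequence and then write back the saved state, roughly as sketched below. This reuses the placeholder accessors and offsets from the earlier save sketch; the descriptor/driver/device address programming is elided.

#define Q_SIZE    0x18  /* existing queue_size field */
#define Q_ENABLE  0x1c  /* existing queue_enable field */

/* Rough destination-side restore order, performed before DRIVER_OK:
 * recreate the queue as during normal init, then write back the
 * saved avail/used state and enable the queue. */
void restore_vq(uint16_t index, uint16_t size, const struct vq_state *st)
{
        cfg_write16(Q_SELECT, index);
        cfg_write16(Q_SIZE, size);            /* same size as at the source */
        /* ... program queue_desc/driver/device addresses here ... */
        cfg_write16(Q_AVAIL_STATE, st->avail_state);
        cfg_write16(Q_USED_STATE, st->used_state);
        cfg_write16(Q_ENABLE, 1);
}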


