virtio-comment message

Subject: RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE

From: Parav Pandit <parav@nvidia.com>
To: "Zhu, Lingshan" <lingshan.zhu@intel.com>, "jasowang@redhat.com" <jasowang@redhat.com>, "mst@redhat.com" <mst@redhat.com>, "eperezma@redhat.com" <eperezma@redhat.com>, "cohuck@redhat.com" <cohuck@redhat.com>, "stefanha@redhat.com" <stefanha@redhat.com>
Date: Fri, 10 Nov 2023 12:31:28 +0000

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Friday, November 10, 2023 1:22 PM
> 
> 
> On 11/9/2023 6:25 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Thursday, November 9, 2023 3:39 PM
> >>
> >>
> >> On 11/9/2023 2:28 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, November 7, 2023 3:02 PM
> >>>>
> >>>> On 11/6/2023 6:52 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Monday, November 6, 2023 2:57 PM
> >>>>>>
> >>>>>> On 11/6/2023 12:12 PM, Parav Pandit wrote:
> >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>>>> Sent: Monday, November 6, 2023 9:01 AM
> >>>>>>>>
> >>>>>>>> On 11/3/2023 11:50 PM, Parav Pandit wrote:
> >>>>>>>>>> From: virtio-comment@lists.oasis-open.org
> >>>>>>>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Zhu,
> >>>>>>>>>> Lingshan
> >>>>>>>>>> Sent: Friday, November 3, 2023 8:27 PM
> >>>>>>>>>>
> >>>>>>>>>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
> >>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>>>> Sent: Friday, November 3, 2023 4:05 PM
> >>>>>>>>>>>>
> >>>>>>>>>>>> This patch adds two new le16 fields to common configuration
> >>>>>>>>>>>> structure to support VIRTIO_F_QUEUE_STATE in PCI transport
> layer.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>        transport-pci.tex | 18 ++++++++++++++++++
> >>>>>>>>>>>>        1 file changed, 18 insertions(+)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/transport-pci.tex b/transport-pci.tex index
> >>>>>>>>>>>> a5c6719..3161519 100644
> >>>>>>>>>>>> --- a/transport-pci.tex
> >>>>>>>>>>>> +++ b/transport-pci.tex
> >>>>>>>>>>>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration
> >>>>>>>> structure
> >>>>>>>>>>>> layout}\label{sec:Virtio Transport
> >>>>>>>>>>>>                /* About the administration virtqueue. */
> >>>>>>>>>>>>                le16 admin_queue_index;         /* read-only for driver
> */
> >>>>>>>>>>>>                le16 admin_queue_num;         /* read-only for driver
> */
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +	/* Virtqueue state */
> >>>>>>>>>>>> +        le16 queue_avail_state;         /* read-write */
> >>>>>>>>>>>> +        le16 queue_used_state;          /* read-write */
> >>>>>>>>>>> This tiny interface for 128 virtio net queues through
> >>>>>>>>>>> register read writes, does
> >>>>>>>>>> not work effectively.
> >>>>>>>>>>> There are inflight out of order descriptors for block also.
> >>>>>>>>>>> Hence toy registers like this do not work.
> >>>>>>>>>> Do you know there is a queue_select? Why this does not work?
> >>>>>>>>>> Do you know how other queue related fields work?
> >>>>>>>>> :)
> >>>>>>>>> Yes. If you notice queue_reset related critical spec bug fix
> >>>>>>>>> was done when it
> >>>>>>>> was introduced so that live migration can _actually_ work.
> >>>>>>>>> When queue_select is done for 128 queues serially, it take a
> >>>>>>>>> lot of time to
> >>>>>>>> read those slow register interface for this + inflight descriptors +
> more.
> >>>>>>>> interesting, virtio work in this pattern for many years, right?
> >>>>>>> All these years 400Gbps and 800Gbps virtio was not present,
> >>>>>>> number of
> >>>>>> queues were not in hw.
> >>>>>> The registers are control path in config space, how 400G or 800G
> affect??
> >>>>> Because those are the one in practice requires large number of VQs.
> >>>>>
> >>>>> You are asking per VQ register commands to modify things
> >>>>> dynamically via
> >>>> this one vq at a time, serializing all the operations.
> >>>>> It does not scale well with high q count.
> >>>> This is not dynamically, it only happens when SUSPEND and RESUME.
> >>>> This is the same mechanism how virtio initialize a virtqueue,
> >>>> working for many years.
> >>> No. when virtio driver initializes it for the first time, there is
> >>> no active traffic
> >> that gets lost.
> >>> This is because the interface is not yet up and not part of the network yet.
> >>>
> >>> The resume must be fast enough, because the remote node is sending
> >> packets.
> >>> Hence it is different from driver init time queue enable.
> >> I am not sure any packets arrive before a link announce at the destination
> side.
> > I think it can.
> > Because there is no notification of member device link down intimation to
> remote side.
> > The L4 and L5 protocols have no knowledge that node which they are
> interacting is behind some layers of switches.
> >
> > So keeping this time low is desired.
> The NIC should broad cast itself first, so that other peers in the network
> know(for example its mac to route it) how to send a message to it.
> 
> This is necessary, for example VIRTIO_NET_F_GUEST_ANNOUNCE, similar
> mechanism work for in-marketing productions for years.
> 
> This is out of the topic anyway.
> >
> >>>>>> See the virtio common cfg, you will find the max number of vqs is
> >>>>>> there, num_queues.
> >>>>> :)
> >>>>> Sure. those values at high q count affects.
> >>>> the driver need to initialize them anyway.
> >>> That is before the traffic starts from remote end.
> >> see above, that needs a link announce and this is after
> >> re-initialization
> >>>>>>> Device didnât support LM.
> >>>>>>> Many limitations existed all these years and TC is improving and
> >>>>>>> expanding
> >>>>>> them.
> >>>>>>> So all these years do not matter.
> >>>>>> Not sure what are you talking about, haven't we initialize the
> >>>>>> device and vqs in config space for years?????? What's wrong with
> >>>>>> this
> >> mechanism?
> >>>>>> Are you questioning virito-pci fundamentals???
> >>>>> Donât point to in-efficient past to establish similar in-efficient future.
> >>>> interesting, you know this is a one-time thing, right?
> >>>> and you are aware of this has been there for years.
> >>>>>>>>>> Like how to set a queue size and enable it?
> >>>>>>>>> Those are meant to be used before DRIVER_OK stage as they are
> >>>>>>>>> init time
> >>>>>>>> registers.
> >>>>>>>>> Not to keep abusing them..
> >>>>>>>> don't you need to set queue_size at the destination side?
> >>>>>>> No.
> >>>>>>> But the src/dst does not matter.
> >>>>>>> Queue_size to be set before DRIVER_OK like rest of the
> >>>>>>> registers, as all
> >>>>>> queues must be created before the driver_ok phase.
> >>>>>>> Queue_reset was last moment exception.
> >>>>>> create a queue? Nvidia specific?
> >>>>>>
> >>>>> Huh. No.
> >>>>> Do git log and realize what happened with queue_reset.
> >>>> You didn't answer the question, does the spec even has defined
> >>>> "create a
> >> vq"?
> >>> Enabled/created = tomato/tomato when discussing the spec in
> >>> non-normative
> >> email conversation.
> >>> It's irrelevant.
> >> Then lets not debate on this enable a vq or create a vq anymore
> >>> All I am saying is, when we know the limitations of the transport
> >>> and when industry is forwarding to not introduced more and more
> >>> on-die register
> >> for once in lifetime work of device migration, we just use the
> >> optimal command and queue interface that is native to virtio.
> >> PCI config space has its own limitations, and admin vq has its
> >> advantages, but that does not apply to all use cases.
> >>
> > There was a recent work done emulating the SR-IOV cap and allowing VM to
> enable SR-IOV in [1].
> > This is the option I mentioned few weeks ago.
> >
> > So with admin commands and admin virtqueues, even nested model will work
> using [1].
> >
> > [1]
> > https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-o
> > n-virtual-machines.html
> We should take this into consideration once it is standardized in the spec,
> maybe not now, there can always be many workarounds to solve one problem.
Sure, until that point the admin commands are able to suffice the need well.
And when the spec changes in transport occurs (if needed), current admin command and admin vq also fits very well that will follow above [1].

Thanks.
> >
> >> I don't want to repeat why I don't think admin vq is a good idea for
> >> migration again, we have already discussed on that.

Follow-Ups:
- Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>

References:
- [PATCH V2 0/6] introduce basic facilities for virito live migration
  - From: Zhu Lingshan <lingshan.zhu@intel.com>
- [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: Zhu Lingshan <lingshan.zhu@intel.com>
- RE: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: Parav Pandit <parav@nvidia.com>
- Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
- RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  - From: "Zhu, Lingshan" <lingshan.zhu@intel.com>