Subject: RE: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, November 21, 2023 10:01 AM
> 
> On Fri, Nov 17, 2023 at 6:06 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Friday, November 17, 2023 3:32 PM
> > > To: Parav Pandit <parav@nvidia.com>; jasowang@redhat.com;
> > > mst@redhat.com; eperezma@redhat.com; cohuck@redhat.com;
> > > stefanha@redhat.com
> > > Cc: virtio-comment@lists.oasis-open.org
> > > Subject: Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci:
> > > implement VIRTIO_F_QUEUE_STATE
> > >
> > >
> > >
> > > On 11/16/2023 6:21 PM, Parav Pandit wrote:
> > > >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >> Sent: Thursday, November 16, 2023 3:45 PM
> > > >>
> > > >> On 11/16/2023 1:35 AM, Parav Pandit wrote:
> > > >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>> Sent: Monday, November 13, 2023 2:56 PM
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On 11/10/2023 8:31 PM, Parav Pandit wrote:
> > > >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>> Sent: Friday, November 10, 2023 1:22 PM
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On 11/9/2023 6:25 PM, Parav Pandit wrote:
> > > >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>> Sent: Thursday, November 9, 2023 3:39 PM
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On 11/9/2023 2:28 PM, Parav Pandit wrote:
> > > >>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>> Sent: Tuesday, November 7, 2023 3:02 PM
> > > >>>>>>>>>>
> > > >>>>>>>>>> On 11/6/2023 6:52 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>> Sent: Monday, November 6, 2023 2:57 PM
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On 11/6/2023 12:12 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>>>> Sent: Monday, November 6, 2023 9:01 AM
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On 11/3/2023 11:50 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>>>>>> From: virtio-comment@lists.oasis-open.org
> > > >>>>>>>>>>>>>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of
> > > >>>>>>>>>>>>>>>> Zhu, Lingshan
> > > >>>>>>>>>>>>>>>> Sent: Friday, November 3, 2023 8:27 PM
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
> > > >>>>>>>>>>>>>>>>>> From: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>>>>>>>> Sent: Friday, November 3, 2023 4:05 PM
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> This patch adds two new le16 fields to the common
> > > >>>>>>>>>>>>>>>>>> configuration structure to support
> > > >>>>>>>>>>>>>>>>>> VIRTIO_F_QUEUE_STATE in the PCI transport layer.
> > > >>>>>>>>>>>>>>>>>> Signed-off-by: Zhu Lingshan
> > > >>>>>>>>>>>>>>>>>> <lingshan.zhu@intel.com>
> > > >>>>>>>>>>>>>>>>>> ---
> > > >>>>>>>>>>>>>>>>>>           transport-pci.tex | 18 ++++++++++++++++++
> > > >>>>>>>>>>>>>>>>>>           1 file changed, 18 insertions(+)
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> diff --git a/transport-pci.tex
> > > >>>>>>>>>>>>>>>>>> b/transport-pci.tex index
> > > >>>>>>>>>>>>>>>>>> a5c6719..3161519 100644
> > > >>>>>>>>>>>>>>>>>> --- a/transport-pci.tex
> > > >>>>>>>>>>>>>>>>>> +++ b/transport-pci.tex
> > > >>>>>>>>>>>>>>>>>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> > > >>>>>>>>>>>>>>>>>>         /* About the administration virtqueue. */
> > > >>>>>>>>>>>>>>>>>>         le16 admin_queue_index;         /* read-only for driver */
> > > >>>>>>>>>>>>>>>>>>         le16 admin_queue_num;           /* read-only for driver */
> > > >>>>>>>>>>>>>>>>>> +
> > > >>>>>>>>>>>>>>>>>> +        /* Virtqueue state */
> > > >>>>>>>>>>>>>>>>>> +        le16 queue_avail_state;         /* read-write */
> > > >>>>>>>>>>>>>>>>>> +        le16 queue_used_state;          /* read-write */
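As a minimal sketch (assuming the layout proposed above, with purely illustrative register offsets and hypothetical read16()/write16() MMIO accessors), saving virtqueue state through these fields would look roughly like this, one queue at a time via queue_select:

    #include <stdint.h>

    struct vq_state {
        uint16_t avail_state;   /* next available index the device would process */
        uint16_t used_state;    /* next used index the device would write */
    };

    /* Illustrative offsets only; not the normative common-config layout. */
    #define COMMON_CFG_QUEUE_SELECT       0x16
    #define COMMON_CFG_QUEUE_AVAIL_STATE  0x40
    #define COMMON_CFG_QUEUE_USED_STATE   0x42

    extern uint16_t read16(uint32_t off);          /* hypothetical MMIO read  */
    extern void write16(uint32_t off, uint16_t v); /* hypothetical MMIO write */

    /* Serialized save loop: for 128 queues this is 128 selects plus 256 reads,
     * which is the per-queue register cost debated below. */
    void save_all_vq_state(struct vq_state *st, int num_queues)
    {
        for (int i = 0; i < num_queues; i++) {
            write16(COMMON_CFG_QUEUE_SELECT, (uint16_t)i);
            st[i].avail_state = read16(COMMON_CFG_QUEUE_AVAIL_STATE);
            st[i].used_state  = read16(COMMON_CFG_QUEUE_USED_STATE);
        }
    }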
> > > >>>>>>>>>>>>>>>>> This tiny interface for 128 virtio net queues through
> > > >>>>>>>>>>>>>>>>> register reads and writes does not work effectively.
> > > >>>>>>>>>>>>>>>>> There are also in-flight out-of-order descriptors for block.
> > > >>>>>>>>>>>>>>>>> Hence toy registers like this do not work.
> > > >>>>>>>>>>>>>>>> Do you know there is a queue_select? Why does this not work?
> > > >>>>>>>>>>>>>>>> Do you know how other queue-related fields work?
> > > >>>>>>>>>>>>>>> :)
> > > >>>>>>>>>>>>>>> Yes. If you notice, a critical queue_reset-related spec
> > > >>>>>>>>>>>>>>> bug fix was done when it was introduced, so that live
> > > >>>>>>>>>>>>>>> migration can _actually_ work.
> > > >>>>>>>>>>>>>>> When queue_select is done for 128 queues serially, it
> > > >>>>>>>>>>>>>>> takes a lot of time to read those slow register
> > > >>>>>>>>>>>>>>> interfaces for this, plus inflight descriptors, plus more.
> > > >>>>>>>>>>>>>> interesting, virtio has worked in this pattern for many years, right?
> > > >>>>>>>>>>>>> All these years 400Gbps and 800Gbps virtio was not present,
> > > >>>>>>>>>>>>> and the number of queues was not in hw.
> > > >>>>>>>>>>>> The registers are control path in config space; how do 400G
> > > >>>>>>>>>>>> or 800G affect that??
> > > >>>>>>>>>>> Because those are the ones that in practice require a large
> > > >>>>>>>>>>> number of VQs.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> You are asking for per-VQ register commands to modify things
> > > >>>>>>>>>>> dynamically, one vq at a time, serializing all the operations.
> > > >>>>>>>>>>> It does not scale well with a high q count.
> > > >>>>>>>>>> This is not dynamic; it only happens on SUSPEND and RESUME.
> > > >>>>>>>>>> This is the same mechanism by which virtio initializes a
> > > >>>>>>>>>> virtqueue, and it has worked for many years.
> > > >>>>>>>>> No. When the virtio driver initializes it for the first time,
> > > >>>>>>>>> there is no active traffic that gets lost.
> > > >>>>>>>>> This is because the interface is not yet up and not yet part
> > > >>>>>>>>> of the network.
> > > >>>>>>>>> The resume must be fast enough, because the remote node is
> > > >>>>>>>>> sending packets.
> > > >>>>>>>>> Hence it is different from driver-init-time queue enable.
> > > >>>>>>>> I am not sure any packets arrive before a link announce at
> > > >>>>>>>> the destination side.
> > > >>>>>>> I think they can, because there is no notification of the member
> > > >>>>>>> device's link-down to the remote side.
> > > >>>>>>> The L4 and L5 protocols have no knowledge that the node they are
> > > >>>>>>> interacting with is behind some layers of switches.
> > > >>>>>>> So keeping this time low is desired.
> > > >>>>>> The NIC should broadcast itself first, so that other peers in the
> > > >>>>>> network know how to send a message to it (for example, learning
> > > >>>>>> its mac to route to it).
> > > >>>>>> This is necessary; for example, VIRTIO_NET_F_GUEST_ANNOUNCE and
> > > >>>>>> similar mechanisms have worked in in-market products for years.
> > > >>>>>>
> > > >>>>>> This is out of the topic anyway.
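For reference, the announce flow mentioned above looks roughly like the sketch below; the constants are the VIRTIO_NET_S_ANNOUNCE status bit and the VIRTIO_NET_CTRL_ANNOUNCE class / VIRTIO_NET_CTRL_ANNOUNCE_ACK command from virtio-net, while the helper functions are hypothetical stand-ins for real driver plumbing:

    #include <stdint.h>
    #include <stdbool.h>

    #define VIRTIO_NET_S_ANNOUNCE        2  /* device requests an announcement */
    #define VIRTIO_NET_CTRL_ANNOUNCE     3  /* control command class */
    #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0  /* driver has finished announcing */

    extern uint16_t read_net_config_status(void);          /* hypothetical */
    extern void send_gratuitous_arp(void);                 /* hypothetical */
    extern bool send_ctrl_cmd(uint8_t cls, uint8_t cmd);   /* hypothetical */

    /* After resume at the destination, the device sets VIRTIO_NET_S_ANNOUNCE;
     * the driver sends gratuitous packets so switches and peers relearn the
     * MAC, then acks over the control virtqueue. */
    void handle_config_change(void)
    {
        if (read_net_config_status() & VIRTIO_NET_S_ANNOUNCE) {
            send_gratuitous_arp();
            send_ctrl_cmd(VIRTIO_NET_CTRL_ANNOUNCE, VIRTIO_NET_CTRL_ANNOUNCE_ACK);
        }
    }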
> > > >>>>>>>>>>>> See the virtio common cfg; you will find the max number of
> > > >>>>>>>>>>>> vqs there, num_queues.
> > > >>>>>>>>>>> :)
> > > >>>>>>>>>>> Sure. Those values affect things at high q count.
> > > >>>>>>>>>> the driver needs to initialize them anyway.
> > > >>>>>>>>> That is before the traffic starts from the remote end.
> > > >>>>>>>> see above, that needs a link announce, and this happens after
> > > >>>>>>>> re-initialization
> > > >>>>>>>>>>>>> The device didn't support LM.
> > > >>>>>>>>>>>>> Many limitations have existed all these years, and the TC is
> > > >>>>>>>>>>>>> improving and expanding them.
> > > >>>>>>>>>>>>> So all these years do not matter.
> > > >>>>>>>>>>>> Not sure what you are talking about; haven't we initialized
> > > >>>>>>>>>>>> the device and vqs in config space for years?????? What's
> > > >>>>>>>>>>>> wrong with this mechanism?
> > > >>>>>>>>>>>> Are you questioning virtio-pci fundamentals???
> > > >>>>>>>>>>> Don't point to an inefficient past to establish a similarly
> > > >>>>>>>>>>> inefficient future.
> > > >>>>>>>>>> interesting, you know this is a one-time thing, right?
> > > >>>>>>>>>> and you are aware that this has been there for years.
> > > >>>>>>>>>>>>>>>> Like how to set a queue size and enable it?
> > > >>>>>>>>>>>>>>> Those are meant to be used before the DRIVER_OK stage, as
> > > >>>>>>>>>>>>>>> they are init-time registers.
> > > >>>>>>>>>>>>>>> Not to keep abusing them..
> > > >>>>>>>>>>>>>> don't you need to set queue_size at the destination side?
> > > >>>>>>>>>>>>> No.
> > > >>>>>>>>>>>>> But the src/dst does not matter.
> > > >>>>>>>>>>>>> Queue_size is to be set before DRIVER_OK like the rest of
> > > >>>>>>>>>>>>> the registers, as all queues must be created before the
> > > >>>>>>>>>>>>> driver_ok phase.
> > > >>>>>>>>>>>>> Queue_reset was a last-moment exception.
> > > >>>>>>>>>>>> create a queue? Nvidia specific?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>> Huh. No.
> > > >>>>>>>>>>> Do a git log and realize what happened with queue_reset.
> > > >>>>>>>>>> You didn't answer the question: does the spec even define
> > > >>>>>>>>>> "create a vq"?
> > > >>>>>>>>> Enabled/created = tomato/tomato when discussing the spec in a
> > > >>>>>>>>> non-normative email conversation.
> > > >>>>>>>>> It's irrelevant.
> > > >>>>>>>> Then let's not debate "enable a vq" vs. "create a vq" anymore.
> > > >>>>>>>>> All I am saying is, when we know the limitations of the
> > > >>>>>>>>> transport, and when the industry is moving toward not
> > > >>>>>>>>> introducing more and more on-die registers for the
> > > >>>>>>>>> once-in-a-lifetime work of device migration, we should just use
> > > >>>>>>>>> the optimal command and queue interface that is native to virtio.
> > > >>>>>>>> PCI config space has its own limitations, and the admin vq has
> > > >>>>>>>> its advantages, but that does not apply to all use cases.
> > > >>>>>>>>
> > > >>>>>>> There was recent work done emulating the SR-IOV cap and allowing
> > > >>>>>>> a VM to enable SR-IOV in [1].
> > > >>>>>>> This is the option I mentioned a few weeks ago.
> > > >>>>>>>
> > > >>>>>>> So with admin commands and admin virtqueues, even the nested
> > > >>>>>>> model will work using [1].
> > > >>>>>>> [1] https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html
> > > >>>>>> We should take this into consideration once it is standardized in
> > > >>>>>> the spec, maybe not now; there can always be many workarounds to
> > > >>>>>> solve one problem.
> > > >>>>> Sure, until that point the admin commands serve the need well.
> > > >>>>> And when the spec changes in the transport occur (if needed), the
> > > >>>>> current admin command and admin vq also fit very well, following
> > > >>>>> the above [1].
> > > >>>> we have pointed out lots of problems with the admin vq based live
> > > >>>> migration proposal; I won't repeat them here
> > > >>> I don't see any.
> > > >>> Nested is already solved using the above.
> > > >> I don't see how; do you mind working out the patches?
> > > > Once the base series is completed, nested cases can be addressed.
> > > > I won't be able to work on the patches for it until we finish the
> > > > first-level virtualization.
> > > As you know, nested is supported well in current virtio, so please don't
> > > break it.
> >
> > And the same comment repeats. :)
> > Expect the same response...
> > Sorry, no, the virtio specification does not support device migration today.
> > Nothing is broken by adding new features.
> >
> > Above [1] has the right proposal that Jason's paper pointed out. Please use it.
> 
> I was involved in the design of [1], and I don't see a connection to the
> discussion here:
>
> 1) It is based on vDPA in L0.
> 2) It doesn't address the nesting issue; it requires a proper design in the
> virtio spec to support migration in the nested layer.

Nothing prevents [1] from being done without vdpa.
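To make the register-vs-command contrast concrete, a rough sketch of how a batched queue-state query could be expressed as an admin command follows. The opcode and the command-specific layout are purely illustrative and not defined by the virtio spec; only the general admin command framing (opcode, group_member_id, command-specific data and result) comes from the spec.

    #include <stdint.h>

    /* Hypothetical opcode; NOT defined in the virtio specification. */
    #define VIRTIO_ADMIN_CMD_VQ_STATE_QUERY  0x1000

    /* Command-specific data: which virtqueues of the group member to report. */
    struct vq_state_query_req {
        uint16_t first_vq_index;   /* first virtqueue to report */
        uint16_t num_vqs;          /* how many consecutive virtqueues */
    };

    /* Command-specific result: one entry per reported virtqueue. */
    struct vq_state_entry {
        uint16_t avail_state;      /* next available index to process */
        uint16_t used_state;       /* next used index to write */
    };

    /* The owner driver would address a struct virtio_admin_cmd carrying this
     * opcode to the member device via group_member_id; the device would return
     * num_vqs vq_state_entry records in one command, instead of one
     * queue_select plus two register reads per queue. */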

