

Subject: Re: [virtio-comment] Re: [PATCH V2 4/6] virtio-pci: implement VIRTIO_F_QUEUE_STATE


On Wed, Nov 22, 2023 at 09:32:53AM +0800, Zhu, Lingshan wrote:
> 
> 
> On 11/17/2023 6:45 PM, Michael S. Tsirkin wrote:
> > On Fri, Nov 17, 2023 at 06:02:14PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 11/16/2023 6:21 PM, Parav Pandit wrote:
> > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > Sent: Thursday, November 16, 2023 3:45 PM
> > > > > 
> > > > > On 11/16/2023 1:35 AM, Parav Pandit wrote:
> > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > Sent: Monday, November 13, 2023 2:56 PM
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > On 11/10/2023 8:31 PM, Parav Pandit wrote:
> > > > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > > > Sent: Friday, November 10, 2023 1:22 PM
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On 11/9/2023 6:25 PM, Parav Pandit wrote:
> > > > > > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > > > > > Sent: Thursday, November 9, 2023 3:39 PM
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On 11/9/2023 2:28 PM, Parav Pandit wrote:
> > > > > > > > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > > > > > > > Sent: Tuesday, November 7, 2023 3:02 PM
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On 11/6/2023 6:52 PM, Parav Pandit wrote:
> > > > > > > > > > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > > > > > > > > > Sent: Monday, November 6, 2023 2:57 PM
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On 11/6/2023 12:12 PM, Parav Pandit wrote:
> > > > > > > > > > > > > > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > > > > > > > > > > > > > Sent: Monday, November 6, 2023 9:01 AM
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > On 11/3/2023 11:50 PM, Parav Pandit wrote:
> > > > > > > > > > > > > > > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > > > > > > > > > > > > > > <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
> > > > > > > > > > > > > > > > > > > Sent: Friday, November 3, 2023 8:27 PM
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > On 11/3/2023 7:35 PM, Parav Pandit wrote:
> > > > > > > > > > > > > > > > > > > > > From: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > > > > > > > > > > > > > > > > > Sent: Friday, November 3, 2023 4:05 PM
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > This patch adds two new le16 fields to common
> > > > > > > > > > > > > > > > > > > > > configuration structure to support VIRTIO_F_QUEUE_STATE
> > > > > > > > > > > > > > > > > > > > > in PCI transport layer.
> > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > > > > > >  transport-pci.tex | 18 ++++++++++++++++++
> > > > > > > > > > > > > > > > > > > > >  1 file changed, 18 insertions(+)
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > diff --git a/transport-pci.tex b/transport-pci.tex
> > > > > > > > > > > > > > > > > > > > > index a5c6719..3161519 100644
> > > > > > > > > > > > > > > > > > > > > --- a/transport-pci.tex
> > > > > > > > > > > > > > > > > > > > > +++ b/transport-pci.tex
> > > > > > > > > > > > > > > > > > > > > @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> > > > > > > > > > > > > > > > > > > > >          /* About the administration virtqueue. */
> > > > > > > > > > > > > > > > > > > > >          le16 admin_queue_index;         /* read-only for driver */
> > > > > > > > > > > > > > > > > > > > >          le16 admin_queue_num;           /* read-only for driver */
> > > > > > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > > > > > +        /* Virtqueue state */
> > > > > > > > > > > > > > > > > > > > > +        le16 queue_avail_state;         /* read-write */
> > > > > > > > > > > > > > > > > > > > > +        le16 queue_used_state;          /* read-write */
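
In C terms, the proposal tacks two fields onto the tail of the common
configuration structure, read or written per-queue via queue_select.
A minimal sketch (the surrounding fields are abridged from the spec's
struct virtio_pci_common_cfg; only the two *_state fields are new here):

    struct virtio_pci_common_cfg {
            /* ... feature, status and other device-wide fields ... */
            le16 queue_select;              /* read-write */
            /* ... other per-queue fields, selected by queue_select ... */

            /* About the administration virtqueue. */
            le16 admin_queue_index;         /* read-only for driver */
            le16 admin_queue_num;           /* read-only for driver */

            /* Proposed: virtqueue state of the selected queue */
            le16 queue_avail_state;         /* read-write */
            le16 queue_used_state;          /* read-write */
    };
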
> > > > > > > > > > > > > > > > > > > > This tiny interface of register reads/writes for 128 virtio net
> > > > > > > > > > > > > > > > > > > > queues does not work effectively.
> > > > > > > > > > > > > > > > > > > > There are in-flight out-of-order descriptors for block as well.
> > > > > > > > > > > > > > > > > > > > Hence toy registers like this do not work.
> > > > > > > > > > > > > > > > > > > Do you know there is a queue_select? Why does this not work?
> > > > > > > > > > > > > > > > > > > Do you know how the other queue-related fields work?
> > > > > > > > > > > > > > > > > > :)
> > > > > > > > > > > > > > > > > > Yes. If you notice, a critical queue_reset-related spec bug fix was
> > > > > > > > > > > > > > > > > > done when it was introduced so that live migration can _actually_ work.
> > > > > > > > > > > > > > > > > > When queue_select is done for 128 queues serially, it takes a lot
> > > > > > > > > > > > > > > > > > of time to read this slow register interface for all of them +
> > > > > > > > > > > > > > > > > > inflight descriptors + more.
> > > > > > > > > > > > > > > > > interesting, virtio has worked in this pattern for many years, right?
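
To make the cost being argued about concrete: because the proposed
fields sit behind queue_select, saving N queues means N serialized
select-then-read MMIO round trips. A hypothetical driver-side sketch
(vp_iowrite16/vp_ioread16 stand in for the driver's MMIO accessors):

    /* Hypothetical save loop over the proposed registers. Every access
     * is an uncached MMIO round trip to the device, and the queues are
     * walked strictly one at a time. */
    struct { u16 avail, used; } state[128];

    for (u16 q = 0; q < num_queues; q++) {
            vp_iowrite16(q, &cfg->queue_select);
            state[q].avail = vp_ioread16(&cfg->queue_avail_state);
            state[q].used  = vp_ioread16(&cfg->queue_used_state);
    }
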
> > > > > > > > > > > > > > > > All these years, 400Gbps and 800Gbps virtio was not present, and
> > > > > > > > > > > > > > > > this number of queues was not in hw.
> > > > > > > > > > > > > > > The registers are a control path in config space; how do 400G or
> > > > > > > > > > > > > > > 800G affect them??
> > > > > > > > > > > > > > Because those are the ones that in practice require a large number of VQs.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > You are asking for per-VQ register commands to modify things
> > > > > > > > > > > > > > dynamically, one vq at a time, serializing all the operations.
> > > > > > > > > > > > > > It does not scale well with a high q count.
> > > > > > > > > > > > > This is not dynamic; it only happens on SUSPEND and RESUME.
> > > > > > > > > > > > > This is the same mechanism by which virtio initializes a virtqueue,
> > > > > > > > > > > > > and it has been working for many years.
> > > > > > > > > > > > No. When the virtio driver initializes it for the first time, there
> > > > > > > > > > > > is no active traffic that gets lost.
> > > > > > > > > > > > This is because the interface is not yet up and not yet part of the
> > > > > > > > > > > > network.
> > > > > > > > > > > > The resume must be fast enough, because the remote node is sending
> > > > > > > > > > > > packets.
> > > > > > > > > > > > Hence it is different from driver-init-time queue enable.
> > > > > > > > > > > I am not sure any packets arrive before a link announce at the
> > > > > > > > > > > destination side.
> > > > > > > > > > I think they can.
> > > > > > > > > > Because there is no notification of the member device's link-down to
> > > > > > > > > > the remote side.
> > > > > > > > > > The L4 and L5 protocols have no knowledge that the node they are
> > > > > > > > > > interacting with is behind some layers of switches.
> > > > > > > > > > So keeping this time low is desired.
> > > > > > > > > The NIC should broadcast itself first, so that other peers in the
> > > > > > > > > network know how to send a message to it (for example, its mac, so
> > > > > > > > > they can route to it).
> > > > > > > > > This is necessary; for example, VIRTIO_NET_F_GUEST_ANNOUNCE and
> > > > > > > > > similar mechanisms have worked in in-market products for years.
> > > > > > > > > 
> > > > > > > > > This is off topic anyway.
> > > > > > > > > > > > > > > See the virtio common cfg; you will find the max number of vqs
> > > > > > > > > > > > > > > there: num_queues.
> > > > > > > > > > > > > > :)
> > > > > > > > > > > > > > Sure. Those values at a high q count have an effect.
> > > > > > > > > > > > > the driver needs to initialize them anyway.
> > > > > > > > > > > > That is before the traffic starts from the remote end.
> > > > > > > > > > > see above, that needs a link announce, and this is after
> > > > > > > > > > > re-initialization
> > > > > > > > > > > > > > > > The device didn't support LM.
> > > > > > > > > > > > > > > > Many limitations existed all these years, and the TC is
> > > > > > > > > > > > > > > > improving and expanding them.
> > > > > > > > > > > > > > > > So all these years do not matter.
> > > > > > > > > > > > > > > Not sure what you are talking about; haven't we initialized the
> > > > > > > > > > > > > > > device and vqs in config space for years? What's wrong with this
> > > > > > > > > > > > > > > mechanism?
> > > > > > > > > > > > > > > Are you questioning virtio-pci fundamentals?
> > > > > > > > > > > > > > Don't point to an inefficient past to establish a similarly inefficient future.
> > > > > > > > > > > > > interesting, you know this is a one-time thing, right?
> > > > > > > > > > > > > and you are aware this has been there for years.
> > > > > > > > > > > > > > > > > > > Like how to set a queue size and enable it?
> > > > > > > > > > > > > > > > > > Those are meant to be used before the DRIVER_OK stage, as
> > > > > > > > > > > > > > > > > > they are init-time registers.
> > > > > > > > > > > > > > > > > > Not to keep abusing them..
> > > > > > > > > > > > > > > > > don't you need to set queue_size at the destination side?
> > > > > > > > > > > > > > > > No.
> > > > > > > > > > > > > > > > But the src/dst does not matter.
> > > > > > > > > > > > > > > > Queue_size is to be set before DRIVER_OK like the rest of the
> > > > > > > > > > > > > > > > registers, as all queues must be created before the driver_ok
> > > > > > > > > > > > > > > > phase.
> > > > > > > > > > > > > > > > Queue_reset was a last-moment exception.
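
For context, the init-time pattern referred to here is the spec's
driver initialization sequence: per-queue registers are programmed
behind queue_select, and only afterwards is DRIVER_OK set. Roughly
(a compressed sketch, same hypothetical accessors as above):

    /* For each queue, before DRIVER_OK: */
    vp_iowrite16(q, &cfg->queue_select);
    vp_iowrite16(size, &cfg->queue_size);
    /* ... program the queue_desc/queue_driver/queue_device addresses ... */
    vp_iowrite16(1, &cfg->queue_enable);

    /* Then, once all queues are set up: */
    vp_iowrite8(status | VIRTIO_CONFIG_S_DRIVER_OK, &cfg->device_status);
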
> > > > > > > > > > > > > > > create a queue? Nvidia specific?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Huh. No.
> > > > > > > > > > > > > > Do git log and realize what happened with queue_reset.
> > > > > > > > > > > > > You didn't answer the question: does the spec even define "create
> > > > > > > > > > > > > a vq"?
> > > > > > > > > > > > Enabled/created = tomato/tomato when discussing the spec in a
> > > > > > > > > > > > non-normative email conversation.
> > > > > > > > > > > > It's irrelevant.
> > > > > > > > > > > Then let's not debate enable-a-vq vs. create-a-vq anymore
> > > > > > > > > > > > All I am saying is, when we know the limitations of the transport,
> > > > > > > > > > > > and when the industry is moving toward not introducing more and
> > > > > > > > > > > > more on-die registers for the once-in-a-lifetime work of device
> > > > > > > > > > > > migration, we should just use the optimal command and queue
> > > > > > > > > > > > interface that is native to virtio.
> > > > > > > > > > > PCI config space has its own limitations, and the admin vq has its
> > > > > > > > > > > advantages, but those do not apply to all use cases.
> > > > > > > > > > > 
> > > > > > > > > > There was recent work done emulating the SR-IOV cap and allowing a
> > > > > > > > > > VM to enable SR-IOV in [1].
> > > > > > > > > > This is the option I mentioned a few weeks ago.
> > > > > > > > > > 
> > > > > > > > > > So with admin commands and admin virtqueues, even the nested model
> > > > > > > > > > will work using [1].
> > > > > > > > > > [1]
> > > > > > > > > > https://netdevconf.info/0x17/sessions/talk/unleashing-sr-iov-offload-on-virtual-machines.html
> > > > > > > > > We should take this into consideration once it is standardized in
> > > > > > > > > the spec, maybe not now; there can always be many workarounds to
> > > > > > > > > solve one problem.
> > > > > > > > Sure, and until that point the admin commands serve the need well.
> > > > > > > > And when the spec changes in the transport occur (if needed), the
> > > > > > > > current admin command and admin vq also fit very well, following [1]
> > > > > > > > above.
> > > > > > > we have pointed out lots of problems with the admin vq based live
> > > > > > > migration proposal; I won't repeat them here
> > > > > > I don't see any.
> > > > > > Nested is already solved using the above.
> > > > > I don't see how; do you mind working out the patches?
> > > > Once the base series is completed, nested cases can be addressed.
> > > > I won't be able to work on the patches for it until we finish the
> > > > first-level virtualization.
> > > As you know, nested is supported well in current virtio, so please don't
> > > break it.
> > So for nesting, it seems cleaner to support sending commands through the
> > device itself.
> I guess this requires a per-VF admin vq or some agents & tricks.

I suggested a gateway in the VF for this. Really more or less like what
you did for write tracking, except using the admin command format.
We'll need a new group type which just includes the device itself.
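
Concretely, this would reuse the existing admin command layout from the
spec unchanged and only add a group type. A sketch; the SELF group type
name and semantics are hypothetical, nothing like it is in the spec yet:

    struct virtio_admin_cmd {
            /* Device-readable part */
            le16 opcode;
            le16 group_type;        /* hypothetical VIRTIO_ADMIN_GROUP_TYPE_SELF:
                                     * a group consisting of only the device
                                     * itself, so a VF can accept commands
                                     * directed at itself */
            u8 reserved1[12];
            le64 group_member_id;
            /* command-specific data, then the device-writable status */
    };
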

> > You aren't going to fit VQ state in a 16-bit register in the general
> > case though, and will have to resort to DMA.
> Yes, at least we need in-flight descriptor tracking.
> Still working with Eugenio on this feature.
> > And if you are doing that, then please just use the admin command
> > format (it does not have to be a VQ) and then we can all make peace
> > finally.
> > 
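
Tying those two points together: once VQ state no longer fits in a
16-bit register, a save operation naturally becomes a command whose
result the device DMAs into a driver-provided buffer. A purely
hypothetical command-specific layout, just to illustrate the shape:

    /* Hypothetical; no such command exists in the spec today. */
    struct virtio_admin_cmd_vq_state_result {
            le16 vq_index;
            le16 avail_state;       /* next avail index */
            le16 used_state;        /* next used index */
            le16 num_inflight;
            le16 inflight_id[];     /* heads of in-flight descriptors */
    };
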


