virtio-comment message



Subject: Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration


On Wed, Nov 22, 2023 at 2:05 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, November 22, 2023 10:57 AM
> >
> > On Wed, Nov 22, 2023 at 12:32 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, November 21, 2023 12:54 PM
> > > >
> > > > On Thu, Nov 16, 2023 at 1:28 PM Parav Pandit <parav@nvidia.com>
> > wrote:
> > > > >
> > > > >
> > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > Sent: Thursday, November 16, 2023 9:50 AM
> > > > > >
> > > > > > On Thu, Nov 16, 2023 at 1:39 AM Parav Pandit <parav@nvidia.com>
> > > > wrote:
> > > > > > >
> > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > Sent: Monday, November 13, 2023 9:03 AM
> > > > > > > >
> > > > > > > > On Thu, Nov 9, 2023 at 2:25 PM Parav Pandit
> > > > > > > > <parav@nvidia.com>
> > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > Sent: Tuesday, November 7, 2023 9:35 AM
> > > > > > > > > >
> > > > > > > > > > On Mon, Nov 6, 2023 at 3:05 PM Parav Pandit
> > > > > > > > > > <parav@nvidia.com>
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > Sent: Monday, November 6, 2023 12:05 PM
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Nov 2, 2023 at 2:10 PM Parav Pandit
> > > > > > > > > > > > <parav@nvidia.com>
> > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> >
> > [...]
> >
> > > > > >
> > > > > I disagree.
> > > > > There are two different functionalities.
> > > > >
> > > > > Functionality_1: explicit ask for read or write
> > > > > Functionality_2: read what has changed
> > > >
> > > > This needs to be justified. I won't repeat the questions again here.
> > > >
> > > The use case is already explained in the theory of operation.
> > >
> > > > >
> > > > > Should one merge 1 and 2 and complicate the command?
> > > > > I prefer not to.
> > > >
> > > > Again there're functional duplications. E.g. your command duplicates
> > > > common_cfg for sure.
> > > Nope, it is not.
> > > Common cfg is accessed directly by the guest member driver.
> >
> > It can be accessed directly, if we have adminq per VF.
> Sure, instead I proposed the cvq as there is no need to burn another queue.

Please, explain how CVQ is related here.

>
> >
> > >
> > > >
> > > > >
> > > > > Now having two different commands helps with debugging, to
> > > > > differentiate between mgmt. commands and guest-initiated commands.
> > > > > :)
> > > > >
> > > > > > >
> > > > > > > > Guest configures the following one by one:
> > > > > > > >
> > > > > > > > 1) vq size
> > > > > > > > 2) vq addresses
> > > > > > > > 3) MSI-X
> > > > > > > >
> > > > > > > > etc?
> > > > > > > >
> > > > > > > I think you interpreted "incremental" differently than I described.
> > > > > > > In the device context read, the incremental behavior is:
> > > > > > >
> > > > > > > If the hypervisor driver has read the device context twice,
> > > > > > > the second read
> > > > > > won't return any new data if nothing changed.
> > > > > >
> > > > > > See above.
> > > > > >
> > > > > Yeah, two separate commands needed.
> > > > >
> > > > > > > For example, if RSS configuration didn't change between two
> > > > > > > reads, the
> > > > > > second read won't return the TLV for RSS Context.
> > > > > > >
> > > > > > > While for a transport the need is: when the guest asks, the device
> > > > > > > must return it
> > > > > > regardless of the change.
> > > > > > >
> > > > > > > So the notion of incremental is not by address, but by the value.
> > > > > > >
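
For illustration only (the struct layout and the devctx_read()/send_to_destination() helpers below are hypothetical, not taken from the spec or this patch series), the "incremental by value" read described above could look roughly like this on the hypervisor side:

#include <stddef.h>
#include <stdint.h>

/* hypothetical TLV header of a device context field */
struct devctx_tlv {
        uint16_t type;      /* e.g. vq state, RSS context, ... */
        uint16_t reserved;
        uint32_t length;    /* length of value[] in bytes */
        uint8_t  value[];
};

/* hypothetical wrappers, not real APIs */
size_t devctx_read(void *buf, size_t buf_len);
void send_to_destination(const void *data, size_t len);

void migrate_iteration(uint8_t *buf, size_t buf_len)
{
        size_t len = devctx_read(buf, buf_len);
        size_t off = 0;

        /* Walk only the TLVs the device chose to return this time.
         * If nothing changed since the previous read, len is 0 and
         * nothing is sent to the destination; an unchanged RSS
         * context simply never shows up in the byte stream.
         */
        while (off + sizeof(struct devctx_tlv) <= len) {
                struct devctx_tlv *tlv = (struct devctx_tlv *)(buf + off);

                /* forward the TLV to the destination as an opaque blob */
                send_to_destination(tlv, sizeof(*tlv) + tlv->length);
                off += sizeof(*tlv) + tlv->length;
        }
}
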
> > > > > > > > > For example, VQ configuration is exchanged once between
> > > > > > > > > src and
> > > > dst.
> > > > > > > > > But VQ avail and used index may be updated multiple times.
> > > > > > > >
> > > > > > > > If it can work with multiple updates, why can't it
> > > > > > > > work if we just update it once?
> > > > > > > Functionally it can work.
> > > > > >
> > > > > > I think you answer yourself.
> > > > > >
> > > > > Yes, I don't like abuse of the command.
> > > >
> > > > How do you define abuse, or does the spec ever need to define that?
> > > I don't have any different definition than the dictionary definition of
> > > abuse. :)
> >
> > "Abuse" is pretty subjective. Spec have driver normative, please explain how
> > and if you can use that.
> >
> :)
>
> So be it.
>
> Spec does not need to define the dictionary.
>
> For explicit read and write, simple read/write commands are right to me.
>
> > >
> > > >
> > > > >
> > > > > > > Performance-wise, one does not want to update multiple times,
> > > > > > > unless there
> > > > > > is a change.
> > > > > > >
> > > > > > > Read, as explained above, is not meant to return the same content again.
> > > > > > >
> > > > > > > >
> > > > > > > > > So here the hypervisor does not want to read any specific set of
> > > > > > > > > fields, and the
> > > > > > > > hypervisor is not parsing them either.
> > > > > > > > > It is just a byte stream for it.
> > > > > > > >
> > > > > > > > Firstly, the spec must define the device context format, so the
> > > > > > > > hypervisor can understand which byte is what; otherwise you
> > > > > > > > can't maintain migration compatibility.
> > > > > > > Device context is defined already in the latest version.
> > > > > > >
> > > > > > > > Secondly, you can't mandate how the hypervisor is written.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > As opposed to that, in the case of a transport, the guest
> > > > > > > > > explicitly asks to read or
> > > > > > > > write specific bytes.
> > > > > > > > > Therefore, it is not incremental.
> > > > > > > >
> > > > > > > > I'm totally lost. Which part of the transport is not incremental?
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Additionally, if the hypervisor has put the trap on the virtio
> > > > > > > > > config, and because the member device already has the
> > > > > > > > > interface for virtio config,
> > > > > > > > >
> > > > > > > > > the hypervisor can directly write/read from the virtual config
> > > > > > > > > to the member's
> > > > > > > > config space, without going through the device context, right?
> > > > > > > >
> > > > > > > > It can do it, or it can choose not to. I don't see how it
> > > > > > > > is related to the discussion here.
> > > > > > > >
> > > > > > > It is. I don't see the point of the hypervisor not using the native
> > > > > > > interface provided
> > > > > > by the member device.
> > > > > >
> > > > > > It really depends on the case, and I see how it duplicates
> > > > > > the functionality that is provided by both:
> > > > > >
> > > > > > 1) The existing PCI transport
> > > > > >
> > > > > > or
> > > > > >
> > > > > > 2) The transport virtqueue
> > > > > >
> > > > > I would like to conclude that we disagree in our approaches.
> > > > > The PCI transport is for the member device to directly communicate from
> > > > > the guest
> > > > driver to the device.
> > > > > This is uniform across PF, VFs, SIOV.
> > > >
> > > > For "PCi transport" did you mean the one defined in spec? If yes,
> > > > how can it work with SIOV with what you're saying here (a direct
> > > > communication channel)?
> > > >
> > > A SIOV device may have the same MMIO as a VF.
> >
> > We circle back: SIOV is for scalability. If you claim registers don't scale, SIOV via
> > MMIO doesn't scale either.
> Hence, any new and slow things should stay off the MMIO.
> And for currently defined things, Lingshan will show the performance numbers for why they should be transported via a virtqueue.
>
> >
> > >
> > > > >
> > > > > Admin commands are transport independent and their task is device
> > > > migration.
> > > > > One is not replacing the other.
> > > > >
> > > > > Transport virtqueue will never transport driver notifications,
> > > > > hence it does
> > > > not qualify as a "transport".
> > > >
> > > > Another double standard.
> > > I disagree. You coined the term transport vq, so stand behind it to transport
> > everything.
> >
> > Nope, it works like any other transport. It doesn't aim to replace any existing
> > transport.
> >
> > >
> > > >
> > > > MMIO will never transport device notification, hence it does not
> > > > qualify as "transport"?
> > > >
> > > How do interrupts work?
> >
> > It depends on the platform, no?
> I don't know; how does an MMIO virtio hw device deliver an interrupt for an x86 CPU?

It uses platform- or architecture-specific mechanisms.

>
> >
> > > Seems like basic functionality is missing in the transport.
> >
> > Not necessarily the charge of a transport. Virtio transport can't fly without a
> > platform.
> What is the equivalent of MSI-X in MMIO?

MSI is not unique to PCI; it can be used for MMIO as well if the
platform or architecture supports that.
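
As a rough illustration (the register layout and names below are hypothetical, not part of any virtio transport), a message-signalled interrupt is just a memory write of a platform-provided address/data pair, which is why a non-PCI MMIO device can use it when the platform interrupt controller (for example an ARM GICv3 ITS) hands out such messages:

#include <stdint.h>

/* (address, data) pair allocated by the platform interrupt layer */
struct plat_msi_msg {
        uint64_t addr;  /* doorbell address of the MSI controller */
        uint32_t data;  /* payload identifying this vector */
};

/* hypothetical per-vector MSI registers of an MMIO device */
struct mmio_msi_regs {
        volatile uint64_t msg_addr;
        volatile uint32_t msg_data;
        volatile uint32_t enable;
};

void mmio_msi_setup(struct mmio_msi_regs *regs,
                    const struct plat_msi_msg *msg)
{
        /* Once programmed, the device raises an interrupt by writing
         * msg->data to msg->addr, exactly as a PCI function would.
         */
        regs->msg_addr = msg->addr;
        regs->msg_data = msg->data;
        regs->enable = 1;
}
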

> >
> > >
> > > > >
> > > > > For the vdpa case, there is no need for extra admin commands as
> > > > > the
> > > > mediation layer can directly use the interface available from the
> > > > member device itself.
> > > > >
> > > > > You continue to want to overload admin commands for a dual purpose,
> > > > > which does
> > > > not make sense to me.
> > > > >
> > > > > > >
> > > > > > >  > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > it is not a good idea to overload management
> > > > > > > > > > > > > commands with actual run-time
> > > > > > > > > > > > guest commands.
> > > > > > > > > > > > > The device context reads/writes are largely for
> > > > > > > > > > > > > incremental
> > > > updates.
> > > > > > > > > > > >
> > > > > > > > > > > > It doesn't matter if it is incremental or not but
> > > > > > > > > > > >
> > > > > > > > > > > It does, because you want different functionality only
> > > > > > > > > > > for the purpose of backward
> > > > > > > > > > compatibility.
> > > > > > > > > > > That too, only if the device does not offer them as a portion
> > > > > > > > > > > of the MMIO
> > > > BAR.
> > > > > > > > > >
> > > > > > > > > > I don't see how it is related to the "incremental part".
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 1) the function is there
> > > > > > > > > > > > 2) hypervisor can use that function if they want and
> > > > > > > > > > > > virtio
> > > > > > > > > > > > (spec) can't forbid that
> > > > > > > > > > > >
> > > > > > > > > > > It is not about forbidding or supporting.
> > > > > > > > > > > It's about what functionality to use for the management
> > > > > > > > > > > plane and the guest
> > > > > > > > plane.
> > > > > > > > > > > Both have different needs.
> > > > > > > > > >
> > > > > > > > > > People can have different views; there's nothing we can do to
> > > > > > > > > > prevent a hypervisor from using it as a transport, as far as I can
> > see.
> > > > > > > > > The device context write command can be used (or
> > > > > > > > > probably
> > > > > > > > > abused) to do a
> > > > > > > > write, but I fail to see why one would use it.
> > > > > > > >
> > > > > > > > The function is there, you can't prevent people from doing that.
> > > > > > > >
> > > > > > > One can always mess itself up. :) It is not prevented. It is
> > > > > > > just not the right way to use the interface.
> > > > > > >
> > > > > > > > > Because the member device already has the interface to do
> > > > > > > > > config read/write, and
> > > > > > > > it is accessible to the hypervisor.
> > > > > > > >
> > > > > > > > Well, it looks self-contradictory again. Are you saying
> > > > > > > > another set of commands that is similar to device context is
> > > > > > > > needed for non-PCI
> > > > > > transport?
> > > > > > > >
> > > > > > > All this non-PCI transport discussion is just meaningless.
> > > > > > > Let MMIO bring the concept of a member device; at that point
> > > > > > > something makes
> > > > > > sense to discuss.
> > > > > >
> > > > > > It's not necessarily MMIO. For example, SIOV, which I don't
> > > > > > think can use the existing PCI transport.
> > > > > >
> > > > > > > PCI SIOV is also a PCI device in the end.
> > > > > >
> > > > > > We don't want to end up with two sets of commands to save/load
> > > > > > SRIOV and SIOV at least.
> > > > > >
> > > > > This proposal ensures that SRIOV and SIOV devices are treated equally.
> > > >
> > > > How? Did you mean your proposal can work for SIOV? What's the
> > > > transport then?
> > > Yes. The majority of the device contexts should work for a SIOV device as-is.
> > > The member id would be different.
> > > Some device context TLVs may be new, as SIOV may have some
> > simplifications since it may not have the giant register space like the current one.
> >
> > You only explain the migration part but not the transport part.
> >
> Because SIOV is still under construction in the community.

Transport virtqueue is not just designed for SIOV.

> There is no point in defining a SIOV transport for some half-cooked spec.

Then let's not claim your proposal suits TDISP. The live migration
of TDISP is far from even half-cooked.

Thanks


