virtio-comment message



Subject: RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, October 19, 2023 2:01 PM
> 
> On Thu, Oct 19, 2023 at 07:30:09AM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Thursday, October 19, 2023 12:05 PM
> > >
> > > On Thu, Oct 19, 2023 at 05:31:37AM +0000, Parav Pandit wrote:
> > > > > How could we make any agreement without an accurate
> > > > > definition of "passthrough", which is key to understanding each other?
> > > >
> > > > I replied a few times in past emails, but since those email threads
> > > > are so long, it is easy to miss.
> > > >
> > > > Passthrough definition:
> > > > a. The virtio member device is mapped to the guest VM.
> > > > b. Only the PCI config space and MSI-X of the member device are
> > > >    intercepted by the hypervisor.
> > > > c. The virtio config space, virtio CVQs, and data VQs of the member
> > > >    device are accessed directly by the guest VM without being
> > > >    intercepted by the hypervisor.
> > > >
> > > > (Why b? No grand reason; it is how the hypervisors into which we
> > > > integrate the virtio member device work.)
> > >
> > > I think it's a reasonable use-case, though of course not at all the
> > > only way to design a system.
> > Sure, there are more ways to bisect the device, especially when the
> > underlying device is not a virtio device.
> > But one can continue bisecting virtio as well, as you listed below.
> > > Some more ways:
> > > 2- intercept everything except data vqs and cvqs
> > > 	I think this is a reasonable way to build the system and has a bunch
> > > 	of advantages short term. The main disadvantage as compared to
> > > 	passthrough is the need to keep config space coherent with
> > > 	device operation - the way to do it is device specific and
> > > 	might get fragile.
> > >
> > Yes, I agree it has short-term advantages.
> > It is not future-proof, as you listed.
> >
> > > 3- intercept everything except data vqs
> > > 	Here we get another problem: isolating some vqs but not
> > > 	others. The problem becomes bigger because you also
> > > 	need to communicate the control vq to the device.
> > >
> > Yes. For non-virtio devices, vendors have an easy way to support this.
> > We supported this for mlx5 devices.
> >
> > > Also, with both of the above options, we have the question of how
> > > we communicate with the device to keep the control path and data path
> > > in sync when the device's DMA is mapped to the guest.
> > > Using PASIDs for isolation might work, but again, support is far from
> > > universal, so we can't really assume it as the only way in the spec.
> > >
> > Right.
> >
> > > Absent PASID, the popular way seems to be a shadow vq, which basically
> > > does
> > >
> > > 4- software intercept for everything
> > >        Clearly that's a lot of CPU overhead; I do not think we can focus
> > >        on that as the only way in the spec, though some hypervisors might
> > >        already have so much migration overhead that virtio can afford
> > >        any amount of overhead and it won't be measurable.
> > >
> > >
> > > I also note that some or all of the intercepts can always come and go.
> > > For example, a common setup is that if the target VCPUs are running,
> > > the IOMMU injects interrupts directly into the guest; if not, you
> > > generally trap to the hypervisor. Similarly, a shadow vq might be
> > > active only temporarily.
> > >
> > > Which approach is best? I feel ideally virtio would find ways to
> > > support them all rather than deciding on a policy in the spec.
> >
> > Cooking up all the modes seems frankly very daunting to me, especially
> > when there is no existing software stack to consume all modes and no
> > device vendor to sign off on _all_ variations.
> 
> Not addressing all the modes. We are building components, not stacks.
> Components need to be reusable, not stack-specific.
> 
> Was the whole admin command interface with its levels of indirection a design
> mistake then? It was designed exactly to support all kinds of models.
The admin VQ serving multiple use cases, including device migration, demonstrates that it is a good fit.
SR-IOV and SIOV will be able to use it for device migration, provisioning, legacy access, and more.
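To make the owner-device path concrete, here is a minimal C sketch of how a hypervisor could build the device-readable header of an admin command that the owner device (e.g. the PF) receives on its admin VQ and applies to one member device (e.g. a VF). The field order and sizes are modeled on struct virtio_admin_cmd from virtio 1.3 as I recall it (le16 opcode, le16 group_type, 12 reserved bytes, le64 group_member_id) and should be checked against the spec; the helper names put_le16, put_le64 and build_admin_hdr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* opcode(2) + group_type(2) + reserved(12) + group_member_id(8);
 * layout assumed from virtio 1.3, verify against the spec. */
#define ADMIN_HDR_LEN 24u
#define ADMIN_GROUP_TYPE_SRIOV 1u /* group type 1 = SR-IOV */

/* Store v little-endian; virtio admin header fields are le16/le64. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: fill buf (ADMIN_HDR_LEN bytes) with a header that
 * targets member `member_id` (for SR-IOV, it identifies the VF) of the
 * owner's group. Command-specific data would follow these 24 bytes. */
static void build_admin_hdr(uint8_t *buf, uint16_t opcode, uint64_t member_id)
{
    assert(buf != NULL);
    memset(buf, 0, ADMIN_HDR_LEN);                        /* reserved bytes 4..15 stay zero */
    put_le16(buf + 0, opcode);                            /* bytes 0..1 */
    put_le16(buf + 2, (uint16_t)ADMIN_GROUP_TYPE_SRIOV);  /* bytes 2..3 */
    put_le64(buf + 16, member_id);                        /* bytes 16..23 */
}
```

Only the opcode and the command-specific data differ between migration, provisioning, and legacy access commands; the routing to a member device stays the same, which is what lets one AQ serve all of these use cases.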

> >
> > To me, two stacks are practical and common to target at the beginning,
> > i.e.
> > 1. passthrough mode
> > 2. #2 above.
> > I had real technical difficulty making #2 work in practice while building
> > a scalable device and keeping an API converged with #1.
> > The option we explored, placing the admin command in some VF register
> > specific to #2, is partially fine but targets use case #2 only.
> 
> Right. So - a way to send admin commands to a VF directly, perhaps in config
> space? Do we need more than PA+PASID+some flags?
> Want to try to write something like this up?
> 
It certainly cannot be in the 4K PCI config space.
It must reside in the virtio config space.

I am sure that this is used for passthrough mode of #1.
So, can you please confirm that this should be written up for mode #2 only?
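As a rough illustration of how small "PA+PASID+some flags" could be, here is a C sketch of encoding a VF-local admin command submission into a register image in the virtio config space. Everything in it is hypothetical (struct vf_admin_submit, the VF_ADMIN_F_* flags, the 16-byte little-endian layout); only the 20-bit PASID width comes from PCIe. It is a sketch of the idea, not a proposed spec layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL register block; no names or offsets here come from the spec. */
#define VF_ADMIN_REG_LEN       16u
#define VF_ADMIN_F_PASID_VALID (1u << 0)      /* pasid field is meaningful */
#define VF_ADMIN_F_SUBMIT      (1u << 1)      /* doorbell: command is ready */
#define PASID_MAX              ((1u << 20) - 1) /* PCIe PASIDs are 20 bits */

struct vf_admin_submit {
    uint64_t cmd_pa; /* address of the admin command buffer */
    uint32_t pasid;  /* address space to use when PASID_VALID is set */
    uint16_t flags;  /* VF_ADMIN_F_* bits */
};

/* Encode s into a little-endian register image of VF_ADMIN_REG_LEN bytes:
 * bytes 0..7 cmd_pa, 8..11 pasid, 12..13 flags, 14..15 reserved (zero).
 * Returns false if the PASID does not fit in 20 bits. */
static bool vf_admin_encode(const struct vf_admin_submit *s, uint8_t *reg)
{
    assert(s != NULL && reg != NULL);
    bool pasid_valid = (s->flags & VF_ADMIN_F_PASID_VALID) != 0;
    if (pasid_valid && s->pasid > PASID_MAX)
        return false;
    /* Zero the PASID when it is not valid so a stale value is never seen. */
    uint32_t pasid = pasid_valid ? s->pasid : 0;

    memset(reg, 0, VF_ADMIN_REG_LEN);
    for (int i = 0; i < 8; i++)
        reg[i] = (uint8_t)(s->cmd_pa >> (8 * i));
    for (int i = 0; i < 4; i++)
        reg[8 + i] = (uint8_t)(pasid >> (8 * i));
    reg[12] = (uint8_t)(s->flags & 0xffu);
    reg[13] = (uint8_t)(s->flags >> 8);
    return true;
}
```

Without PASID support the device would interpret cmd_pa in the VF's own DMA address space, which is exactly the isolation question raised earlier in the thread for modes where the device's DMA is mapped to the guest.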

> > A variation of that: for the member device there is an owner device, hence
> > admin commands on the AQ can be used.
> >
> > If we can converge on a common virtio interface between #1 and #2, great.
> > If we cannot due to technical issues, we shouldn't step on each other's
> > toes; instead, we should build two interfaces for the two different use
> > cases, each overcoming its own technical challenges.
> >
> > And when, in the future, someone wants to implement a different kind of
> > bisection, they can propose extensions.
> 
> Not good at all; this means the interface is very narrow.
> Your "propose an extension" just doesn't work in practice.
> It takes years for things to be widely deployed in the field; by the time they
> are, there are more use cases.

For many features, we usually see deployment in less than a year at the current pace of spec advancement.
Building something for an unreasonable amount of time without a use case means missing the immediate deployments that happen from 2024 to 2027, the 1.4 spec time frame.

> We need something universal, and admin commands were supposed to be just
> this.
I don't see a universal solution to all the problems of #1 and #2 above.

Solving #2 above will cover a large part of the deployments that users are doing.
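For reference, a small C sketch of which member-device resources the hypervisor traps in each mode discussed in this thread, following the definitions quoted above (passthrough as #1, "intercept everything except data vqs and cvqs" as #2, then "everything except data vqs" and full software interception such as a shadow vq). The enum and function names are illustrative only. As noted earlier in the thread, real deployments may turn intercepts on and off dynamically, which this static table does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Resources of a virtio member device that a hypervisor may or may not trap. */
enum member_res {
    RES_PCI_CONFIG,    /* PCI config space */
    RES_MSIX,          /* MSI-X */
    RES_VIRTIO_CONFIG, /* virtio config space */
    RES_CVQ,           /* control virtqueue(s) */
    RES_DATA_VQ,       /* data virtqueues */
};

/* Modes from this thread; names are illustrative, not spec terms. */
enum mig_mode {
    MODE_PASSTHROUGH,           /* #1: trap only PCI config space and MSI-X */
    MODE_TRAP_ALL_BUT_VQS,      /* #2: CVQs and data VQs go to the device */
    MODE_TRAP_ALL_BUT_DATA_VQS, /* only data VQs go to the device */
    MODE_SW_TRAP_ALL,           /* e.g. shadow vq: everything in software */
};

/* Returns true if guest accesses to `res` are intercepted in `mode`. */
static bool hypervisor_traps(enum mig_mode mode, enum member_res res)
{
    switch (mode) {
    case MODE_PASSTHROUGH:
        return res == RES_PCI_CONFIG || res == RES_MSIX;
    case MODE_TRAP_ALL_BUT_VQS:
        return res != RES_CVQ && res != RES_DATA_VQ;
    case MODE_TRAP_ALL_BUT_DATA_VQS:
        return res != RES_DATA_VQ;
    case MODE_SW_TRAP_ALL:
        return true;
    }
    assert(0 && "unknown mode");
    return true; /* be conservative: trap */
}
```

Read this way, #1 and #2 differ only in whether the virtio config space is trapped, which is where keeping the config space coherent with device operation becomes the hard part of #2.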

