virtio-comment message

Subject: Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration

On Thu, Oct 19, 2023 at 07:30:09AM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, October 19, 2023 12:05 PM
> > 
> > On Thu, Oct 19, 2023 at 05:31:37AM +0000, Parav Pandit wrote:
> > > > How could we reach any agreement without an accurate definition
> > > > of "passthrough", which is key to understanding each other?
> > >
> > > I replied a few times in past emails, but since those email threads are so
> > > long, it is easy to miss.
> > >
> > > Passthrough definition:
> > > a. virtio member device is mapped to the guest VM.
> > > b. only PCI config space and MSI-X of a member device are intercepted
> > >    by the hypervisor.
> > > c. virtio config space, virtio cvqs, and data vqs of a member device are
> > >    directly accessed by the guest VM without being intercepted by the
> > >    hypervisor.
> > >
> > > (Why b? No grand reason; it is how the hypervisors that the virtio
> > > member device integrates with work.)
> > 
> > I think it's a reasonable use-case, though of course not at all the only way to
> > design a system. 
> Sure, there are more ways to bisect the device, especially when the underlying device is not a virtio device.
> But one can continue bisecting virtio as well, as you listed below.
> > Some more ways:
> > 2- intercept everything except data vqs and cvqs
> > 	I think this is a reasonable way to build the system and has a bunch
> > 	of advantages short term. The main disadvantage as compared to
> > 	passthrough is the need to keep config space coherent with
> > 	device operation - the way to do it is device specific and
> > 	might get fragile.
> > 
> Yes, I agree it has short-term advantages.
> It is not future-proof, as you noted.
> > 3- intercept everything except data vqs
> > 	Here we get another problem in isolating some vqs but not
> > 	others. The problem becomes bigger in that you also
> > 	need to communicate the control vq to the device.
> > 
> Yes. For non-virtio devices, vendors have an easy way to support this.
> We supported this for mlx5 devices.
> > Also, with both of the above options, we have the question of how we
> > communicate with the device to keep the control path and data path in sync
> > when the device's DMA is mapped to the guest.
> > Using PASIDs for isolation might work but again, support is far from universal,
> > so we can't really assume it as the only way in the spec.
> > 
> Right.
> > Absent PASID the popular way seems to be shadow vq which basically does
> > 
> > 4- software intercept for everything
> >        clearly that's a lot of CPU overhead, I do not think we can focus on that
> >        as the only way in the spec, though some hypervisors might
> >        already have a lot of migration overhead to the point where
> >        virtio can afford any amount of overhead and it won't be
> >        measurable.
> > 
> > 
> > I also note some or all of the intercepts can always come and go.  For example,
> > a common setup is that if target VCPUs are running then IOMMU will inject
> > interrupts directly into guest - if not you generally trap to hypervisor. Similarly,
> > shadow vq might be active just temporarily.
> > 
> > Which approach is best? I feel ideally virtio would find ways to support them all
> > rather than deciding on a policy in the spec.
> Cooking all the modes seems frankly very daunting to me, especially when
> there is no existing software stack to consume all modes and no device
> vendor to sign off on _all_ variations.

It is not about addressing all the modes.  We are building components, not stacks.
Components need to be reusable, not stack-specific.

Was the whole admin command interface, with its levels of indirection,
a design mistake then? It was designed exactly to support
all kinds of models.
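
For reference, the interception models discussed in this thread can be
summarized as bitmasks of what the hypervisor traps. This is only an
illustrative sketch: every name below (ICPT_*, MODE_*, guest_direct) is
hypothetical and appears neither in the virtio spec nor in this patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: which parts of a member device the hypervisor traps. */
enum intercept_bits {
    ICPT_PCI_CFG    = 1u << 0, /* PCI config space */
    ICPT_MSIX       = 1u << 1, /* MSI-X */
    ICPT_VIRTIO_CFG = 1u << 2, /* virtio config space */
    ICPT_CVQ        = 1u << 3, /* control vqs */
    ICPT_DATA_VQ    = 1u << 4, /* data vqs */
};

/* 1: passthrough, only PCI config space and MSI-X are intercepted. */
#define MODE_PASSTHROUGH    (ICPT_PCI_CFG | ICPT_MSIX)
/* 2: everything except data vqs and cvqs is intercepted. */
#define MODE_EXCEPT_VQS     (MODE_PASSTHROUGH | ICPT_VIRTIO_CFG)
/* 3: everything except data vqs is intercepted. */
#define MODE_EXCEPT_DATA_VQ (MODE_EXCEPT_VQS | ICPT_CVQ)
/* 4: software intercept for everything (e.g. shadow vq). */
#define MODE_FULL_INTERCEPT (MODE_EXCEPT_DATA_VQ | ICPT_DATA_VQ)

static_assert(MODE_FULL_INTERCEPT == 0x1f, "all five resources trapped");

/* True if the guest touches the resource without a hypervisor trap. */
static inline bool guest_direct(uint32_t mode, uint32_t resource)
{
    return (mode & resource) == 0;
}
```

Each model only adds intercepts on top of the previous one, which is why
intercepts can come and go at runtime: moving between models is a matter of
setting or clearing bits.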
> To me, two stacks are practical and common to target at beginning.
> i.e.
> 1. passthrough mode 
> 2. #2 above,
> I had real technical difficulty making #2 work in practice, building a scalable device, and having a converged API with #1.
> The option we explored, an admin command in some register of the VF specific to #2, is partially fine but targets use case #2 only.

Right. So - a way to send admin commands to a VF directly, perhaps in
config space? Do we need more than PA+PASID+some flags?
Want to try to write something like this up?
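
To make the question concrete, here is a rough sketch of what a
"PA + PASID + some flags" register block might look like. Nothing here is
proposed spec text: the struct, field names, flag, and helper are all
hypothetical, and endianness and register placement are left open. The only
external fact assumed is that a PCIe PASID is at most 20 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VF_ADMIN_PASID_BITS    20u        /* PCIe PASID is at most 20 bits */
#define VF_ADMIN_F_PASID_VALID (1u << 0)  /* hypothetical flag */

/* Hypothetical per-VF register block pointing at one admin command. */
struct vf_admin_cmd_reg {
    uint64_t cmd_pa; /* physical address of the admin command buffer */
    uint32_t pasid;  /* PASID to use for the DMA, if the flag is set */
    uint32_t flags;  /* VF_ADMIN_F_* */
};

static_assert(sizeof(struct vf_admin_cmd_reg) == 16, "no padding expected");

/* Fill the register image; reject a PASID that does not fit in 20 bits. */
static bool vf_admin_cmd_reg_init(struct vf_admin_cmd_reg *reg, uint64_t pa,
                                  bool use_pasid, uint32_t pasid)
{
    if (use_pasid && pasid >= (1u << VF_ADMIN_PASID_BITS))
        return false;
    reg->cmd_pa = pa;
    reg->pasid = use_pasid ? pasid : 0;
    reg->flags = use_pasid ? VF_ADMIN_F_PASID_VALID : 0;
    return true;
}
```

Since PASID support is far from universal, the flag makes the PASID optional
rather than assuming it.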

> A variation of that: for the member device there is an owner device, hence admin commands on the AQ can be used.
> If we can converge on a common virtio interface between #1 and #2, great.
> If we cannot, due to technical issues, we shouldn't step on each other's toes; instead, we should build two interfaces for the two different use cases, each overcoming its own technical challenges.
> And when, in the future, someone wants to implement a different kind of bisection, they can propose extensions.

Not good at all; this means the interface is very narrow.
Your "propose an extension" just doesn't work in practice.
It takes years for things to be widely deployed in the field, and
by the time they are, there are more use cases.
We need something universal, and admin commands were supposed to be
just this.

