Subject: Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration


On Fri, Nov 17, 2023 at 8:03 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Friday, November 17, 2023 5:13 PM
> >
> > On Fri, Nov 17, 2023 at 11:20:14AM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Friday, November 17, 2023 4:41 PM
> > > >
> > > > On Fri, Nov 17, 2023 at 10:20:45AM +0000, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Friday, November 17, 2023 3:38 PM
> > > > > >
> > > > > > On Wed, Nov 15, 2023 at 05:39:43PM +0000, Parav Pandit wrote:
> > > > > > > > >
> > > > > > > > > Additionally, if the hypervisor has put a trap on the virtio
> > > > > > > > > config, and because the member device already has the
> > > > > > > > > interface for virtio config,
> > > > > > > > >
> > > > > > > > > the hypervisor can directly write/read from the virtual config
> > > > > > > > > to the member's
> > > > > > > > config space, without going through the device context, right?
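
A minimal sketch of the trap+emulate flow described above; every type
and function name here is made up for illustration, not taken from the
spec or any real hypervisor:

#include <stdint.h>

struct cfg_access {
        uint16_t offset;   /* offset into the virtio config structure */
        uint8_t  size;     /* access width: 1, 2 or 4 bytes */
        uint32_t value;    /* data for a write */
        int      is_write;
};

/* Assumed member-device accessors (hypothetical). */
uint32_t member_cfg_read(uint16_t offset, uint8_t size);
void member_cfg_write(uint16_t offset, uint8_t size, uint32_t value);

/* The hypervisor traps the guest's access to the virtual config and
 * forwards it to the member's own config interface, without touching
 * any device-context object. */
static uint32_t handle_cfg_trap(const struct cfg_access *a)
{
        if (a->is_write) {
                member_cfg_write(a->offset, a->size, a->value);
                return 0;
        }
        return member_cfg_read(a->offset, a->size);
}
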
> > > > > > > >
> > > > > > > > It can do it, or it can choose not to. I don't see how it
> > > > > > > > is related to the discussion here.
> > > > > > > >
> > > > > > > It is. I don't see a point in the hypervisor not using the
> > > > > > > native interface provided
> > > > > > by the member device.
> > > > > >
> > > > > > So for example, it seems reasonable for a member to support both
> > > > > > the existing PCI register interface for compatibility and the
> > > > > > future DMA-based one for scale. In such a case, it seems possible
> > > > > > that DMA will expose more features than PCI. And then a hypervisor
> > > > > > might decide to use that in preference to PCI registers.
> > > > >
> > > > > We don't find it right to involve the owner device for mediating at
> > > > > the current scale
> > > >
> > > > In this model, the device will be its own owner. That should not be a problem.
> > > >
> > > I didn't understand the above comment.
> >
> > We'd add a new group type "self". You can then send admin commands
> > through the VF itself, not through the PF.
> >
> How? The device is owned by the guest. FLR and device reset mean the admin command cannot be sent reliably.
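
For what it's worth, the "self" idea fits the existing admin command
header, which already carries a group type. A sketch following the
spec's struct virtio_admin_cmd layout; the SELF value below is
hypothetical, not in the spec:

#include <stdint.h>

typedef uint16_t le16;   /* little-endian on the wire */
typedef uint64_t le64;

/* Device-readable header of an admin command, following the layout of
 * struct virtio_admin_cmd in the virtio spec. */
struct virtio_admin_cmd_hdr {
        le16    opcode;
        le16    group_type;       /* 0x1 = SR-IOV group in the current spec */
        uint8_t reserved1[12];
        le64    group_member_id;  /* member (e.g. VF number) within the group */
        /* command-specific data follows */
};

/* Hypothetical new group type: the member is its own group, so admin
 * commands are sent on the member's own admin virtqueue instead of the
 * owner PF's. The value is illustrative only. */
#define VIRTIO_ADMIN_GROUP_TYPE_SELF 0x2
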
>
> >
> > > > > and not to break upcoming TDISP efforts by such a design.
> > > >
> > > > Look, you either stop mentioning TDISP as motivation or actually try
> > > > to address it. Safe migration with TDISP is really hard.
> > > But that is not an excuse to say that TDISP migration is not present,
> > > and hence to involve the owner device for config space access.
> > > This is another added hurdle that pushes us further away from TDISP.
> > > Hence, we don't want to take the route of involving the owner device
> > > for any config access.
> >
> > This "blocks" is all just wild hunches. hypervisor controls some aspects of TDISP
> > devices for sure - maybe we actually should use pci config space as that is
> > generally hypervisor controlled.
> Doing hypercalls is even worse.
> I showed you the snippet from the spec about the role of PCI config space last time.
> Do you see that we are repeating the discussion again?
>
> >
> > > > For example, your current patches are clearly broken for TDISP: the
> > > > owner can control queue state at any time, making the device modify
> > > > memory in any way it wants.
> > > >
> > > When TDISP migration is needed, the admin device can be another TVM
> > > outside the HV scope.
> > > Alternatively, the device context could be encrypted and not visible
> > > to the HV at all.
> >
> > Maybe. The fact remains that your patches do conflict with TDISP, and
> > you seem to be fine with it because you have a hunch you can fix it.
> > But we can't do development based on your hunches.
> >
> We have different views.
> My patches do not conflict with TDISP, because TDISP has a clear definition of not involving the hypervisor for transport.
> And that part is still preserved.
> Delegating the migration to another TDISP or encrypting is yet to be defined.
> And the current patches will align with both approaches in the future.
>
> So you need to re-evaluate your judgment.
>
> >
> > > Such encryption is not possible with the trap+emulation method, where
> > > the HV will have to decrypt the data coming over MMIO writes.
> >
> > I don't know what trap+emulation has to do with it. Do you refer to the
> > shadow vq thing?
>
> The method proposed here does not hinder any TDISP direction.
>
> Without my proposal, do you have a method that does not involve hypervisor intervention for the virtio common and device config space, the cvq, and the shadow vq?
> If so, I would like to hear it as well, because that would align with TDISP.

So this is what you said:

1) TDISP would not do mediation
2) registers don't scale

This is exactly what the transport virtqueue did, isn't it?
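
Roughly, a transport-vq style config access replaces a register read
with a queued command; the sketch below is illustrative only, not the
proposal's actual layout or opcode names:

#include <stdint.h>

/* Driver queues this command; the device DMAs the reply back. */
struct tvq_cfg_read_cmd {
        uint16_t opcode;    /* e.g. a GET_CONFIG-style opcode (illustrative) */
        uint16_t reserved;
        uint32_t offset;    /* offset into the device config space */
        uint32_t length;    /* number of bytes to read */
};

struct tvq_cfg_read_resp {
        uint8_t status;     /* OK / error */
        uint8_t data[];     /* the requested config bytes */
};
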

>
> > I am guessing that modern platforms with TDISP support are likely to
> > also support the dirty bit in the IOMMU.
> >
> It will be, some day.

Dirty bit is far more realistic than TDISP in the short term.
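
And it composes simply: with IOMMU dirty bits the pre-copy loop touches
neither the device nor a shadow vq. A minimal sketch, where the two
helpers stand in for a platform dirty-bitmap API and the migration
stream (both hypothetical):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Hypothetical stand-ins for an IOMMU dirty-tracking API and the
 * migration stream. */
bool iommu_test_and_clear_dirty(uint64_t iova);
void migration_send_page(uint64_t iova);

/* One pre-copy pass: resend only the pages the device dirtied via DMA
 * since the previous pass. The device itself is not involved. */
static size_t precopy_pass(uint64_t iova_start, uint64_t iova_end)
{
        size_t ndirty = 0;

        for (uint64_t iova = iova_start; iova < iova_end; iova += PAGE_SIZE) {
                if (iommu_test_and_clear_dirty(iova)) {
                        migration_send_page(iova);
                        ndirty++;
                }
        }
        return ndirty;   /* iterate until this converges, then stop the VM */
}
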

>
> >
> > > > > And for future scale, having a new SIOV interface, which has its
> > > > > own direct interface to the device, makes more sense.
> > > > >
> > > > > I finally captured all past discussions in the form of a FAQ at [1].
> > > > >
> > > > > [1]
> > > > > https://docs.google.com/document/d/1Iyn-l3Nm0yls3pZaul4lZiVj8x1s73Ed6rOsmn6LfXc/edit?usp=sharing
> > > >
> > > > Yeah, I skimmed that: "Cons: None". Are you 100% sure? Anyway,
> > > > discussion will take place on the mailing list, please.
> > >
> > > We cannot keep discussing the register interface every week.
> > > I remember we have discussed this many times already in the following series:
> > >
> > > 1. legacy series

How can this be supported in TDISP then?

> > > 2. tvq v4 series
> > > 3. dynamic vq creation series
> > > 4. again during the suspend series under the tvq heading
> > > 5. right now
> > > 6. maybe more that I forgot
> > >
> > > I captured all the directions and options in the doc. One can refer
> > > to it when those questions arise.
> > > If we don't work cohesively, repeating the same reasoning does not help.
> >
> > It's still the same, doc or no doc. You want to build a device without
> > registers, fine, but don't force it down everyone's throat.
> I don't see any compelling reason for inventing a new method, really.

New requests/platforms come for sure, and virtio supports various transports.

For example, there's a request to support PCI endpoint devices.

> Nor for continuing in register mode.

Most virtio devices are implemented in software. And we now have a pure
MMIO-based transport, which is implemented in registers only.
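
For example, probing a virtio-mmio device is nothing more than register
reads; the offsets and magic value below are from the spec:

#include <stdint.h>

/* virtio-mmio register offsets from the virtio spec. */
#define VIRTIO_MMIO_MAGIC_VALUE  0x000   /* must read 0x74726976 ("virt") */
#define VIRTIO_MMIO_VERSION      0x004
#define VIRTIO_MMIO_DEVICE_ID    0x008   /* 0 means no device */
#define VIRTIO_MMIO_VENDOR_ID    0x00c

static inline uint32_t mmio_read32(volatile uint8_t *base, uint32_t off)
{
        return *(volatile uint32_t *)(base + off);
}

/* Returns the virtio device ID, or 0 if there is no virtio-mmio device
 * at this base address. */
static uint32_t virtio_mmio_probe(volatile uint8_t *base)
{
        if (mmio_read32(base, VIRTIO_MMIO_MAGIC_VALUE) != 0x74726976)
                return 0;
        return mmio_read32(base, VIRTIO_MMIO_DEVICE_ID);
}
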

> Virtio already has VQ.
> If the CVQ is so problematic, one should put everything in registers and not apply double standards.

I don't think there's anyone who says CVQ is problematic.

>
> I captured all the reasoning and thoughts. I don't have much to say in support of infinite register scale.
>
> People who want to push SIOV have not shown a single performance reason why SIOV should be done.
> I have upstreamed SIOVs in Linux as SFs without PASID, and in all our scale tests, before the device chokes, the system chokes.
>
> So when someone pushes the SIOV series, I will be the first one interested in reading the performance numbers before proceeding with the patches.
>
> > And now, with 8 MBytes of on-device memory needed for migration being
> > apparently fine, I am even less interested in saving 256 bytes of
> > config space.
>
> Again, that is not the right comparison.
> When and how the 256 bytes are used matters.

Do you know how much the config space has grown in the years since 1.0?
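
As one data point, the PCI common configuration structure alone now
carries fields that 1.0 did not have; the layout below follows struct
virtio_pci_common_cfg in the current spec, with the post-1.0 additions
marked:

#include <stdint.h>

typedef uint16_t le16;
typedef uint32_t le32;
typedef uint64_t le64;

struct virtio_pci_common_cfg {
        /* About the whole device. */
        le32 device_feature_select;
        le32 device_feature;
        le32 driver_feature_select;
        le32 driver_feature;
        le16 msix_config;
        le16 num_queues;
        uint8_t device_status;
        uint8_t config_generation;

        /* About the currently selected virtqueue. */
        le16 queue_select;
        le16 queue_size;
        le16 queue_msix_vector;
        le16 queue_enable;
        le16 queue_notify_off;
        le64 queue_desc;
        le64 queue_driver;
        le64 queue_device;
        le16 queue_notify_data;  /* added after 1.0 (VIRTIO_F_NOTIF_CONFIG_DATA) */
        le16 queue_reset;        /* added after 1.0 (VIRTIO_F_RING_RESET) */
};
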

Virtio should be easy to implement across the whole range:

1) from software devices to hardware devices
2) from embedded to server

You can't say that, e.g., migration is needed in all of those environments.

Thanks

> I haven't come across any device that prefers infinite register scale.


