OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration


On Wed, Nov 22, 2023 at 12:30âAM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, November 21, 2023 10:55 AM
> >
> > On Fri, Nov 17, 2023 at 8:03âPM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Friday, November 17, 2023 5:13 PM
> > > >
> > > > On Fri, Nov 17, 2023 at 11:20:14AM +0000, Parav Pandit wrote:
> > > > >
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Friday, November 17, 2023 4:41 PM
> > > > > >
> > > > > > On Fri, Nov 17, 2023 at 10:20:45AM +0000, Parav Pandit wrote:
> > > > > > >
> > > > > > >
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Friday, November 17, 2023 3:38 PM
> > > > > > > >
> > > > > > > > On Wed, Nov 15, 2023 at 05:39:43PM +0000, Parav Pandit wrote:
> > > > > > > > > > >
> > > > > > > > > > > Additionally, if hypervisor has put the trap on virtio
> > > > > > > > > > > config, and because the memory device already has the
> > > > > > > > > > > interface for virtio config,
> > > > > > > > > > >
> > > > > > > > > > > Hypervisor can directly write/read from the virtual
> > > > > > > > > > > config to the member's
> > > > > > > > > > config space, without going through the device context, right?
> > > > > > > > > >
> > > > > > > > > > If it can do it or it can choose to not. I don't see how
> > > > > > > > > > it is related to the discussion here.
> > > > > > > > > >
> > > > > > > > > It is. I donât see a point of hypervisor not using the
> > > > > > > > > native interface provided
> > > > > > > > by the member device.
> > > > > > > >
> > > > > > > > So for example, it seems reasonable to a member supporting
> > > > > > > > both existing pci register interface for compatibility and
> > > > > > > > the future DMA based one for scale. In such a case, it seems
> > > > > > > > possible that DMA will expose more features than pci. And
> > > > > > > > then a hypervisor might decide to use
> > > > > > that in preference to pci registers.
> > > > > > >
> > > > > > > We donât find it right to involve owner device for mediating
> > > > > > > at current scale
> > > > > >
> > > > > > In this model, device will be its own owner. Should not be a problem.
> > > > > >
> > > > > I didnât understand above comment.
> > > >
> > > > We'd add a new group type "self". You can then send admin commands
> > > > through VF itself not through PF.
> > > >
> > > How? The device is owned by the guest. FLR and device reset cannot send
> > the admin command reliably.
> > >
> > > >
> > > > > > > and to not break TDISP efforts in upcoming time by such design.
> > > > > >
> > > > > > Look you either stop mentioning TDISP as motivation or actually
> > > > > > try to address it. Safe migration with TDISP is really hard.
> > > > > But that is not an excuse to say that TDISP migration is not
> > > > > present, hence
> > > > involve the owner device for config space access.
> > > > > This is another hurdle added that further blocks us away from TDISP.
> > > > > Hence, we donât want to take the route of involving owner device
> > > > > for any
> > > > config access.
> > > >
> > > > This "blocks" is all just wild hunches. hypervisor controls some
> > > > aspects of TDISP devices for sure - maybe we actually should use pci
> > > > config space as that is generally hypervisor controlled.
> > > Even bad to do hypercalls.
> > > I showed you last time the role of the PCI config space snippet from the
> > spec.
> > > Do you see we are repeating the discussion again?
> > >
> > > >
> > > > > > For example, your current patches are clearly broken for TDISP:
> > > > > > owner can control queue state at any time making device modify
> > > > > > memory in any way it wants.
> > > > > >
> > > > > When TDISP migration is needed, the admin device can be another
> > > > > TVM
> > > > outside the HV scope.
> > > > > Or an alternative would have device context encrypted not visible to HV
> > at all.
> > > >
> > > > Maybe. Fact remains your patches do conflict with TDISP and you seem
> > > > to be fine with it because you have a hunch you can fix it. But we
> > > > can't do development based on your hunches.
> > > >
> > > We have different view.
> > > My patches do not conflict with TDISP because TDISP has clear definition of
> > not involving hypervisor for transport.
> > > And that part is still preserved.
> > > Delegating the migration to another TDISP or encrypting is yet to be defined.
> > > And current patches will align to both the approaches in future.
> > >
> > > So you need to re-evaluate your judgment.
> > >
> > > >
> > > > > Such encryption is not possible, with the trap+emulation method,
> > > > > where HV
> > > > will have to decrypt the data coming over MMIO writes.
> > > >
> > > > I don't how what trap+emulation has to do with it. Do you refer to
> > > > the shadow vq thing?
> > >
> > > The method proposed here does not hinder any TDISP direction.
> > >
> > > Without my proposal, do you have a method that does not involve
> > hypervisor intervention for virtio common and device config space, cvq and
> > shadow vq?
> > > If so, I would like to hear that as well because that will align with TDISP.
> >
> > So this is what you said:
> >
> > 1) TDISP would not do mediation
> > 2) registers doesn't scale
> >
> > This is exactly what transport virtqueue did. Isn't it?
> >
> No.
> CVQ is doing both of them currently uniformly across PF, VF.
> Future SIOV will be able to this also.

Please explain how CVQ is related. Or how transport virtqueue blocks
CVQ in any sense. Or any transport virtqueue commands do that.

Then let's continue the discussion here.

>
> > >
> > > > I am guessing modern platforms with TDISP support are likely to also
> > > > support dirty bit in the IOMMU.
> > > >
> > > It will be some day.
> >
> > Dirty bit is far more realistic than TDISP in the short term.
> >
> > >
> > > >
> > > > > > > And for future scale, having new SIOV interface makes more
> > > > > > > sense which has
> > > > > > its own direct interface to device.
> > > > > > >
> > > > > > > I finally captured all past discussions in form of a FAQ at [1].
> > > > > > >
> > > > > > > [1]
> > > > > > > https://docs.google.com/document/d/1Iyn-l3Nm0yls3pZaul4lZiVj8x
> > > > > > > 1s73
> > > > > > > Ed6r
> > > > > > > Osmn6LfXc/edit?usp=sharing
> > > > > >
> > > > > > Yea skimmed that, "Cons: None". Are you 100% sure? Anyway,
> > > > > > discussion will take place on the mailing list please.
> > > > >
> > > > > We cannot keep discussing the register interface every week.
> > > > > I remember we have discussed this many times already in following
> > series.
> > > > >
> > > > > 1. legacy series
> >
> > How can this be supported in TDISP then?

Please answer this question.

> >
> > > > > 2. tvq v4 series
> > > > > 3. dynamic vq creation series
> > > > > 4. again during suspend series under tvq head 5. right now 6. May
> > > > > be more that I forgot.
> > > > >
> > > > > I captured all the direction and options in the doc. One can refer
> > > > > when those
> > > > questions arise there.
> > > > > If we donât work cohesively same reasoning repetition does not help.
> > > >
> > > > It's still the same too, doc or no doc. You want to build a device
> > > > without registers fine but don't force it down everyone's throat.
> > > I donât see any compelling reason for inventing new method really.
> >
> > New requests/platforms come for sure, and virtio supports various transports.
> >
> > For example, there's a request to support PCI endpoint devices.
> What is PCI endpoint devices? Does it have a new device type?

https://docs.kernel.org/PCI/endpoint/index.html

>
> >
> > > Nor continuing in register mode.
> >
> > Most virtio devices are implemented in software.
> We see it differently in field and in virtio charter since 2021.

I think we're talking about different things.

>
> > And we have pure MMIO
> > based transport now which is implemented in registers only.
> >
> > > Virtio already has VQ.
> > > If CVQ is so problematic, one should put everything on registers and not run
> > on double standards.
> >
> > I don't think there's anyone who says CVQ is problematic.
> >
> Ok. than lets stop this endless debate.
> Everyone is using CVQ, lets continue to use.

The CVQ and transport vitqueue is orthogonal. Again, there seems
nobody think transport virtqueue is a replacement of CVQ other than
you.

>
> > >
> > > I captured all the reasoning and thoughts. I donât have much to say in
> > support of infinite register scale.
> > >
> > > People who wants to push SIOV does not show single performance reason
> > on why SIOV to be done.
> > > I have upstreamed SIOVs in Linux as SFs without PASID, and in all our scale
> > tests, before the device chocks, the system chocks.
> > >
> > > So when someone pushes the SIOV series, I will be the first one interested in
> > reading the performance numbers to proceed with patches.
> > >
> > > > And now with 8MBytes
> > > > of on-device memory that's needed for migration and that's
> > > > apparently fine I am even less interested in saving 256 bytes for config
> > space.
> > >
> > > Again, not the right comparison.
> > > When and how to use 256 matters.
> >
> > Do you know how much the config has grown in the past years since 1.0?
> >
> Very less and no point in deviating the design now anyway for device migration or otherwise.
>
> > Virtio should be implemented easily from:
> >
> And it is already there.
>
> > 1) software device to hardware device
> > 2) embedded to server
> >
> > You can't say e.g migration is needed in all of the environments.
> Which line in the patch said this?

You said you don't want to let the register grow. No?

Why do an embedded virtio device need to implement admin virtqueue
just for a new function like suspend or vq reset?

> I specifically asked to not build transport vq because efficiency is needed on the PFs too.

Transport virtqueue can be done in PF.

Thanks


>
> >
> > Thanks
> >
> > > I havenât come across any device that prefers infinite register scale.



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]