OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration


On Wed, Nov 22, 2023 at 12:39âPM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, November 22, 2023 9:46 AM
> >
> > On Wed, Nov 22, 2023 at 12:27âAM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, November 21, 2023 10:15 AM
> > > >
> > > > On Fri, Nov 17, 2023 at 11:09âPM Michael S. Tsirkin <mst@redhat.com>
> > > > wrote:
> > > > >
> > > > > On Fri, Nov 17, 2023 at 02:51:04PM +0000, Parav Pandit wrote:
> > > > > >
> > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > Sent: Friday, November 17, 2023 7:24 PM
> > > > > > >
> > > > > > > On Fri, Nov 17, 2023 at 12:46:21PM +0000, Parav Pandit wrote:
> > > > > > > >
> > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > Sent: Friday, November 17, 2023 6:00 PM
> > > > > > > > > To: Parav Pandit <parav@nvidia.com>
> > > > > > > > >
> > > > > > > > > On Fri, Nov 17, 2023 at 12:02:49PM +0000, Parav Pandit wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > Sent: Friday, November 17, 2023 5:13 PM
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Nov 17, 2023 at 11:20:14AM +0000, Parav Pandit
> > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > > > Sent: Friday, November 17, 2023 4:41 PM
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Nov 17, 2023 at 10:20:45AM +0000, Parav
> > > > > > > > > > > > > Pandit
> > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > > > > > Sent: Friday, November 17, 2023 3:38 PM
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Nov 15, 2023 at 05:39:43PM +0000,
> > > > > > > > > > > > > > > Parav Pandit
> > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Additionally, if hypervisor has put the
> > > > > > > > > > > > > > > > > > trap on virtio config, and because the
> > > > > > > > > > > > > > > > > > memory device already has the interface
> > > > > > > > > > > > > > > > > > for virtio config,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hypervisor can directly write/read from
> > > > > > > > > > > > > > > > > > the virtual config to the member's
> > > > > > > > > > > > > > > > > config space, without going through the
> > > > > > > > > > > > > > > > > device context,
> > > > right?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If it can do it or it can choose to not. I
> > > > > > > > > > > > > > > > > don't see how it is related to the discussion here.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It is. I donât see a point of hypervisor not
> > > > > > > > > > > > > > > > using the native interface provided
> > > > > > > > > > > > > > > by the member device.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So for example, it seems reasonable to a
> > > > > > > > > > > > > > > member supporting both existing pci register
> > > > > > > > > > > > > > > interface for compatibility and the future DMA
> > > > > > > > > > > > > > > based one for scale. In such a case, it seems
> > > > > > > > > > > > > > > possible that DMA will expose more features
> > > > > > > > > > > > > > > than pci. And then a hypervisor might decide
> > > > > > > > > > > > > > > to use
> > > > > > > > > > > > > that in preference to pci registers.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We donât find it right to involve owner device
> > > > > > > > > > > > > > for mediating at current scale
> > > > > > > > > > > > >
> > > > > > > > > > > > > In this model, device will be its own owner.
> > > > > > > > > > > > > Should not be a
> > > > problem.
> > > > > > > > > > > > >
> > > > > > > > > > > > I didnât understand above comment.
> > > > > > > > > > >
> > > > > > > > > > > We'd add a new group type "self". You can then send
> > > > > > > > > > > admin commands through VF itself not through PF.
> > > > > > > > > > >
> > > > > > > > > > How? The device is owned by the guest. FLR and device
> > > > > > > > > > reset cannot send the
> > > > > > > > > admin command reliably.
> > > > > > > > >
> > > > > > > > > It's of the "it hurts when I do this - don't do this then" category.
> > > > > > > > >
> > > > > > > > it is donât do medication category, yes due all this
> > > > > > > > weirdness that has been
> > > > > > > asked.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > > and to not break TDISP efforts in upcoming time
> > > > > > > > > > > > > > by such
> > > > design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Look you either stop mentioning TDISP as
> > > > > > > > > > > > > motivation or actually try to address it. Safe
> > > > > > > > > > > > > migration with TDISP is really
> > > > hard.
> > > > > > > > > > > > But that is not an excuse to say that TDISP
> > > > > > > > > > > > migration is not present, hence
> > > > > > > > > > > involve the owner device for config space access.
> > > > > > > > > > > > This is another hurdle added that further blocks us
> > > > > > > > > > > > away from
> > > > TDISP.
> > > > > > > > > > > > Hence, we donât want to take the route of involving
> > > > > > > > > > > > owner device for any
> > > > > > > > > > > config access.
> > > > > > > > > > >
> > > > > > > > > > > This "blocks" is all just wild hunches. hypervisor
> > > > > > > > > > > controls some aspects of TDISP devices for sure -
> > > > > > > > > > > maybe we actually should use pci config space as that
> > > > > > > > > > > is generally hypervisor
> > > > controlled.
> > > > > > > > > > Even bad to do hypercalls.
> > > > > > > > > > I showed you last time the role of the PCI config space
> > > > > > > > > > snippet from the
> > > > > > > spec.
> > > > > > > > >
> > > > > > > > > Yes I remember. This is just an example though. My point
> > > > > > > > > is maybe it is solvable maybe it is not.
> > > > > > > > >
> > > > > > > > > > Do you see we are repeating the discussion again?
> > > > > > > > >
> > > > > > > > > One of the reasons is that people bring up irrelevances.
> > > > > > > > > TDISP is important but has to be addressed or deferred not
> > > > > > > > > vaguely referred
> > > > to.
> > > > > > > >
> > > > > > > > So lets continue to follow the current TDISP direction of
> > > > > > > > not involving
> > > > > > > hypervisor for virtio common and device config.
> > > > > >
> > > > > > If you disagree to it, please speak now, so that we donât debate
> > > > > > on this
> > > > again in next 3 days.
> > > > > > Because this is the fundamental design considerations it relied on.
> > > > > > There is no point going forward if you want to disagree to it.
> > > > > > Other variants are fine, but other variants cannot be the only choice.
> > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > For example, your current patches are clearly broken for
> > TDISP:
> > > > > > > > > > > > > owner can control queue state at any time making
> > > > > > > > > > > > > device modify memory in any way it wants.
> > > > > > > > > > > > >
> > > > > > > > > > > > When TDISP migration is needed, the admin device can
> > > > > > > > > > > > be another TVM
> > > > > > > > > > > outside the HV scope.
> > > > > > > > > > > > Or an alternative would have device context
> > > > > > > > > > > > encrypted not visible to HV at
> > > > > > > > > all.
> > > > > > > > > > >
> > > > > > > > > > > Maybe. Fact remains your patches do conflict with
> > > > > > > > > > > TDISP and you seem to be fine with it because you have
> > > > > > > > > > > a hunch you can
> > > > fix it.
> > > > > > > > > > > But we can't do development based on your hunches.
> > > > > > > > > > >
> > > > > > > > > > We have different view.
> > > > > > > > > > My patches do not conflict with TDISP because TDISP has
> > > > > > > > > > clear definition of
> > > > > > > > > not involving hypervisor for transport.
> > > > > > > > > > And that part is still preserved.
> > > > > > > > > > Delegating the migration to another TDISP or encrypting
> > > > > > > > > > is yet to be
> > > > > > > defined.
> > > > > > > > > > And current patches will align to both the approaches in
> > future.
> > > > > > > > > >
> > > > > > > > > > So you need to re-evaluate your judgment.
> > > > > > > > >
> > > > > > > > > If you like they do not "conflict".  But if used with
> > > > > > > > > TDISP they just make it insecure and thus completely
> > > > > > > > > worthless.  If hypervisor can change ring state to make
> > > > > > > > > device poke at random guest memory then it is game over
> > > > > > > > > and all the effort spent was
> > > > security theater.
> > > > > > > > Not really, I proposed two options.
> > > > > > > > 1. delegate the task of LM to the TVM. (proposed by two cpu
> > vendors).
> > > > > > > > In this case all the infra we build here, just works fine.
> > > > > > >
> > > > > > > I think modification will be needed: currently commands are
> > > > > > > sent through the PF, and that is under hypervisor control.
> > > > > > > You should not assign PF to TVM.
> > > >
> > > > That's the point. And that's why it keeps people confused to believe
> > > > the current PF/adminq can work in the TDISP.
> > > >
> > > There is no confusion.
> >
> > No, when LingShan points out the conflict, you just told us it will be
> > addressed in the future.
> >
> Current proposal does not punch the hole in the TDISP TVM interface.

So you meant adminq in PF works in TDISP?

>
> > And after Michael pointed it out again, you agree than adminq can not be
> > part of PF in this context.
> >
> As explained when TDISP for device migration evolves, there will be other trusted entity which will do it.

Platform has been evolved to support dirty pages. Let's use that.

>
>
> > And you miss the fact that admin virtqueue today can't be used to manage
> > the owner.
> >
> There is no such need for device migration.

You just agree that in order to migrate to TDISP, admin can't sit in
PF, so where is it?

>
> > > The admin queue interface ensures first step that TDISP interface is
> > dedicated to guest as today.
> > > There is no bifurcation added on the VF that needs extra mediation.
> > >
> > > > > > Yes, an admin virtio function will be there which will do the
> > > > > > admin
> > > > commands listed.
> > > > >
> > > > > So it can't be PF, so at least we need a new group type.
> > > > > I am inclined to then say, operate it through VF itself.
> > > >
> > > > So it exactly matches the idea of transport virtqueue (a per VF/SF one).
> > > >
> > > There is no need for transport virtqueue for VF as VF device has same
> > uniform principle as PF.
> >
> > I don't understand here. I have explained that you have invented a function
> > duplication of transport virtqueue.
> >
> No. we havenât.
> It is explained in other thread that functionality is different.
>
> > > If you want transport vq, please have it on the PF too.
> >
> > Nothing prevents this, actually transport virtqueue start from this.
> >
> > > And that also is not needed because there is already CVQ.
> >
> > I don't see why you keep mentioning CVQ. I don't see anyone that says
> > transport virtqueue is going to replace CVQ.
> >
> I explained everything in the doc. This is the Nth time being repeated...

The doc is wrong since it assumes that people worry that transport
virtqueue will replace cvq.

>
> > >
> > > > But it still requires a PCI part to bootstrap.
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > It also does not require any hypervisor mediation for control
> > plane.
> > > > > > > >
> > > > > > > > 2. Encrypt the owner device workload to be not seen by
> > > > > > > > hypervisor
> > > > > > > >
> > > > > > > > Both methods does not affect the current direction.
> > > > > > > >
> > > > > > > > But if we force trap+emulation, it is 100% broken for TDISP.
> > > > > > > > And I would not promote that.
> > > > > > > >
> > > > > > > > > But you know this, don't you? This is why you mentioned
> > > > > > > > > encrypting
> > > > device.
> > > > > > > > > Maybe that works. It just does not work *as is*.
> > > > > > > > It works as_is. But current infrastructure does not block
> > > > > > > > the future
> > > > work.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Such encryption is not possible, with the
> > > > > > > > > > > > trap+emulation method, where HV
> > > > > > > > > > > will have to decrypt the data coming over MMIO writes.
> > > > > > > > > > >
> > > > > > > > > > > I don't how what trap+emulation has to do with it. Do
> > > > > > > > > > > you refer to the shadow vq thing?
> > > > > > > > > >
> > > > > > > > > > The method proposed here does not hinder any TDISP
> > direction.
> > > > > > > > >
> > > > > > > > > direction? No, why would it. we can always add more
> > > > > > > > > commands that are safe for TDISP. commands you propose
> > > > > > > > > here are unsafe for
> > > > TDISP.
> > > > > > > > >
> > > > > > > > > > Without my proposal, do you have a method that does not
> > > > > > > > > > involve hypervisor
> > > > > > > > > intervention for virtio common and device config space,
> > > > > > > > > cvq and shadow
> > > > > > > vq?
> > > > > > > > > > If so, I would like to hear that as well because that
> > > > > > > > > > will align with
> > > > TDISP.
> > > > > > > > >
> > > > > > > > > I really did not give it much thought.  I suspect for
> > > > > > > > > TDISP it just might be cleaner to have guest agent migrate
> > device.
> > > > > > > > > Certainly removes all
> > > > > > > the messy questions.
> > > > > > > > > That, to me impliest there needs to be a way to send
> > > > > > > > > migration commands through VF itself. Does this "involve
> > > > > > > > > hypervisor intervention"? No one should care I think.
> > > > > > > > Too far of the future to envision. May be yes. When such
> > > > > > > > platform is built, for sure whoever migrates need migrate
> > > > > > > > its device side
> > > > too.
> > > > > > > > Some knowledge of migration driver is needed.
> > > > > > >
> > > > > > > So TDISP migration is so far in the future you do not need to
> > > > > > > bother
> > > > about it.
> > > > > > > Fine. Then don't bring it up pls.
> > > > > > >
> > > > > > As long as we are aligned to the requirement that a virtio
> > > > > > member device is
> > > > mapped to the guest VM without mediating the virtio interface, I am
> > good.
> > > > > > Again, other variants are fine, but above listed mapped variant
> > > > > > is the
> > > > minimum variant needed.
> > > > >
> > > > > I think it's worth supporting this. I wouldn't call this minimum
> > > > > there are other approaches.  And I am not so sure it's worth
> > > > > trying to support this in all kind of systems such as IOMMU
> > > > > without dirty bit support. If some old systems will need
> > > > > mediation, this is kind of like legacy interface. Not a big deal.
> > > > >
> > > >
> > > > +1
> > >
> > > There are users with the recent cpus that may not have the IOMMU dirty
> > page tracking support.
> > > So I donât fully agree.
> >
> > There are setups that don't have SR-IOV or even PCI.
> >
> Which are those? Please expand those transports that need it.

For transport/platform without SR-IOV, you want to expand the
transport no matter whether or not it is possible.
For dirty page tracking, you want to do it in virtio no matter how
long it has been implemented by various platforms.

If you are correct, I can say, please wait for the platform support
for dirty page tracking.

Thanks





>
> > Let's have a unified standard.



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]