virtio-comment message

Subject: RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration

From: Parav Pandit <parav@nvidia.com>
To: Jason Wang <jasowang@redhat.com>
Date: Wed, 22 Nov 2023 04:39:48 +0000

> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, November 22, 2023 9:46 AM
> 
> On Wed, Nov 22, 2023 at 12:27âAM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, November 21, 2023 10:15 AM
> > >
> > > On Fri, Nov 17, 2023 at 11:09âPM Michael S. Tsirkin <mst@redhat.com>
> > > wrote:
> > > >
> > > > On Fri, Nov 17, 2023 at 02:51:04PM +0000, Parav Pandit wrote:
> > > > >
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Friday, November 17, 2023 7:24 PM
> > > > > >
> > > > > > On Fri, Nov 17, 2023 at 12:46:21PM +0000, Parav Pandit wrote:
> > > > > > >
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Friday, November 17, 2023 6:00 PM
> > > > > > > > To: Parav Pandit <parav@nvidia.com>
> > > > > > > >
> > > > > > > > On Fri, Nov 17, 2023 at 12:02:49PM +0000, Parav Pandit wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > Sent: Friday, November 17, 2023 5:13 PM
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 17, 2023 at 11:20:14AM +0000, Parav Pandit
> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > > Sent: Friday, November 17, 2023 4:41 PM
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 17, 2023 at 10:20:45AM +0000, Parav
> > > > > > > > > > > > Pandit
> > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > > > > > > Sent: Friday, November 17, 2023 3:38 PM
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Nov 15, 2023 at 05:39:43PM +0000,
> > > > > > > > > > > > > > Parav Pandit
> > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Additionally, if hypervisor has put the
> > > > > > > > > > > > > > > > > trap on virtio config, and because the
> > > > > > > > > > > > > > > > > memory device already has the interface
> > > > > > > > > > > > > > > > > for virtio config,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hypervisor can directly write/read from
> > > > > > > > > > > > > > > > > the virtual config to the member's
> > > > > > > > > > > > > > > > config space, without going through the
> > > > > > > > > > > > > > > > device context,
> > > right?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If it can do it or it can choose to not. I
> > > > > > > > > > > > > > > > don't see how it is related to the discussion here.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It is. I donât see a point of hypervisor not
> > > > > > > > > > > > > > > using the native interface provided
> > > > > > > > > > > > > > by the member device.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So for example, it seems reasonable to a
> > > > > > > > > > > > > > member supporting both existing pci register
> > > > > > > > > > > > > > interface for compatibility and the future DMA
> > > > > > > > > > > > > > based one for scale. In such a case, it seems
> > > > > > > > > > > > > > possible that DMA will expose more features
> > > > > > > > > > > > > > than pci. And then a hypervisor might decide
> > > > > > > > > > > > > > to use
> > > > > > > > > > > > that in preference to pci registers.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We donât find it right to involve owner device
> > > > > > > > > > > > > for mediating at current scale
> > > > > > > > > > > >
> > > > > > > > > > > > In this model, device will be its own owner.
> > > > > > > > > > > > Should not be a
> > > problem.
> > > > > > > > > > > >
> > > > > > > > > > > I didnât understand above comment.
> > > > > > > > > >
> > > > > > > > > > We'd add a new group type "self". You can then send
> > > > > > > > > > admin commands through VF itself not through PF.
> > > > > > > > > >
> > > > > > > > > How? The device is owned by the guest. FLR and device
> > > > > > > > > reset cannot send the
> > > > > > > > admin command reliably.
> > > > > > > >
> > > > > > > > It's of the "it hurts when I do this - don't do this then" category.
> > > > > > > >
> > > > > > > it is donât do medication category, yes due all this
> > > > > > > weirdness that has been
> > > > > > asked.
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > > and to not break TDISP efforts in upcoming time
> > > > > > > > > > > > > by such
> > > design.
> > > > > > > > > > > >
> > > > > > > > > > > > Look you either stop mentioning TDISP as
> > > > > > > > > > > > motivation or actually try to address it. Safe
> > > > > > > > > > > > migration with TDISP is really
> > > hard.
> > > > > > > > > > > But that is not an excuse to say that TDISP
> > > > > > > > > > > migration is not present, hence
> > > > > > > > > > involve the owner device for config space access.
> > > > > > > > > > > This is another hurdle added that further blocks us
> > > > > > > > > > > away from
> > > TDISP.
> > > > > > > > > > > Hence, we donât want to take the route of involving
> > > > > > > > > > > owner device for any
> > > > > > > > > > config access.
> > > > > > > > > >
> > > > > > > > > > This "blocks" is all just wild hunches. hypervisor
> > > > > > > > > > controls some aspects of TDISP devices for sure -
> > > > > > > > > > maybe we actually should use pci config space as that
> > > > > > > > > > is generally hypervisor
> > > controlled.
> > > > > > > > > Even bad to do hypercalls.
> > > > > > > > > I showed you last time the role of the PCI config space
> > > > > > > > > snippet from the
> > > > > > spec.
> > > > > > > >
> > > > > > > > Yes I remember. This is just an example though. My point
> > > > > > > > is maybe it is solvable maybe it is not.
> > > > > > > >
> > > > > > > > > Do you see we are repeating the discussion again?
> > > > > > > >
> > > > > > > > One of the reasons is that people bring up irrelevances.
> > > > > > > > TDISP is important but has to be addressed or deferred not
> > > > > > > > vaguely referred
> > > to.
> > > > > > >
> > > > > > > So lets continue to follow the current TDISP direction of
> > > > > > > not involving
> > > > > > hypervisor for virtio common and device config.
> > > > >
> > > > > If you disagree to it, please speak now, so that we donât debate
> > > > > on this
> > > again in next 3 days.
> > > > > Because this is the fundamental design considerations it relied on.
> > > > > There is no point going forward if you want to disagree to it.
> > > > > Other variants are fine, but other variants cannot be the only choice.
> > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > For example, your current patches are clearly broken for
> TDISP:
> > > > > > > > > > > > owner can control queue state at any time making
> > > > > > > > > > > > device modify memory in any way it wants.
> > > > > > > > > > > >
> > > > > > > > > > > When TDISP migration is needed, the admin device can
> > > > > > > > > > > be another TVM
> > > > > > > > > > outside the HV scope.
> > > > > > > > > > > Or an alternative would have device context
> > > > > > > > > > > encrypted not visible to HV at
> > > > > > > > all.
> > > > > > > > > >
> > > > > > > > > > Maybe. Fact remains your patches do conflict with
> > > > > > > > > > TDISP and you seem to be fine with it because you have
> > > > > > > > > > a hunch you can
> > > fix it.
> > > > > > > > > > But we can't do development based on your hunches.
> > > > > > > > > >
> > > > > > > > > We have different view.
> > > > > > > > > My patches do not conflict with TDISP because TDISP has
> > > > > > > > > clear definition of
> > > > > > > > not involving hypervisor for transport.
> > > > > > > > > And that part is still preserved.
> > > > > > > > > Delegating the migration to another TDISP or encrypting
> > > > > > > > > is yet to be
> > > > > > defined.
> > > > > > > > > And current patches will align to both the approaches in
> future.
> > > > > > > > >
> > > > > > > > > So you need to re-evaluate your judgment.
> > > > > > > >
> > > > > > > > If you like they do not "conflict".  But if used with
> > > > > > > > TDISP they just make it insecure and thus completely
> > > > > > > > worthless.  If hypervisor can change ring state to make
> > > > > > > > device poke at random guest memory then it is game over
> > > > > > > > and all the effort spent was
> > > security theater.
> > > > > > > Not really, I proposed two options.
> > > > > > > 1. delegate the task of LM to the TVM. (proposed by two cpu
> vendors).
> > > > > > > In this case all the infra we build here, just works fine.
> > > > > >
> > > > > > I think modification will be needed: currently commands are
> > > > > > sent through the PF, and that is under hypervisor control.
> > > > > > You should not assign PF to TVM.
> > >
> > > That's the point. And that's why it keeps people confused to believe
> > > the current PF/adminq can work in the TDISP.
> > >
> > There is no confusion.
> 
> No, when LingShan points out the conflict, you just told us it will be
> addressed in the future.
>
Current proposal does not punch the hole in the TDISP TVM interface.
 
> And after Michael pointed it out again, you agree than adminq can not be
> part of PF in this context.
>
As explained when TDISP for device migration evolves, there will be other trusted entity which will do it.

 
> And you miss the fact that admin virtqueue today can't be used to manage
> the owner.
>
There is no such need for device migration.
 
> > The admin queue interface ensures first step that TDISP interface is
> dedicated to guest as today.
> > There is no bifurcation added on the VF that needs extra mediation.
> >
> > > > > Yes, an admin virtio function will be there which will do the
> > > > > admin
> > > commands listed.
> > > >
> > > > So it can't be PF, so at least we need a new group type.
> > > > I am inclined to then say, operate it through VF itself.
> > >
> > > So it exactly matches the idea of transport virtqueue (a per VF/SF one).
> > >
> > There is no need for transport virtqueue for VF as VF device has same
> uniform principle as PF.
> 
> I don't understand here. I have explained that you have invented a function
> duplication of transport virtqueue.
> 
No. we havenât.
It is explained in other thread that functionality is different.

> > If you want transport vq, please have it on the PF too.
> 
> Nothing prevents this, actually transport virtqueue start from this.
> 
> > And that also is not needed because there is already CVQ.
> 
> I don't see why you keep mentioning CVQ. I don't see anyone that says
> transport virtqueue is going to replace CVQ.
>
I explained everything in the doc. This is the Nth time being repeated...
 
> >
> > > But it still requires a PCI part to bootstrap.
> > >
> > > >
> > > >
> > > > > >
> > > > > > > It also does not require any hypervisor mediation for control
> plane.
> > > > > > >
> > > > > > > 2. Encrypt the owner device workload to be not seen by
> > > > > > > hypervisor
> > > > > > >
> > > > > > > Both methods does not affect the current direction.
> > > > > > >
> > > > > > > But if we force trap+emulation, it is 100% broken for TDISP.
> > > > > > > And I would not promote that.
> > > > > > >
> > > > > > > > But you know this, don't you? This is why you mentioned
> > > > > > > > encrypting
> > > device.
> > > > > > > > Maybe that works. It just does not work *as is*.
> > > > > > > It works as_is. But current infrastructure does not block
> > > > > > > the future
> > > work.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Such encryption is not possible, with the
> > > > > > > > > > > trap+emulation method, where HV
> > > > > > > > > > will have to decrypt the data coming over MMIO writes.
> > > > > > > > > >
> > > > > > > > > > I don't how what trap+emulation has to do with it. Do
> > > > > > > > > > you refer to the shadow vq thing?
> > > > > > > > >
> > > > > > > > > The method proposed here does not hinder any TDISP
> direction.
> > > > > > > >
> > > > > > > > direction? No, why would it. we can always add more
> > > > > > > > commands that are safe for TDISP. commands you propose
> > > > > > > > here are unsafe for
> > > TDISP.
> > > > > > > >
> > > > > > > > > Without my proposal, do you have a method that does not
> > > > > > > > > involve hypervisor
> > > > > > > > intervention for virtio common and device config space,
> > > > > > > > cvq and shadow
> > > > > > vq?
> > > > > > > > > If so, I would like to hear that as well because that
> > > > > > > > > will align with
> > > TDISP.
> > > > > > > >
> > > > > > > > I really did not give it much thought.  I suspect for
> > > > > > > > TDISP it just might be cleaner to have guest agent migrate
> device.
> > > > > > > > Certainly removes all
> > > > > > the messy questions.
> > > > > > > > That, to me impliest there needs to be a way to send
> > > > > > > > migration commands through VF itself. Does this "involve
> > > > > > > > hypervisor intervention"? No one should care I think.
> > > > > > > Too far of the future to envision. May be yes. When such
> > > > > > > platform is built, for sure whoever migrates need migrate
> > > > > > > its device side
> > > too.
> > > > > > > Some knowledge of migration driver is needed.
> > > > > >
> > > > > > So TDISP migration is so far in the future you do not need to
> > > > > > bother
> > > about it.
> > > > > > Fine. Then don't bring it up pls.
> > > > > >
> > > > > As long as we are aligned to the requirement that a virtio
> > > > > member device is
> > > mapped to the guest VM without mediating the virtio interface, I am
> good.
> > > > > Again, other variants are fine, but above listed mapped variant
> > > > > is the
> > > minimum variant needed.
> > > >
> > > > I think it's worth supporting this. I wouldn't call this minimum
> > > > there are other approaches.  And I am not so sure it's worth
> > > > trying to support this in all kind of systems such as IOMMU
> > > > without dirty bit support. If some old systems will need
> > > > mediation, this is kind of like legacy interface. Not a big deal.
> > > >
> > >
> > > +1
> >
> > There are users with the recent cpus that may not have the IOMMU dirty
> page tracking support.
> > So I donât fully agree.
> 
> There are setups that don't have SR-IOV or even PCI.
> 
Which are those? Please expand those transports that need it.

> Let's have a unified standard.

Follow-Ups:
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Jason Wang <jasowang@redhat.com>

References:
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: "Michael S. Tsirkin" <mst@redhat.com>
- RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: "Michael S. Tsirkin" <mst@redhat.com>
- RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: "Michael S. Tsirkin" <mst@redhat.com>
- RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: "Michael S. Tsirkin" <mst@redhat.com>
- RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: "Michael S. Tsirkin" <mst@redhat.com>
- RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: "Michael S. Tsirkin" <mst@redhat.com>
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Jason Wang <jasowang@redhat.com>
- RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Parav Pandit <parav@nvidia.com>
- Re: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
  - From: Jason Wang <jasowang@redhat.com>