virtio-comment message

Subject: Re: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration

From: Jason Wang <jasowang@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Date: Tue, 24 Oct 2023 12:56:57 +0800

On Mon, Oct 23, 2023 at 12:43 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Monday, October 23, 2023 9:15 AM
> >
> > On Wed, Oct 18, 2023 at 6:23 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Wednesday, October 18, 2023 3:26 PM
> > >
> > > > For completeness, and to shorten the thread, can you please list
> > > > known issues/use cases that are addressed by the status bit
> > > > interface and how you plan for them to be addressed?
> > >
> > > For now, I will not list the known issues with the status bit in this email.
> > >
> > > The status bit interface helps in the following ways.
> > > 1. The guest can fully suspend/resume the device by negotiating the new
> > > feature.
> > > This can be useful in guest-controlled PM suspend/resume flows.
> > > I still think that, for this, only the feature bit is necessary, and
> > > device_status modification is not needed.
> >
> > Which feature bit did you mean here?
> >
> A new feature bit to indicate to the guest that the device supports suspend and resume; hence, there is no need to reset the device and destroy resources as is done today.

Well, I don't see how it is different from what Lingshan proposed.

>
> > > The PCI D0->D3 and D3->D0 transitions can suspend and resume the device,
> > > which can preserve the last device_status value before entering D3.
> >
> > It's not only about the device status. I will not repeat the question I
> > asked in another thread.
> >
> > What's more, if you really want to suspend/freeze at the PCI level and deal with
> > PCI-specific issues like P2P, you should really try to leverage or invent a PCI
> > mechanism instead of trying to carry such semantics via virtio-specific stuff
> > like the adminq. Solving transport-specific problems at the virtio level is a
> > layer violation.
> >
> The PCI spec has already defined what it needs to.

If the PCI spec has good support for suspend/resume, why bother inventing
mechanisms in virtio?

> The PCI-SIG has already concluded that the SR-PCIM interface is outside the PCI spec.
> And no, there is no layer violation.
>
> Any non-PCI member device can always implement the necessary STOP mode as a no-op.
>
> And all of this talk makes sense when one creates an MMIO-based member device; until then, these are just objections...

They are different layers:

1) suspend/resume at virtio level
2) suspend/resume at transport level

We need both of them to satisfy different cases, just as we need reset
at both the virtio and the VF (FLR) level. Lingshan proposes 1), while it
looks to me that you propose 2) via the virtio adminq; but you said it is
already supported by PCI, which makes it a duplication.

>
> > > (Like preserving all the rest of the fields of the common and other device config.)
> > > This is orthogonal and needed regardless of device migration.
> > >
> > > 2. If one does not want to pass through a member device, but to build a
> > > mediation-based device on top of an existing virtio device, it can be useful
> > > with mediating software.
> > > Here the mediating software duplicates much of the knowledge that the
> > > member device already has.
> >
> > It is what hypervisors do, not only for virtio but also for the CPU and
> > MMU.
> >
> Not really; vCPUs, the VMCS and more are part of the hardware support.

That's not the context here. Hypervisors need to know almost every
detail to make CPU virtualization work. That's the fact, and it has
worked for virtio as well for years.

What's more, nothing prevents us from inventing something similar in
virtio to speed up the context switch or migration if necessary.

> Two-level nested page tables are hardware support.
> Anything beyond two-level nesting likely involves the hypervisor.

That needs emulation/trapping for sure. That's the point.

>
> > > This can fulfil the nested requirement differently, provided the platform
> > > supports it.
> > > (The PASID limitation will be a practical blocker here.)
> >
> > I don't think PASID is a blocker. It is only a blocker if you want to do passthrough.
> >
> Even without passthrough, one needs to steer the hypervisor DMA to non-guest memory.
> And the guest driver must not be able to attack (read/write) that memory.
> I don't see how one can do this without PASID, as all DMAs are tagged using only the RID.

There are a lot of other ways, but in order to converge, we can leave
it for future discussions.

What's more, if we design virtio for the future, PASID must be
considered, as we all know it will come for sure.

>
> > >
> > > How do I plan to address the above two?
> > > a. #1 is to be addressed by the _F_PM bit; when the bit is negotiated, PCI
> > > PM drives the state.
> >
> > We can't duplicate every transport-specific feature in virtio. This is a layer
> > violation again. We should reuse the PCI facility here.
> >
> It is reused, with the feature bit indicating that the device supports suspend/resume.
> Had the PCI PM bits been used from day 1, the feature bit would not be required.
> But that was not the case.
> So the guest driver does not know whether the PCI PM bit alone is enough to decide if suspend/resume by the guest will work.
> Hence the feature bit.

Either way, you need to update the driver if it has an issue. In the
update, you can check for and use PCI PM. If the device doesn't have
PCI PM, you can only suspend/resume at the virtio level. Defining
transport semantics at the virtio level breaks the layering.
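To make the two positions concrete, here is a minimal C sketch of the driver-side choice. It is illustrative only and rests on stated assumptions: the `_F_PM` bit named in this thread has no assigned bit number, so it is modeled as a boolean; `has_pci_pm_cap` and `virtio_suspend` are hypothetical flags standing in for a PCI PM capability check and a virtio-level suspend mechanism; and the hypothetical `require_pm_feature` parameter switches between using PCI PM only when `_F_PM` is negotiated and using it whenever the capability is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ways a guest driver might suspend a virtio device (illustrative only). */
enum suspend_path {
    SUSPEND_BY_RESET,  /* today's behaviour: reset and destroy resources */
    SUSPEND_BY_PCI_PM, /* transport level: PCI D0 -> D3 and back */
    SUSPEND_BY_VIRTIO, /* virtio level: a device-defined suspend/resume */
};

/* Hypothetical inputs; none of these fields exist in the virtio spec. */
struct dev_caps {
    bool pm_feature_negotiated; /* the proposed _F_PM bit was negotiated */
    bool has_pci_pm_cap;        /* the device exposes a PCI PM capability */
    bool virtio_suspend;        /* a virtio-level suspend mechanism exists */
};

/*
 * require_pm_feature = true: PCI PM is trusted only when _F_PM says device
 * state survives the D3 round trip (the feature-bit approach).
 * require_pm_feature = false: an updated driver uses PCI PM whenever the
 * capability is present (the "check and use PCI PM" approach).
 * If PCI PM is not usable, fall back to virtio-level suspend when the
 * device has it, and otherwise to a reset.
 */
static enum suspend_path choose_suspend_path(const struct dev_caps *c,
                                             bool require_pm_feature)
{
    assert(c != NULL);
    if (c->has_pci_pm_cap && (!require_pm_feature || c->pm_feature_negotiated))
        return SUSPEND_BY_PCI_PM;
    if (c->virtio_suspend)
        return SUSPEND_BY_VIRTIO;
    return SUSPEND_BY_RESET;
}
```

The single `require_pm_feature` switch only puts both views side by side; it is not a proposed interface.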

>
> > > This will work orthogonally to VMM-side migration and will co-exist with
> > > VMM-based device migration.

Actually, no. If the PF can suspend the VF via PCI facilities, there
would be no layer violation anymore.

> > >
> > > b. nested use case:
> > > The L0 VMM maps a VF to the L1 guest as a PF with an emulated SR-IOV capability.
> > > The L1 guest enables SR-IOV and maps the VF to the L2 guest.
> >
> > Let me ask it again here, how can you migrate L2 using L1 "emulated"
> > PF? Emulation?
> >
> Emulation is one way, as most nested platform components do.

That's the point, you can't avoid emulation.

Thanks


> Maybe the L1 VF, i.e. VF + SR-IOV capability, is the emulated PF. This PF can run exactly the same commands as the L0-level PF.
