
Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function


On Tue, Aug 24, 2021 at 9:10 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Aug 24, 2021 at 10:41:54AM +0800, Jason Wang wrote:
>
> > > migration exposed to the guest ? No.
> >
> > Can you explain why?
>
> For the SRIOV case migration is a privileged operation of the
> hypervisor. The guest must not be allowed to interact with it in any
> way otherwise the hypervisor migration could be attacked from the
> guest and this has definite security implications.
>
> In practice this means that nothing related to migration can be
> located on the MMIO pages/queues/etc of the VF. The reasons for this
> are a bit complicated and have to do with the limitations of IO
> isolation with VFIO - eg you can't reliably split a single PCI BDF
> into hypervisor/guest security domains without PASID.

So exposing the migration function can be done indirectly:

In L0, the hardware implements the function via the PF. Qemu presents
an emulated PCI device and can expose those functions via a capability
to the L1 guest. When the L1 driver tries to use those functions, the
path is:

L1 virtio-net driver -(emulated PCI-E BAR)-> Qemu -(ioctl)-> L0 kernel
VF driver -> L0 kernel PF driver -(virtio interface)-> virtio PF

In this approach, there's no way for the L1 driver to control or see
what is implemented in the hardware (PF). The details are hidden by
Qemu. This works even if DMA is required for the L0 kernel PF driver
to talk to the hardware, since we don't present a DMA interface to L1.
With future PASID support, we can even present a DMA interface to L1.
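
Just to illustrate the idea (a sketch only; the register values, ioctl
number and struct below are made up for illustration and are not part
of any existing UAPI or the spec):

#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical register values written by the L1 driver. */
#define MIG_CTRL_FREEZE  0x1
#define MIG_CTRL_RESUME  0x2

/* Hypothetical interface between Qemu and the L0 kernel VF driver. */
struct mig_ctrl_cmd {
        uint32_t vf_id;
        uint32_t cmd;
};
#define VF_MIG_CTRL _IOW('M', 0x40, struct mig_ctrl_cmd)

/* Called from Qemu's emulated-BAR write handler for the L1 guest. */
static int handle_l1_mig_write(int l0_vf_fd, uint32_t vf_id, uint32_t val)
{
        struct mig_ctrl_cmd c = { .vf_id = vf_id, .cmd = val };

        /* The L0 kernel VF driver turns this into a request to the PF
         * driver over the virtio interface; L1 never sees how the PF
         * implements it. */
        return ioctl(l0_vf_fd, VF_MIG_CTRL, &c);
}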

>
> We recently revisited this concept again with a HNS vfio driver. IIRC
> Intel messed it up in their mdev driver too.
>
> > > >>> Let's open another thread for this if you wish, it has nothing related
> > > >>> to the spec but how it is implemented in Linux. If you search the
> > > >>> archive, something similar to "vfio_virtio_pci" has been proposed
> > > >>> several years before by Intel. The idea has been rejected, and we have
> > > >>> leveraged Linux vDPA bus for virtio-pci devices.
>
> That was largely because Intel was proposing to use mdevs to create an
> entire VDPA subsystem hidden inside VFIO.
>
> We've invested in a pure VFIO solution which should be merged soon:
>
> https://lore.kernel.org/kvm/20210819161914.7ad2e80e.alex.williamson@redhat.com/
>
> It does not rely on mdevs. It is not trying to recreate VDPA. Instead
> the HW provides a fully functional virtio VF and the solution uses
> normal SRIOV approaches.
>
> You can contrast this with the two virtio-net solutions mlx5 will
> support:
>
> - One is the existing hypervisor assisted VDPA solution where the mlx5
>   driver does HW accelerated queue processing.
>
> - The other one is a full PCI VF that provides a virtio-net function
>   without any hypervisor assistance. In this case we will have a VFIO
>   migration driver as above to provide SRIOV VF live migration.

This part I understand.

>
> I see in this thread that these two things are becoming quite
> confused. They are very different, have different security postures
> and use different parts of the hypervisor stack, and intended for
> quite different use cases.

It looks like the full PCI VF could go via the virtio-pci vDPA driver
as well (drivers/vdpa/virtio-pci). So what are the advantages of
exposing virtio migration via VFIO instead of vhost-vDPA? With vhost,
we get a lot of benefits:

1) migration compatibility with the existing software virtio and
vhost/vDPA implementations (compatibility can be probed through the
existing vhost-vdpa UAPI, see the sketch after this list)
2) presenting a virtio device instead of a virtio-pci device, which
makes it usable when the guest doesn't need PCI at all (e.g.
Firecracker or a micro VM)
3) the management infrastructure is almost ready (what Parav has done)
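
For example, a minimal compatibility probe through the existing
vhost-vdpa character device could look like this ("/dev/vhost-vdpa-0"
is just an example node name):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

int main(void)
{
        uint64_t features;
        uint32_t dev_id;
        int fd = open("/dev/vhost-vdpa-0", O_RDWR);

        if (fd < 0)
                return 1;

        if (ioctl(fd, VHOST_VDPA_GET_DEVICE_ID, &dev_id) ||
            ioctl(fd, VHOST_GET_FEATURES, &features))
                return 1;

        /* A management layer can compare these against the destination
         * backend (software virtio, another vDPA device, ...) before
         * deciding whether the VM can be migrated. */
        printf("virtio device id %u, features 0x%llx\n",
               dev_id, (unsigned long long)features);
        return 0;
}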

>
> > Your proposal works only for PCI with SR-IOV. And I want to leverage
> > it to be useful for other platforms or transport. That's all my
> > motivation.
>
> I've read most of the emails here and I still don't see what the use case
> is for this beyond PCI SRIOV.

So we have transports other than PCI, and the basic functions needed
for migration are common across them (a rough sketch follows the list):

- device freeze/stop
- device states
- dirty page tracking (not a must)
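
Purely as an illustration (none of these names or values come from the
spec; they just show the facility can be described without referring
to PCI):

#include <stdint.h>

/* Hypothetical transport-independent migration commands. */
enum virtio_mig_op {
        VIRTIO_MIG_FREEZE,          /* stop the device and its DMA       */
        VIRTIO_MIG_SAVE_STATE,      /* read back internal device state   */
        VIRTIO_MIG_RESTORE_STATE,   /* load the state on the destination */
        VIRTIO_MIG_DIRTY_LOG_START, /* optional dirty page tracking      */
        VIRTIO_MIG_DIRTY_LOG_STOP,
};

struct virtio_mig_cmd {
        uint32_t device_id;  /* which device the command targets    */
        uint32_t op;         /* one of enum virtio_mig_op           */
        uint64_t addr;       /* state buffer / dirty bitmap address */
        uint64_t len;
};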

>
> In a general sense it requires virtio to specify how PASID works. No
> matter what we must create a split secure/guest world where DMAs from
> each world are uniquely tagged. In the pure PCI world this means
> either using PF/VF or VF/PASID.
>
> In general PASID still has a long road to go before it is working in
> Linux:
>
> https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
>

Yes, I think we have agreed that it is something we want and vDPA will
support that for sure.

> So, IMHO, it makes sense to focus on the PF/VF definition for spec
> purposes.

That's fine.

>
> I agree it would be good spec design to have a general concept of a
> secure and guest world and specific sections that define how it works
> for different scenarios, but that seems like a language remark and not
> one about the design. For instance the admin queue Max is adding is
> clearly part of the secure world and putting it on the PF is the only
> option for the SRIOV mode.

Yes, but let's move the functionality that is common to all transports
into the "Basic Device Facility" chapter. We don't need to define how
it works in the other scenarios now.

Thanks

>
> Jason
>


