virtio-comment message

Subject: Re: [virtio-comment] Live Migration of Virtio Virtual Function
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Max Gurtovoy <mgurtovoy@nvidia.com>
Date: Mon, 23 Aug 2021 13:18:31 +0100
* Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> 
> On 8/19/2021 5:24 PM, Dr. David Alan Gilbert wrote:
> > * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> > > On 8/19/2021 2:12 PM, Dr. David Alan Gilbert wrote:
> > > > * Max Gurtovoy (mgurtovoy@nvidia.com) wrote:
> > > > > On 8/18/2021 1:46 PM, Jason Wang wrote:
> > > > > > On Wed, Aug 18, 2021 at 5:16 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
> > > > > > > On 8/17/2021 12:44 PM, Jason Wang wrote:
> > > > > > > > On Tue, Aug 17, 2021 at 5:11 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
> > > > > > > > > On 8/17/2021 11:51 AM, Jason Wang wrote:
> > > > > > > > > > On 2021/8/12 8:08 PM, Max Gurtovoy wrote:
> > > > > > > > > > > Hi all,
> > > > > > > > > > > 
> > > > > > > > > > > Live migration is one of the most important features of
> > > > > > > > > > > virtualization and virtio devices are often found in virtual
> > > > > > > > > > > environments.
> > > > > > > > > > > 
> > > > > > > > > > > The migration process is managed by a migration SW that is running on
> > > > > > > > > > > the hypervisor and the VM is not aware of the process at all.
> > > > > > > > > > > 
> > > > > > > > > > > Unlike the vDPA case, the state of a real PCI Virtual Function
> > > > > > > > > > > resides in the HW.
> > > > > > > > > > > 
> > > > > > > > > > vDPA doesn't prevent you from having HW state. Actually, the VMM
> > > > > > > > > > (QEMU) doesn't care whether a state is stored in software or
> > > > > > > > > > hardware. A well-designed VMM should be able to hide the virtio
> > > > > > > > > > device implementation from the migration layer; that is how QEMU
> > > > > > > > > > is written: it doesn't care whether the device is a software
> > > > > > > > > > virtio device or a vDPA device.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > In our vision, in order to fulfil the Live migration requirements for
> > > > > > > > > > > virtual functions, each physical function device must implement
> > > > > > > > > > > migration operations. Using these operations, it will be able to
> > > > > > > > > > > master the migration process for the virtual function devices. Each
> > > > > > > > > > > capable physical function device has supervisor permissions to
> > > > > > > > > > > change the virtual function operational states, save/restore its
> > > > > > > > > > > internal state and start/stop dirty page tracking.
> > > > > > > > > > > 
> > > > > > > > > > For "supervisor permissions", is this from the software point of view?
> > > > > > > > > > Maybe it's better to give an example for this.
> > > > > > > > > For example, permission for a PF device to quiesce and freeze a VF device.
> > > > > > > > Note that for safety, the VMM (e.g. QEMU) is usually running without any privileges.
> > > > > > > You're mixing layers here.
> > > > > > > 
> > > > > > > QEMU is not involved here. It's only sending IOCTLs to the migration driver.
> > > > > > > The migration driver will control the migration process of the VF using
> > > > > > > the PF communication channel.
> > > > > > So who will be granted the "permission" you mentioned here?
> > > > > This is just an expression.
> > > > > 
> > > > > What is not clear ?
> > > > > 
> > > > > The PF device will have an option to quiesce/freeze the VF device.
> > > > > 
> > > > > This is simple. Why are you looking for some sophisticated problems ?
> > > > I'm trying to follow along here and haven't completely managed to; but I
> > > > think the issue is a security separation one.
> > > > The VMM (e.g. qemu) that has been given access to one of the VFs is
> > > > isolated and shouldn't be able to go poking at other devices; so it
> > > > can't go poking at the PF (it probably doesn't even have the PF device
> > > > node accessible) - so then the question is who has access to the
> > > > migration driver and how do you make sure it can only deal with VFs
> > > > that it's supposed to be able to migrate.
> > > The QEMU/userspace doesn't know or care about the PF connection and internal
> > > virtio_vfio_pci driver implementation.
> > OK
> > 
> > > You shouldn't change 1 line of code in the VM driver nor in QEMU.
> > Hmm OK.
> > 
> > > QEMU does not have access to the PF. Only the kernel driver that has access
> > > to the VF will have access to the PF communication channel. There is no
> > > permission problem here.
> > > 
> > > The kernel driver of the VF will do this internally, and make sure that the
> > > commands it builds will only impact the VF originating them.
> > > 
> > Now that confuses me; isn't the kernel driver that has access to the VF
> > running inside the guest?  If it's inside the guest we can't trust it to
> > do anything about stopping impact to other devices.
> 
> No. The driver is in the hypervisor (virtio_vfio_pci). This is the migration
> driver, right ?

Ah OK, the '*host* kernel driver of the VF' - that makes more sense to
me, especially with that just being VFIO.

> The guest is running as usual. It isn't aware of the migration at all.
> 
> This is the point I'm trying to make here. I don't (and I can't) change even 1
> line of code in the guest.
> 
> e.g:
> 
> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor (bound
> to VF5) --> send admin command on PF adminq to start tracking dirty pages
> for VF5 --> PF device will do it
> 
> QEMU ioctl --> vfio (hypervisor) --> virtio_vfio_pci on hypervisor (bound
> to VF5) --> send admin command on PF adminq to quiesce VF5 --> PF device
> will do it

Yeh that makes more sense.
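
The two flows quoted above can be sketched as a toy model. This is only
an illustration of the isolation property being discussed, not the
virtio_vfio_pci or mlx5_vfio_pci code; every name in it (AdminOp,
PhysicalFunction, VfHostDriver, admin_command) is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class AdminOp(Enum):
    """Hypothetical admin commands sent on the PF admin queue."""
    START_DIRTY_TRACKING = "start_dirty_tracking"
    QUIESCE = "quiesce"


@dataclass
class PhysicalFunction:
    """Toy PF: carries out admin commands on behalf of its VFs."""
    num_vfs: int
    log: list = field(default_factory=list)

    def admin_command(self, vf_id: int, op: AdminOp) -> None:
        if not 0 <= vf_id < self.num_vfs:
            raise ValueError(f"no such VF: {vf_id}")
        # Record which VF the command was applied to.
        self.log.append((vf_id, op))


class VfHostDriver:
    """Toy stand-in for the host kernel driver bound to one VF."""

    def __init__(self, pf: PhysicalFunction, vf_id: int) -> None:
        self._pf = pf
        self._vf_id = vf_id  # fixed when the driver is bound to the VF

    def ioctl(self, op: AdminOp) -> None:
        # The caller (e.g. the VMM) never supplies a VF id, so a
        # command can only target the VF this driver is bound to.
        self._pf.admin_command(self._vf_id, op)


pf = PhysicalFunction(num_vfs=8)
vf5 = VfHostDriver(pf, vf_id=5)
vf5.ioctl(AdminOp.START_DIRTY_TRACKING)
vf5.ioctl(AdminOp.QUIESCE)
```

In this sketch the userspace side only ever holds vf5, never pf, which
is the separation asked about earlier in the thread.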

Dave

> You can take a look how we implement mlx5_vfio_pci in the link I provided.
> 
> > 
> > Dave
> > 
> > 
> > > We already do this in mlx5 NIC migration. The kernel is secured and QEMU
> > > interface is the VF.
> > > 
> > > > Dave
> > > > 
> > > > > > > > > > > An example of this approach can be seen in the way NVIDIA performs
> > > > > > > > > > > live migration of a ConnectX NIC function:
> > > > > > > > > > > 
> > > > > > > > > > > https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
> > > > > > > > > > > 
> > > > > > > > > > > NVIDIA's SNAP technology enables hardware-accelerated software defined
> > > > > > > > > > > PCIe devices. virtio-blk/virtio-net/virtio-fs SNAP is used for storage
> > > > > > > > > > > and networking solutions. The host OS/hypervisor uses its standard
> > > > > > > > > > > drivers that are implemented according to the well-known VIRTIO
> > > > > > > > > > > specification.
> > > > > > > > > > > 
> > > > > > > > > > > In order to implement Live Migration for these virtual function
> > > > > > > > > > > devices, which use standard drivers as mentioned, the specification
> > > > > > > > > > > should define how HW vendors should build their devices and how SW
> > > > > > > > > > > developers should adjust the drivers.
> > > > > > > > > > > 
> > > > > > > > > > > This will enable a specification-compliant, vendor-agnostic solution.
> > > > > > > > > > > 
> > > > > > > > > > > This is exactly how we built the migration driver for ConnectX
> > > > > > > > > > > (internal HW design doc) and I guess that this is the way other
> > > > > > > > > > > vendors work.
> > > > > > > > > > > 
> > > > > > > > > > > For that, I would like to know if the approach of "PF that controls
> > > > > > > > > > > the VF live migration process" is acceptable by the VIRTIO technical
> > > > > > > > > > > group ?
> > > > > > > > > > > 
> > > > > > > > > > I'm not sure but I think it's better to start from the general
> > > > > > > > > > facility for all transports, then develop features for a specific
> > > > > > > > > > transport.
> > > > > > > > > a general facility for all transports can be a generic admin queue ?
> > > > > > > > It could be a virtqueue or a transport specific method (pcie capability).
> > > > > > > No. You said a general facility for all transports.
> > > > > > For general facility, I mean the chapter 2 of the spec which is general
> > > > > > 
> > > > > > "
> > > > > > 2 Basic Facilities of a Virtio Device
> > > > > > "
> > > > > > 
> > > > > It will be in chapter 2. Right after "2.11 Exporting Object" I can add "2.12
> > > > > Admin Virtqueues" and this is what I did in the RFC.
> > > > > 
> > > > > > > Transport specific is not general.
> > > > > > The transport is in charge of implementing the interface for those facilities.
> > > > > Transport specific is not general.
> > > > > 
> > > > > 
> > > > > > > > E.g we can define what needs to be migrated for the virtio-blk first
> > > > > > > > (the device state). Then we can define the interface to get and set
> > > > > > > > those states via admin virtqueue. Such decoupling may ease the future
> > > > > > > > development of the transport specific migration interface.
> > > > > > > I asked a simple question here.
> > > > > > > 
> > > > > > > Let's stick to this.
> > > > > > I answered this question.
> > > > > No you didn't answer.
> > > > > 
> > > > > I asked if the approach of "PF that controls the VF live migration process"
> > > > > is acceptable by the VIRTIO technical group ?
> > > > > 
> > > > > And you take the discussion to your direction instead of answering a Yes/No
> > > > > question.
> > > > > 
> > > > > >      The virtqueue could be one of the
> > > > > > approaches. And it's your responsibility to convince the community
> > > > > > about that approach. Having an example may help people to understand
> > > > > > your proposal.
> > > > > > 
> > > > > > > I'm not referring to internal state definitions.
> > > > > > Without an example, how do we know if it can work well?
> > > > > > 
> > > > > > > Can you please not change the subject of my initial intent in the email ?
> > > > > > Did I? Basically, I'm asking how a virtio-blk can be migrated with
> > > > > > your proposal.
> > > > > The virtio-blk PF admin queue will be used to manage the virtio-blk VF
> > > > > migration.
> > > > > 
> > > > > This is the whole discussion. I don't want to get into resolution.
> > > > > 
> > > > > Since you already know the answer as I published 4 RFCs already with all the
> > > > > flow.
> > > > > 
> > > > > Let's stick to my question.
> > > > > 
> > > > > > Thanks
> > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > > 
> > > > > > > > Thanks
> > > > > > > > 
> > > > > > > > > > Thanks
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > Cheers,
> > > > > > > > > > > 
> > > > > > > > > > > -Max.
> > > > > > > > > > > 
> > > > > > > > > This publicly archived list offers a means to provide input to the
> > > > > > > > > OASIS Virtual I/O Device (VIRTIO) TC.
> > > > > > > > > 
> > > > > > > > > In order to verify user consent to the Feedback License terms and
> > > > > > > > > to minimize spam in the list archive, subscription is required
> > > > > > > > > before posting.
> > > > > > > > > 
> > > > > > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > > > > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > > > > > > > List help: virtio-comment-help@lists.oasis-open.org
> > > > > > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > > > > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > > > > > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > > > > > > > Committee: https://www.oasis-open.org/committees/virtio/
> > > > > > > > > Join OASIS: https://www.oasis-open.org/join/
> > > > > > > > > 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK