Subject: Re: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration
On Wed, Oct 18, 2023 at 09:48:55AM +0000, Parav Pandit wrote:
> > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > Sent: Wednesday, October 18, 2023 2:13 PM
> >
> > On 10/18/2023 3:20 PM, Parav Pandit wrote:
> > >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > >> Sent: Wednesday, October 18, 2023 12:22 PM
> > >>
> > >> On 10/18/2023 2:41 PM, Parav Pandit wrote:
> > >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > >>>> Sent: Wednesday, October 18, 2023 12:06 PM
> > >>>>
> > >>>> On 10/18/2023 1:02 PM, Parav Pandit wrote:
> > >>>>>> From: virtio-comment@lists.oasis-open.org
> > >>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
> > >>>>>> Sent: Monday, October 16, 2023 3:18 PM
> > >>>>>>
> > >>>>>> On 10/13/2023 7:54 PM, Parav Pandit wrote:
> > >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > >>>>>>>> Sent: Friday, October 13, 2023 3:14 PM
> > >>>>>>>>>>>> How do you transfer the ownership?
> > >>>>>>>>>>> An additional ownership delegation by a new admin command.
> > >>>>>>>>>> if you think this can work, do you want to cook a patch to
> > >>>>>>>>>> implement this before you submitting this live migration series?
> > >>>>>>>>> I answered this already above.
> > >>>>>>>> talk is cheap, show me your patch
> > >>>>>>> Huh. We presented the infrastructure that migrates, 30+ device
> > >>>>>>> types, covering device context ideas from Oracle.
> > >>>>>>> Covering P2P, supporting device_reset, FLR, dirty page tracking.
> > >>>>>>>
> > >>>>>>> Please have some respect for other members who covered more
> > >>>>>>> ground than your series.
> > >>>>>>> What more? Apply the same nested concept on the member device as
> > >>>>>>> Michael suggested, it is nested virtualization maintain exact
> > >>>>>>> same semantics.
> > >>>>>>> So a VF is mapped as PF to the L1 guest.
> > >>>>>>> L1 guest can enable SR-IOV on it, and map one VF to L2 guest.
> > >>>>>>>
> > >>>>>>> This nested work can be extended in future, once first level
> > >>>>>>> nesting is covered.
> > >>>>>>>> Answer all questions above, if you think a management VF can
> > >>>>>>>> work, please show me your patch.
> > >>>>>>> The idea evolves from technical debate than pointing fingers
> > >>>>>>> like your comment.
> > >>>>>>> I think a positive discussion with Michael and a pointer to the
> > >>>>>>> paper from Jason gave a good direction of doing _right_ nesting
> > >>>>>>> that follows two principles.
> > >>>>>>> a. efficiency property
> > >>>>>>> b. equivalence property
> > >>>>>>>
> > >>>>>>> (c. resource control is natural already)
> > >>>>>>>
> > >>>>>>> Both apply at VMM and at VM level enabling recursive
> > >>>>>>> virtualization, by having VF that can act as PF inside the guest.
> > >>>>>>> [1] https://dl.acm.org/doi/pdf/10.1145/361011.361073
> > >>>>>> Please just show me your patch resolving these opens, how about
> > >>>>>> start from defining virtio-fs device context and your management VF?
> > >>>>> As answered, device context infrastructure is done, per device
> > >>>>> specific device-context will be defined incrementally.
> > >>>>> I will not be including virtio-fs in this series. It will be done
> > >>>>> incrementally in future utilizing the infrastructure built in this
> > >>>>> series.
> > >>>> Done? How do you conclude this? You just tell me what is the full
> > >>>> set of virtio-fs device context now and how to migrate them.
> > >>>>
> > >>>> You can't? you refuse or you don't? Do you expect the HW designer to
> > >>>> figure out by themself?
> > >>> I won't be able to tell now as I don't think it is necessary for
> > >>> this series.
> > >>> If one out of 30 devices cannot migrate because of unimaginable
> > >>> amount of complexity has been placed there, may be one will not
> > >>> implement it as member device.
> > >>> From experience of migratable complex gpu devices, rdma devices
> > >>> (stateful having hundred thousand of stateful QPs), my
> > >>> understanding is complex state of virtio-fs can be defined and
> > >>> migratable.
> > >>> Mlx5 driver consist of 150,000 lines of code and that device is
> > >>> migratable with complex state.
> > >>> So I am optimistic that virtio-fs can be migratable too.
> > >>> It does not have to limited by my limited creativity of 2023.
> > >>> May be I am wrong, in that case one will not implement passthrough
> > >>> virtio-fs device.
> > >> your series wants to migrate device context, but doesn't define
> > >> device context, does this sounds reasonable?
> > > Device generic context is defined at [1] and also the infrastructure
> > > for defining the device context in parallel by multiple people can be
> > > done post the work of [1].
> > >
> > > Per each device type context will be defined incrementally post this work.
> > >
> > > [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00190.html
> > This is not post of the work, you should define them before you use
> > them in this series.
> >
> I don't agree to cook ocean in this patch series.
> No practical spec devel community does it.
> As long as we feel comfortable that device context framework is
> extendible, it is fine.
> If virtio-fs seems very hard, may be one will come with a new light
> weight FS device. I really don't know.
>
> > And you need to prove why admin vq are better than registers solution
> > if you want a merge.
> Michael already responded the practical aspects.
> Since you may claim, I didn't answer, below is the technical details.
>
> Why admin commands and aq is better is because of below reasons in my view:
>
> Functionally better:
> 1. When the live migration registers are located on the VF itself, VMM
> does not have control of it.
> These registers reset, on FLR and device reset because these are virtio
> registers of the device.
> Hence, VMM lost the state for the job that VMM was supposed to do.
> Therefore, passthrough mode cannot depend on these registers.
>
> 2. Any bulk data transfer of device context and dirty page tracking
> requires DMA.
> Hence those DMA must happen to the device which is different than VF itself.
> If it is on the VF itself, it has two problems.
> 2.a. VF device reset and FLR will clear them, and device context is lost.
>
> 2.b. the DMA occurs at the PCI RID level.
> IOMMU cannot bifurcate the DMA of one RID to two different address space
> of guest and hypervisor.
> This requires PASID support.
> Using PASID has following problems.
> 2.b.1 PASID typically not used by the kernel software. It is only meant
> for the user processes.
> Hence for kernel work a reserving PASID won't be acceptable upstream kernel.
> 2.b.2 Somehow if this is done, When the VF itself supports PASID, it
> required now vPASID support.
> This is again not where industry is going in other forums where I am
> part of. Hence, it will be failure for virtio. Hence, I do not recommend
> vPASID route.
> 2.b.3 One of the widely used cpu seems to have dropped the support due
> to limitation of an instruction around PASID.
> So it cannot be used there, this further limits virtio passthrough users.
>
> Even if somehow 2.b.2 and 2.b.3 is overcome in theory, #1 and 2.a is
> functional problems.
>
> Scale wise better:
> 3. Admin command and admin vq are used _only_ when one does device
> migration command.
> One does not migrate VMs every few msec.
> Hence such functionality to be better be done which is efficient for
> performance, but without consuming on-chip memory.
> Admin command and admin vq satisfy those.
>
> 4. Once the software matures further, admin command would prefer
> completion interrupt, instead of poll.
> How to get notification/interrupt? Well, virtqueue defines this already.
> Should we replicate that in some PF registers?
> It can be. But once you put all the functionalities of admin command and
> aq in registers the whole thing becomes yet another register_q.
>
> 5. Can these registers be placed in the PF to overcome #1 and #2 for
> passthrough?
> In theory yes.
> In practice, no, as there are many commands that flow, which needs to
> scale to reasonable number of VFs.
> Admin commands over admin vq provides this generic facility.
>
> 6. Most modern devices who attempts to scale, cut down their register
> footprint, registers are used only for main bootstrap, init time config
> work.
> Even in virtio spec, one can read:
> "Device configuration space is generally used for rarely changing or
> initialization-time parameters."
>
> Adding some additional registers to a PF device config space for non
> init time parameters does not make sense.
>
> 7. Additionally, a nested virtualization should be done by truly nesting
> the device at right abstraction point of owner-member relationship.
> This follows two principles of (a) efficiency and (b) equivalency of
> what Jason paper pointed.
> And we ask for nested VF extension we will get our guidance from
> PCI-SIG, of why it should be done if it is matching with rest of the
> ecosystem components that support/don't support the nesting.

For completeness, and to shorten the thread, can you please list known
issues/use cases that are addressed by the status bit interface and how
you plan for them to be addressed?