Subject: RE: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration
> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Thursday, October 19, 2023 2:40 PM
>
> On 10/19/2023 5:01 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Thursday, October 19, 2023 1:45 PM
> >>
> >> On 10/18/2023 5:48 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Wednesday, October 18, 2023 2:13 PM
> >>>>
> >>>> On 10/18/2023 3:20 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Wednesday, October 18, 2023 12:22 PM
> >>>>>>
> >>>>>> On 10/18/2023 2:41 PM, Parav Pandit wrote:
> >>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>>>> Sent: Wednesday, October 18, 2023 12:06 PM
> >>>>>>>>
> >>>>>>>> On 10/18/2023 1:02 PM, Parav Pandit wrote:
> >>>>>>>>>> From: virtio-comment@lists.oasis-open.org
> >>>>>>>>>> <virtio-comment@lists.oasis-open.org> On Behalf Of
> >>>>>>>>>> Zhu, Lingshan
> >>>>>>>>>> Sent: Monday, October 16, 2023 3:18 PM
> >>>>>>>>>>
> >>>>>>>>>> On 10/13/2023 7:54 PM, Parav Pandit wrote:
> >>>>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>>>> Sent: Friday, October 13, 2023 3:14 PM
> >>>>>>>>>>>>>>>> How do you transfer the ownership?
> >>>>>>>>>>>>>>> An additional ownership delegation by a new admin
> >>>>>>>>>>>>>>> command.
> >>>>>>>>>>>>>> if you think this can work, do you want to cook a patch
> >>>>>>>>>>>>>> to implement this before you submitting this live
> >>>>>>>>>>>>>> migration series?
> >>>>>>>>>>>>> I answered this already above.
> >>>>>>>>>>>> talk is cheap, show me your patch
> >>>>>>>>>>> Huh. We presented the infrastructure that migrates, 30+
> >>>>>>>>>>> device types, covering device context ideas from Oracle.
> >>>>>>>>>>> Covering P2P, supporting device_reset, FLR, dirty page
> >>>>>>>>>>> tracking.
> >>>>>>>>>>>
> >>>>>>>>>>> Please have some respect for other members who covered more
> >>>>>>>>>>> ground than your series.
> >>>>>>>>>>> What more? Apply the same nested concept on the member
> >>>>>>>>>>> device as Michael suggested, it is nested virtualization
> >>>>>>>>>>> maintain exact same semantics.
> >>>>>>>>>>> So a VF is mapped as PF to the L1 guest.
> >>>>>>>>>>> L1 guest can enable SR-IOV on it, and map one VF to L2
> >>>>>>>>>>> guest.
> >>>>>>>>>>>
> >>>>>>>>>>> This nested work can be extended in future, once first
> >>>>>>>>>>> level nesting is covered.
> >>>>>>>>>>>> Answer all questions above, if you think a management VF
> >>>>>>>>>>>> can work, please show me your patch.
> >>>>>>>>>>> The idea evolves from technical debate then pointing
> >>>>>>>>>>> fingers like your comment.
> >>>>>>>>>>> I think a positive discussion with Michael and a pointer to
> >>>>>>>>>>> the paper from Jason gave a good direction of doing _right_
> >>>>>>>>>>> nesting that follows two principles.
> >>>>>>>>>>> a. efficiency property
> >>>>>>>>>>> b. equivalence property
> >>>>>>>>>>>
> >>>>>>>>>>> (c. resource control is natural already)
> >>>>>>>>>>>
> >>>>>>>>>>> Both apply at VMM and at VM level enabling recursive
> >>>>>>>>>>> virtualization, by having VF that can act as PF inside the
> >>>>>>>>>>> guest.
> >>>>>>>>>>> [1] https://dl.acm.org/doi/pdf/10.1145/361011.361073
> >>>>>>>>>> Please just show me your patch resolving these opens, how
> >>>>>>>>>> about start from defining virtio-fs device context and your
> >>>>>>>>>> management VF?
> >>>>>>>>> As answered, device context infrastructure is done, per device
> >>>>>>>>> specific device-context will be defined incrementally.
> >>>>>>>>> I will not be including virtio-fs in this series. It will be
> >>>>>>>>> done incrementally in future utilizing the infrastructure
> >>>>>>>>> build in this series.
> >>>>>>>> Done? How do you conclude this? You just tell me what is the
> >>>>>>>> full set of virtio-fs device context now and how to migrate
> >>>>>>>> them.
> >>>>>>>>
> >>>>>>>> You can't? you refuse or you don't? Do you expect the HW
> >>>>>>>> designer to figure out by themself?
> >>>>>>> I won't be able to tell now as I don't think it is necessary for
> >>>>>>> this series.
> >>>>>>> If one out of 30 devices cannot migrate because of unimaginable
> >>>>>>> amount of complexity has been placed there, may be one will not
> >>>>>>> implement it as member device.
> >>>>>>> From experience of migratable complex gpu devices, rdma devices
> >>>>>>> (stateful having hundred thousand of stateful QPs), my
> >>>>>>> understanding is complex state of virtio-fs can be defined and
> >>>>>>> migratable.
> >>>>>>> Mlx5 driver consists of 150,000 lines of code and that device is
> >>>>>>> migratable with complex state.
> >>>>>>> So I am optimistic that virtio-fs can be migratable too.
> >>>>>>> It does not have to be limited by my limited creativity of 2023.
> >>>>>>> May be I am wrong, in that case one will not implement
> >>>>>>> passthrough virtio-fs device.
> >>>>>> your series wants to migrate device context, but doesn't define
> >>>>>> device context, does this sound reasonable?
> >>>>> Device generic context is defined at [1] and also the
> >>>>> infrastructure for defining the device context in parallel by
> >>>>> multiple people can be done post the work of [1].
> >>>>> Per each device type context will be defined incrementally post
> >>>>> this work.
> >>>>>
> >>>>> [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00190.html
> >>>> This is not post of the work, you should define them before you use
> >>>> them in this series.
> >>>>
> >>> I don't agree to cook ocean in this patch series.
> >>> No practical spec devel community does it.
> >>> As long as we feel comfortable that device context framework is
> >>> extendible, it is fine.
> >>> If virtio-fs seems very hard, may be one will come with a new light
> >>> weight FS device. I really don't know.
> >> so you want to migrate device context, but refuse to define them?
> >>>> And you need to prove why admin vq are better than registers
> >>>> solution if you want a merge.
> >>> Michael already responded the practical aspects.
> >>> Since you may claim, I didn't answer, below is the technical details.
> >>>
> >>> Why admin commands and aq is better is because of below reasons in
> >>> my view:
> >>> Functionally better:
> >>> 1. When the live migration registers are located on the VF itself,
> >>> VMM does not have control of it.
> >>> These registers reset, on FLR and device reset because these are
> >>> virtio registers of the device.
> >>> Hence, VMM lost the state for the job that VMM was supposed to do.
> >>> Therefore, passthrough mode cannot depend on these registers.
> >>>
> >>> 2. Any bulk data transfer of device context and dirty page tracking
> >>> requires DMA.
> >>> Hence those DMA must happen to the device which is different than VF
> >>> itself.
> >>> If it is on the VF itself, it has two problems.
> >>> 2.a. VF device reset and FLR will clear them, and device context is
> >>> lost.
> >>>
> >>> 2.b. the DMA occurs at the PCI RID level.
> >>> IOMMU cannot bifurcate the DMA of one RID to two different address
> >>> space of guest and hypervisor.
> >>> This requires PASID support.
> >>> Using PASID has following problems.
> >>> 2.b.1 PASID typically not used by the kernel software. It is only
> >>> meant for the user processes.
> >>> Hence for kernel work a reserving PASID won't be acceptable upstream
> >>> kernel.
> >>> 2.b.2 Somehow if this is done, When the VF itself supports PASID, it
> >>> required now vPASID support.
> >>> This is again not where industry is going in other forums where I am
> >>> part of. Hence, it will be failure for virtio. Hence, I do not
> >>> recommend vPASID route.
> >>> 2.b.3 One of the widely used cpu seems to have dropped the support
> >>> due to limitation of an instruction around PASID.
> >>> So it cannot be used there, this further limits virtio passthrough
> >>> users.
> >>>
> >>> Even if somehow 2.b.2 and 2.b.3 is overcome in theory, #1 and 2.a is
> >>> functional problems.
> >>> Scale wise better:
> >>> 3. Admin command and admin vq are used _only_ when one does device
> >>> migration command.
> >>> One does not migrate VMs every few msec.
> >>> Hence such functionality to be better be done which is efficient for
> >>> performance, but without consuming on-chip memory.
> >>> Admin command and admin vq satisfy those.
> >>>
> >>> 4. Once the software matures further, admin command would prefer
> >>> completion interrupt, instead of poll.
> >>> How to get notification/interrupt? Well, virtqueue defines this
> >>> already.
> >>> Should we replicate that in some PF registers?
> >>> It can be. But once you put all the functionalities of admin command
> >>> and aq in registers the whole thing becomes yet another register_q.
> >>> 5. Can these registers be placed in the PF to overcome #1 and #2 for
> >>> passthrough?
> >>> In theory yes.
> >>> In practice, no, as there are many commands that flow, which needs
> >>> to scale to reasonable number of VFs.
> >>> Admin commands over admin vq provides this generic facility.
> >>>
> >>> 6. Most modern devices who attempts to scale, cut down their
> >>> register footprint, registers are used only for main bootstrap, init
> >>> time config work.
> >>> Even in virtio spec, one can read:
> >>> "Device configuration space is generally used for rarely changing or
> >>> initialization-time parameters."
> >>> Adding some additional registers to a PF device config space for non
> >>> init time parameters does not make sense.
> >>> 7. Additionally, a nested virtualization should be done by truly
> >>> nesting the device at right abstraction point of owner-member
> >>> relationship.
> >>> This follows two principles of (a) efficiency and (b) equivalency of
> >>> what Jason paper pointed.
> >>> And we ask for nested VF extension we will get our guidance from
> >>> PCI-SIG, of why it should be done if it is matching with rest of the
> >>> ecosystem components that support/don't support the nesting.
> >> If they are true, shall we refactor virtio-pci common cfg
> >> functionalities to use admin vq?
> > For non-backward compatible SIOV device of the future, yes, virtio-pci
> > common config (non init registers) should be moved to a vq, located on
> > the member device directly.
> Oh, really? Quite interesting, do you want to move all config space
> fields in VF to admin vq? Have a plan?

Not in my plan for spec 1.4 time frame.
I do not want to divert the discussion, would like to focus on device
migration phases.
Let's please discuss in some other dedicated thread.
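
To make reasons 1 and 2.a in the quoted text concrete (migration state kept in the VF's own registers is cleared by FLR, while state held by the owner device and reached through admin commands is not), here is a minimal Python sketch. It is only a toy model of where the state is stored. The names MemberVF, OwnerPF, admin_cmd_set, admin_cmd_get and the "dirty_tracking" key are hypothetical and are not taken from the virtio specification or from this patch series.

```python
from dataclasses import dataclass, field


@dataclass
class MemberVF:
    """Hypothetical VF that keeps migration state in its own registers."""
    migration_regs: dict = field(default_factory=dict)

    def function_level_reset(self):
        # Models the claim in reason 1: FLR or device reset clears the
        # function's registers, including any migration registers on it.
        self.migration_regs.clear()


@dataclass
class OwnerPF:
    """Hypothetical owner device holding per-member migration state."""
    member_state: dict = field(default_factory=dict)

    def admin_cmd_set(self, vf_id, key, value):
        # Stand-in for an admin command issued by the VMM to the owner.
        self.member_state.setdefault(vf_id, {})[key] = value

    def admin_cmd_get(self, vf_id, key):
        return self.member_state.get(vf_id, {}).get(key)


def demo():
    vf = MemberVF()
    pf = OwnerPF()

    # The VMM enables dirty page tracking in both models.
    vf.migration_regs["dirty_tracking"] = True
    pf.admin_cmd_set(1, "dirty_tracking", True)

    # The guest, which owns the passthrough VF, triggers a reset.
    vf.function_level_reset()

    # Returns (state seen via VF registers, state seen via owner device).
    return vf.migration_regs.get("dirty_tracking"), pf.admin_cmd_get(1, "dirty_tracking")


if __name__ == "__main__":
    print(demo())
```

In this model the value read back from the VF registers is gone after the reset, while the value kept on the owner side is still available to the VMM. The sketch does not model DMA, PASID or virtqueue behavior.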