Subject: Re: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration
On 10/19/2023 5:13 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, October 19, 2023 2:40 PM

On 10/19/2023 5:01 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, October 19, 2023 1:45 PM

On 10/18/2023 5:48 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 2:13 PM

On 10/18/2023 3:20 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 12:22 PM

On 10/18/2023 2:41 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 12:06 PM

On 10/18/2023 1:02 PM, Parav Pandit wrote:
From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
Sent: Monday, October 16, 2023 3:18 PM

On 10/13/2023 7:54 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Friday, October 13, 2023 3:14 PM

How do you transfer the ownership?

An additional ownership delegation by a new admin command.

if you think this can work, do you want to cook a patch to implement this before you submitting this live migration series?

I answered this already above.

talk is cheap, show me your patch

Huh. We presented the infrastructure that migrates 30+ device types, covering device context ideas from Oracle.
Covering P2P, supporting device_reset, FLR, dirty page tracking.
Please have some respect for other members who covered more ground than your series.
What more? Apply the same nested concept on the member device as Michael suggested, it is nested virtualization maintain exact same semantics.
So a VF is mapped as PF to the L1 guest.
L1 guest can enable SR-IOV on it, and map one VF to L2 guest.
This nested work can be extended in future, once first level nesting is covered.

Answer all questions above, if you think a management VF can work, please show me your patch.

The idea evolves from technical debate rather than pointing fingers like your comment.
I think a positive discussion with Michael and a pointer to the paper from Jason gave a good direction of doing _right_ nesting that follows two principles.
a. efficiency property
b. equivalence property
(c. resource control is natural already)
Both apply at VMM and at VM level enabling recursive virtualization, by having VF that can act as PF inside the guest.

[1] https://dl.acm.org/doi/pdf/10.1145/361011.361073

Please just show me your patch resolving these opens, how about start from defining virtio-fs device context and your management VF?

As answered, device context infrastructure is done, per device specific device-context will be defined incrementally.
I will not be including virtio-fs in this series.
It will be done incrementally in future utilizing the infrastructure built in this series.

Done? How do you conclude this?
You just tell me what is the full set of virtio-fs device context now and how to migrate them.
You can't? you refuse or you don't?
Do you expect the HW designer to figure out by themself?

I won't be able to tell now as I don't think it is necessary for this series.
If one out of 30 devices cannot migrate because of unimaginable amount of complexity has been placed there, maybe one will not implement it as member device.
From experience of migratable complex gpu devices, rdma devices (stateful having hundred thousand of stateful QPs), my understanding is complex state of virtio-fs can be defined and migratable.
Mlx5 driver consists of 150,000 lines of code and that device is migratable with complex state.
So I am optimistic that virtio-fs can be migratable too.
It does not have to be limited by my limited creativity of 2023.
Maybe I am wrong, in that case one will not implement passthrough virtio-fs device.
your series wants to migrate device context, but doesn't define device context, does this sound reasonable?

Device generic context is defined at [1] and also the infrastructure for defining the device context in parallel by multiple people can be done post the work of [1].
Per each device type context will be defined incrementally post this work.

[1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00190.html

This is not post of the work, you should define them before you use them in this series.

I don't agree to cook ocean in this patch series.
No practical spec devel community does it.
As long as we feel comfortable that device context framework is extendible, it is fine.
If virtio-fs seems very hard, maybe one will come with a new light weight FS device. I really don't know.

so you want to migrate device context, but refuse to define them?
And you need to prove why admin vq are better than registers solution if you want a merge.

Michael already responded the practical aspects.
Since you may claim I didn't answer, below are the technical details.
Why admin commands and aq is better is because of below reasons in my view:

Functionally better:
1. When the live migration registers are located on the VF itself, VMM does not have control of it.
These registers reset on FLR and device reset because these are virtio registers of the device.
Hence, VMM lost the state for the job that VMM was supposed to do.
Therefore, passthrough mode cannot depend on these registers.

2. Any bulk data transfer of device context and dirty page tracking requires DMA.
Hence those DMA must happen to the device which is different than VF itself.
If it is on the VF itself, it has two problems.
2.a. VF device reset and FLR will clear them, and device context is lost.
2.b. the DMA occurs at the PCI RID level.
IOMMU cannot bifurcate the DMA of one RID to two different address spaces of guest and hypervisor.
This requires PASID support. Using PASID has following problems.
2.b.1 PASID is typically not used by the kernel software. It is only meant for the user processes.
Hence for kernel work reserving a PASID won't be acceptable to the upstream kernel.
2.b.2 Somehow if this is done, when the VF itself supports PASID, it now requires vPASID support.
This is again not where industry is going in other forums where I am part of.
Hence, it will be failure for virtio. Hence, I do not recommend vPASID route.
2.b.3 One of the widely used cpu seems to have dropped the support due to limitation of an instruction around PASID.
So it cannot be used there, this further limits virtio passthrough users.
Even if somehow 2.b.2 and 2.b.3 is overcome in theory, #1 and 2.a are functional problems.

Scale wise better:
3. Admin command and admin vq are used _only_ when one does device migration command.
One does not migrate VMs every few msec.
Hence such functionality is better done in a way which is efficient for performance, but without consuming on-chip memory.
Admin command and admin vq satisfy those.

4. Once the software matures further, admin command would prefer completion interrupt, instead of poll.
How to get notification/interrupt? Well, virtqueue defines this already.
Should we replicate that in some PF registers? It can be.
But once you put all the functionalities of admin command and aq in registers the whole thing becomes yet another register_q.

5. Can these registers be placed in the PF to overcome #1 and #2 for passthrough?
In theory yes.
In practice, no, as there are many commands that flow, which needs to scale to reasonable number of VFs.
Admin commands over admin vq provides this generic facility.

6. Most modern devices that attempt to scale cut down their register footprint; registers are used only for main bootstrap, init time config work.
Even in virtio spec, one can read:
"Device configuration space is generally used for rarely changing or initialization-time parameters."
Adding some additional registers to a PF device config space for non init time parameters does not make sense.

7. Additionally, a nested virtualization should be done by truly nesting the device at right abstraction point of owner-member relationship.
This follows two principles of (a) efficiency and (b) equivalency of what Jason paper pointed.
And we ask for nested VF extension we will get our guidance from PCI-SIG, of why it should be done if it is matching with rest of the ecosystem components that support/don't support the nesting.

If they are true, shall we refactor virtio-pci common cfg functionalities to use admin vq?

For non-backward compatible SIOV device of the future, yes, virtio-pci common config (non init registers) should be moved to a vq, located on the member device directly.

Oh, really? Quite interesting, do you want to move all config space fields in VF to admin vq? Have a plan?

Not in my plan for spec 1.4 time frame.
I do not want to divert the discussion, would like to focus on device migration phases.
Let's please discuss in some other dedicated thread.
OK, but don't say admin vq is better than registers.