OASIS Mailing List Archives

virtio-comment message


Subject: RE: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Thursday, October 19, 2023 1:45 PM
> 
> On 10/18/2023 5:48 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Wednesday, October 18, 2023 2:13 PM
> >>
> >> On 10/18/2023 3:20 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Wednesday, October 18, 2023 12:22 PM
> >>>>
> >>>> On 10/18/2023 2:41 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Wednesday, October 18, 2023 12:06 PM
> >>>>>>
> >>>>>> On 10/18/2023 1:02 PM, Parav Pandit wrote:
> >>>>>>>> From: virtio-comment@lists.oasis-open.org
> >>>>>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Zhu,
> >>>>>>>> Lingshan
> >>>>>>>> Sent: Monday, October 16, 2023 3:18 PM
> >>>>>>>>
> >>>>>>>> On 10/13/2023 7:54 PM, Parav Pandit wrote:
> >>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>>>>>> Sent: Friday, October 13, 2023 3:14 PM
> >>>>>>>>>>>>>> How do you transfer the ownership?
> >>>>>>>>>>>>> An additional ownership delegation by a new admin command.
> >>>>>>>>>>>> if you think this can work, do you want to cook a patch to
> >>>>>>>>>>>> implement this before you submitting this live migration series?
> >>>>>>>>>>> I answered this already above.
> >>>>>>>>>> talk is cheap, show me your patch
> >>>>>>>>> Huh. We presented the infrastructure that migrates 30+ device types, covering device context ideas from Oracle.
> >>>>>>>>> Covering P2P, supporting device_reset, FLR, dirty page tracking.
> >>>>>>>>>
> >>>>>>>>> Please have some respect for other members who covered more ground than your series.
> >>>>>>>>> What more? Apply the same nested concept on the member device as Michael suggested; it is nested virtualization that maintains the exact same semantics.
> >>>>>>>>> So a VF is mapped as PF to the L1 guest.
> >>>>>>>>> L1 guest can enable SR-IOV on it, and map one VF to L2 guest.
> >>>>>>>>>
> >>>>>>>>> This nested work can be extended in future, once first level nesting is covered.
> >>>>>>>>>> Answer all questions above, if you think a management VF can
> >>>>>>>>>> work, please show me your patch.
> >>>>>>>>> The idea evolves from technical debate rather than from pointing fingers like your comment.
> >>>>>>>>> I think a positive discussion with Michael and a pointer to the paper from Jason gave a good direction for doing _right_ nesting that follows two principles.
> >>>>>>>>> a. efficiency property
> >>>>>>>>> b. equivalence property
> >>>>>>>>>
> >>>>>>>>> (c. resource control is natural already)
> >>>>>>>>>
> >>>>>>>>> Both apply at VMM and at VM level, enabling recursive virtualization by having a VF that can act as a PF inside the guest.
> >>>>>>>>> [1] https://dl.acm.org/doi/pdf/10.1145/361011.361073
> >>>>>>>> Please just show me your patch resolving these opens. How about
> >>>>>>>> starting from defining the virtio-fs device context and your management VF?
> >>>>>>> As answered, the device context infrastructure is done; each device-specific device context will be defined incrementally.
> >>>>>>> I will not be including virtio-fs in this series. It will be done incrementally in the future, utilizing the infrastructure built in this series.
> >>>>>> Done? How do you conclude this? You just tell me what is the full
> >>>>>> set of virtio-fs device context now and how to migrate them.
> >>>>>>
> >>>>>> You can't? You refuse or you don't? Do you expect the HW designers
> >>>>>> to figure it out by themselves?
> >>>>> I won't be able to tell now, as I don't think it is necessary for this series.
> >>>>> If one out of 30 devices cannot migrate because an unimaginable amount of complexity has been placed there, maybe one will not implement it as a member device.
> >>>>> From experience of migratable complex GPU devices and RDMA devices (stateful, with a hundred thousand stateful QPs), my understanding is that the complex state of virtio-fs can be defined and migrated.
> >>>>> The mlx5 driver consists of 150,000 lines of code, and that device is migratable with complex state.
> >>>>> So I am optimistic that virtio-fs can be migratable too.
> >>>>> It does not have to be limited by my limited creativity of 2023.
> >>>>> Maybe I am wrong; in that case one will not implement a passthrough virtio-fs device.
> >>>> your series wants to migrate device context, but doesn't define
> >>>> device context, does this sound reasonable?
> >>> Generic device context is defined in [1], and the work of defining device context on top of that infrastructure can be done in parallel by multiple people after [1].
> >>> Per-device-type context will be defined incrementally after this work.
> >>>
> >>> [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00190.html
> >> This is not something for after the work; you should define them before you use
> >> them in this series.
> >>
> > I don't agree to boil the ocean in this patch series.
> > No practical spec development community does it.
> > As long as we feel comfortable that the device context framework is extensible, it is fine.
> > If virtio-fs seems very hard, maybe one will come up with a new lightweight FS device. I really don't know.
> so you want to migrate device context, but refuse to define them?
> >
> >> And you need to prove why the admin vq is better than the registers solution
> >> if you want a merge.
> > Michael already responded on the practical aspects.
> > Since you may claim I didn't answer, below are the technical details.
> >
> > Admin commands and the admin vq are better for the following reasons, in my view:
> >
> > Functionally better:
> > 1. When the live migration registers are located on the VF itself, the VMM does not have control of them.
> > These registers are reset on FLR and device reset because they are virtio registers of the device.
> > Hence, the VMM loses the state for the job that the VMM was supposed to do.
> > Therefore, passthrough mode cannot depend on these registers.
> >
> > 2. Any bulk data transfer of device context and dirty page tracking requires DMA.
> > Hence that DMA must happen on a device other than the VF itself.
> > If it is on the VF itself, it has two problems.
> > 2.a. VF device reset and FLR will clear them, and device context is lost.
> >
> > 2.b. The DMA occurs at the PCI RID level.
> > The IOMMU cannot bifurcate the DMA of one RID into the two different address spaces of guest and hypervisor.
> > This requires PASID support.
> > Using PASID has the following problems.
> > 2.b.1 PASID is typically not used by kernel software. It is only meant for user processes.
> > Hence reserving a PASID for kernel work won't be acceptable to the upstream kernel.
> > 2.b.2 If this is somehow done, then when the VF itself supports PASID, vPASID support is now required.
> > This is again not where the industry is going in other forums that I am part of. Hence, it will be a failure for virtio. Hence, I do not recommend the vPASID route.
> > 2.b.3 One of the widely used CPUs seems to have dropped the support due to a limitation of an instruction around PASID.
> > So it cannot be used there; this further limits virtio passthrough users.
> >
> > Even if 2.b.2 and 2.b.3 are somehow overcome in theory, #1 and 2.a are functional problems.
> >
> > Scale wise better:
> > 3. Admin commands and the admin vq are used _only_ when one issues a device migration command.
> > One does not migrate VMs every few msec.
> > Hence such functionality is better done in a way that is efficient for performance but does not consume on-chip memory.
> > Admin commands and the admin vq satisfy that.
> >
> > 4. Once the software matures further, admin commands would prefer a completion interrupt instead of polling.
> > How to get a notification/interrupt? Well, virtqueue defines this already.
> > Should we replicate that in some PF registers?
> > It can be done. But once you put all the functionality of admin commands and the aq in registers, the whole thing becomes yet another register_q.
> >
> > 5. Can these registers be placed in the PF to overcome #1 and #2 for passthrough?
> > In theory, yes.
> > In practice, no, as there are many commands in flight, which need to scale to a reasonable number of VFs.
> > Admin commands over the admin vq provide this generic facility.
> >
> > 6. Most modern devices that attempt to scale cut down their register footprint; registers are used only for the main bootstrap and init-time config work.
> > Even in the virtio spec, one can read:
> > "Device configuration space is generally used for rarely changing or initialization-time parameters."
> >
> > Adding some additional registers to a PF device config space for non-init-time parameters does not make sense.
> >
> > 7. Additionally, nested virtualization should be done by truly nesting the device at the right abstraction point of the owner-member relationship.
> > This follows the two principles of (a) efficiency and (b) equivalence described in the paper Jason pointed to.
> > And if we ask for a nested VF extension, we will get our guidance from PCI-SIG on why it should be done, if it matches the rest of the ecosystem components that support/don't support nesting.
> If they are true, shall we refactor virtio-pci common cfg functionalities to use
> the admin vq?
For a non-backward-compatible SIOV device of the future, yes: virtio-pci common config (non-init registers) should be moved to a vq located on the member device directly.
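
To make points #1 and 2.a of the quoted list concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the class names, method names and the "dirty_tracking_enabled" key are hypothetical and do not come from the spec or from this patch series. It only shows that state kept in registers on the VF disappears on FLR, while state kept on the owner device and keyed by member id survives a member reset.

```python
from dataclasses import dataclass, field


@dataclass
class MemberVF:
    """Hypothetical VF whose migration registers live on the VF itself."""
    vf_id: int
    migration_regs: dict = field(default_factory=dict)

    def function_level_reset(self) -> None:
        # Assumption of this model: FLR / device reset clears every virtio
        # register on the VF, including migration state written by the VMM.
        self.migration_regs.clear()


@dataclass
class OwnerPF:
    """Hypothetical owner device that keeps per-member state on itself."""
    member_state: dict = field(default_factory=dict)

    def admin_cmd_set_state(self, vf_id: int, key: str, value: int) -> None:
        # State is stored on the owner, keyed by the member id, so a reset
        # of the member device does not touch it.
        self.member_state.setdefault(vf_id, {})[key] = value

    def admin_cmd_get_state(self, vf_id: int) -> dict:
        # Return a copy so callers cannot mutate the owner's bookkeeping.
        return dict(self.member_state.get(vf_id, {}))


def demo():
    vf = MemberVF(vf_id=1)
    pf = OwnerPF()

    # The VMM records the same piece of migration state in both places.
    vf.migration_regs["dirty_tracking_enabled"] = 1
    pf.admin_cmd_set_state(vf.vf_id, "dirty_tracking_enabled", 1)

    # The guest triggers a reset of its passthrough VF.
    vf.function_level_reset()

    return vf.migration_regs, pf.admin_cmd_get_state(vf.vf_id)
```

Calling demo() returns the VF-side registers and the owner-side state after the reset; in this model only the owner-side copy still holds the value the VMM wrote.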

