Subject: RE: [virtio-comment] [PATCH v1 1/8] admin: Add theory of operation for device migration
> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, November 6, 2023 12:05 PM
>
> On Thu, Nov 2, 2023 at 2:10 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, November 2, 2023 9:56 AM
> > >
> > > On Wed, Nov 1, 2023 at 11:32 AM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Wednesday, November 1, 2023 6:04 AM
> > > > >
> > > > > On Tue, Oct 31, 2023 at 1:30 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > > >
> > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > Sent: Tuesday, October 31, 2023 7:05 AM
> > > > > > >
> > > > > > > On Mon, Oct 30, 2023 at 12:47 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > > > > >
> > > > > > > > > From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of Jason Wang
> > > > > > > > >
> > > > > > > > > On Thu, Oct 26, 2023 at 11:45 AM Parav Pandit <parav@nvidia.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > Sent: Thursday, October 26, 2023 6:16 AM
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 25, 2023 at 3:03 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > > > > > > Sent: Wednesday, October 25, 2023 6:59 AM
> > > > > > > > > > > > >
> > > > > > > > > > > > > > For passthrough, PASID assignment vq is not needed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > How do you know that?
> > > > > > > > > > > >
> > > > > > > > > > > > Because for passthrough, the hypervisor is not involved in dealing with the VQ at all.
> > > > > > > > > > >
> > > > > > > > > > > Ok, so if I understand correctly, you are saying your design can't work for the case of PASID assignment.
> > > > > > > > > >
> > > > > > > > > > No. PASID assignment will happen from the guest for its own use, and device migration will just work fine because the device context will capture this.
> > > > > > > > >
> > > > > > > > > It's not about device context. We're discussing "passthrough", no?
> > > > > > > >
> > > > > > > > Not sure, we are discussing the same.
> > > > > > > > A member device is passthrough to the guest, dealing with its own PASIDs and virtio interface for some VQ assignment to PASID.
> > > > > > > > So the VQ context captured by the hypervisor will have some PASID attached to this VQ.
> > > > > > > > The device context will be updated.
> > > > > > > >
> > > > > > > > > You want all virtio stuff to be "passthrough", but assigning a PASID to a specific virtqueue in the guest must be trapped.
> > > > > > > >
> > > > > > > > No. PASID assignment to a specific virtqueue in the guest must go directly from guest to device.
> > > > > > >
> > > > > > > This works like setting CR3, you can't simply let it go from guest to host.
> > > > > > >
> > > > > > > The host IOMMU driver needs to know the PASID to program the IO page tables correctly.
> > > > > >
> > > > > > This will be done by the IOMMU.
> > > > > >
> > > > > > > > When the guest iommu may need to communicate anything for this PASID, it will come through its proper IOMMU channel/hypercall.
> > > > > > >
> > > > > > > Let's say using PASID X for queue 0, this knowledge is beyond the IOMMU scope but belongs to virtio. Or please explain how it can work when it goes directly from guest to device.
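As a concrete illustration of the "PASID X for queue 0" assignment being debated above, a guest-to-device queue-creation command carrying a PASID could look like the sketch below. Nothing like this is defined in the virtio spec today; the struct name, field names, and layout are all assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical command, NOT spec text: a guest driver asks the device to
 * create a virtqueue whose DMA is tagged with a given PASID. */
struct virtio_create_vq_cmd {
    uint16_t vq_index;     /* which virtqueue to create */
    uint16_t queue_size;   /* number of descriptors */
    uint32_t pasid;        /* PASID tagging the queue's DMA (20 bits used) */
    uint64_t desc_addr;    /* guest IOVA of the descriptor area */
    uint64_t driver_addr;  /* guest IOVA of the driver (avail) area */
    uint64_t device_addr;  /* guest IOVA of the device (used) area */
};

/* PASID is a 20-bit value per the PCIe PASID ECN; mask accordingly. */
static inline uint32_t pasid_field(uint32_t pasid)
{
    return pasid & 0xFFFFF;
}
```

Because such a command travels directly from guest driver to device, the hypervisor never sees it — which is exactly why the thread argues about whether the PASID-to-virtqueue mapping can avoid trapping.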
> > > > > > > We are yet to ever see a spec for PASID to VQ assignment.
> > > > >
> > > > > It has one.
> > > >
> > > > Ok, for theory's sake it is there.
> > > > > >
> > > > > > The virtio driver will assign the PASID directly from the guest driver to the device using a create_vq(pasid=X) command.
> > > > > > The same process is somehow attached to the PASID by the guest OS.
> > > > > > The whole PASID range is known to the hypervisor when the device is handed over to the guest VM.
> > > > >
> > > > > How can it know?
> > > > >
> > > > > > So the PASID mapping is set up by the hypervisor IOMMU at this point.
> > > > >
> > > > > You disallow the PASID to be virtualized here. What's more, such a PASID passthrough has security implications.
> > > >
> > > > No. The virtio spec is not disallowing it. At least for sure, this series is not the one.
> > > > My main point is, the virtio device interface will not be the source of hypercalls to program the IOMMU in the hypervisor.
> > > > It is something to be done by the IOMMU side.
> > >
> > > So unless vPASID can be used by the hardware, you need to trap the mapping from a PASID to a virtqueue. Then you need virtio-specific knowledge.
> >
> > vPASID by hardware is unlikely to be used by hw PCI EP devices, at least in any near-term future.
> > This requires either a vPASID to pPASID table in the device or in the IOMMU.
> > So we are on the same page.
>
> Claiming a method that can only work for passthrough or emulation is not good.
> We all know virtualization is passthrough + emulation.

Again, I agree but I won't generalize it here.

> > > > > Again, we are talking about different things. I've tried to show you that there are cases where passthrough can't work, but if you think the only way for migration is to use passthrough in every case, you will probably fail.
> > > >
> > > > I didn't say the only way for migration is passthrough.
> > > > Passthrough is clearly one way.
> > > > Other ways may be possible.
> > > > > >
> > > > > > The virtio device is not the conduit for this exchange.
> > > > > > > > > > > > >
> > > > > > > > > > > > > There are works ongoing to make vPASID work for the guest, like vSVA.
> > > > > > > > > > > > > Virtio doesn't differ from other devices.
> > > > > > > > > > > >
> > > > > > > > > > > > Passthrough does not run like SVA.
> > > > > > > > > > >
> > > > > > > > > > > Great, you found another limitation of "passthrough" by yourself.
> > > > > > > > > >
> > > > > > > > > > No, it is not a limitation; it just does not need complex SVA to split the device for unrelated usage.
> > > > > > > > >
> > > > > > > > > How can you limit the user in the guest to not use vSVA?
> > > > > > > >
> > > > > > > > He he, I am not limiting; again a misunderstanding or wrong attribution.
> > > > > > > > I explained that the hypervisor for passthrough does not need SVA.
> > > > > > > > The guest can do anything it wants from the guest OS with the member device.
> > > > > > >
> > > > > > > Ok, so the point still stands, see above.
> > > > > >
> > > > > > I don't think so. The guest owns its PASID space
> > > > >
> > > > > Again, vPASID to PASID can't be done in hardware unless I missed some recent features of IOMMUs.
> > > >
> > > > CPU vendors have different ways of doing vPASID to pPASID.
> > >
> > > At least for the current version of major IOMMU vendors, such translation (aka PASID remapping) is not implemented in the hardware, so it needs to be trapped first.
> >
> > Right. So it is really far in the future, at least a few years away.
> > It is still an early space for virtio.
> > > > > >
> > > > > > and directly communicates like any other device attribute.
> > > > > > > > > > > >
> > > > > > > > > > > > Each passthrough device has PASIDs from its own space, fully managed by the guest.
> > > > > > > > > > > > Some CPUs required vPASID, and SIOV is not going this way anymore.
> > > > > > > > > > >
> > > > > > > > > > > Then how to migrate? Invent a full set of something else through another giant series like this to migrate to the SIOV thing? That's a mess for sure.
> > > > > > > > > >
> > > > > > > > > > SIOV will for sure reuse most or all parts of this work, almost entirely as-is.
> > > > > > > > > > vPASID is a CPU/platform-specific thing, not part of the SIOV devices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If at all it is done, it will be done from the guest by the driver using the virtio interface.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Then you need to trap. Such things couldn't be passed through to guests directly.
> > > > > > > > > > > >
> > > > > > > > > > > > Only the PASID capability is trapped. PASID allocation and usage is directly from the guest.
> > > > > > > > > > >
> > > > > > > > > > > How can you achieve this? Assigning a PASID to a device is completely device(virtio) specific. How can you use a general layer without the knowledge of virtio to trap that?
> > > > > > > > > >
> > > > > > > > > > When one wants to map vPASID to pPASID, a platform needs to be involved.
> > > > > > > > >
> > > > > > > > > I'm not talking about how to map vPASID to pPASID; it's out of the scope of virtio. I'm talking about assigning a vPASID to a specific virtqueue or other virtio function in the guest.
> > > > > > > >
> > > > > > > > That can be done in the guest. The key is the guest won't know that it is dealing with a vPASID.
> > > > > > > > It will follow the same principle from your paper of equivalency, where the virtio software layer will assign a PASID to a VQ and communicate it to the device.
> > > > > > > >
> > > > > > > > Anyway, all of this is just a digression from the current series.
> > > > > > >
> > > > > > > It's not; as you mentioned that only MSI-X is trapped, I give you another one.
> > > > > >
> > > > > > PASID access from the guest is to be done fully by the guest IOMMU.
> > > > > > Not by virtio devices.
> > > > > > > > >
> > > > > > > > > You need a virtio-specific queue or capability to assign a PASID to a specific virtqueue, and that can't be done without trapping and without virtio-specific knowledge.
> > > > > > > >
> > > > > > > > I disagree. PASID assignment to a virtqueue in the future, from the guest virtio driver to the device, is a uniform method.
> > > > > > > > Whether it is the PF assigning a PASID to its own VQ, or the VF driver in the guest assigning a PASID to a VQ.
> > > > > > > >
> > > > > > > > All the same.
> > > > > > > > Only IOMMU-layer hypercalls will know how to deal with PASID assignment at the platform layer to set up the domain etc. tables.
> > > > > > > >
> > > > > > > > And this is way beyond our device migration discussion.
> > > > > > > > By any means, if you were implying that somehow VQ to PASID assignment _may_ need trap+emulation, and hence the whole device migration depends on some trap+emulation, then surely I do not agree with it.
> > > > > > >
> > > > > > > See above.
> > > > > >
> > > > > > Yeah, I disagree with such an implication.
> > > > > > > >
> > > > > > > > The PASID equivalent in the mlx5 world is ODP_MR+PD isolating the guest process, and all of that has just worked on the efficiency and equivalence principle for a decade now without any trap+emulation.
> > > > > > > > > >
> > > > > > > > > > When the virtio passthrough device is in the guest, it has all its PASIDs accessible.
> > > > > > > > > >
> > > > > > > > > > All this is a large deviation from the current discussion of this series, so I will keep it short.
> > > > > > > > > > > >
> > > > > > > > > > > > Regardless, it is not relevant to passthrough mode, as PASID is yet another resource.
> > > > > > > > > > > > And for some CPUs, if it is trapped, it is a generic layer that does not require virtio involvement.
> > > > > > > > > > > > So the virtio interface asking to trap something because a generic facility has done so is not the approach.
> > > > > > > > > > >
> > > > > > > > > > > This misses the point of PASID. How to use PASID is totally device specific.
> > > > > > > > > >
> > > > > > > > > > Sure, and how to virtualize vPASID/pPASID is platform specific, as a single PASID can be used by multiple devices and processes.
> > > > > > > > >
> > > > > > > > > See above, I think we're talking about different things.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Capabilities of #2 are generic across all PCI devices, so they will be handled by the HV.
> > > > > > > > > > > > > > The ATS/PRI cap is also handled in a generic manner by the HV and the PCI device.
> > > > > > > > > > > > >
> > > > > > > > > > > > > No, ATS/PRI requires cooperation from the vIOMMU. You can simply do ATS/PRI passthrough but with an emulated vIOMMU.
> > > > > > > > > > > >
> > > > > > > > > > > > And that is not the reason for the virtio device to build trap+emulation for passthrough member devices.
> > > > > > > > > >
> > > > > > > > > > The vIOMMU is emulated by the hypervisor with a PRI queue; PRI requests arrive on the PF for the VF.
> > > > > > > > >
> > > > > > > > > Shouldn't it arrive at the platform IOMMU first? The path should be PRI -> RC -> IOMMU -> host -> Hypervisor -> vIOMMU PRI -> guest IOMMU.
> > > > > > > >
> > > > > > > > The above sequence seems right.
> > > > > > > > >
> > > > > > > > > And things will be more complicated when (v)PASID is used. So you can't simply let PRI go directly to the guest with the current architecture.
> > > > > > > >
> > > > > > > > In the current architecture of the PCI VF, PRI does not go directly to the guest.
> > > > > > > > (And that is not a reason to trap and emulate other things.)
> > > > > > >
> > > > > > > Ok, so beyond MSI-X we need to trap PRI, and we will probably trap other things in the future like PASID assignment.
> > > > > >
> > > > > > PRI etc. all belong to the generic PCI 4K config space region.
> > > > >
> > > > > It's not about the capability, it's about the whole process of PRI request handling. We've agreed that the PRI request needs to be trapped by the hypervisor and then delivered to the vIOMMU.
> > > >
> > > > Trap+emulation is done in a generic manner without involving virtio or other device types.
> > > > > > > > > > >
> > > > > > > > > > > how can you pass through a hardware PRI request to a guest directly without trapping it then?
> > > > > > > > > > > What's more, PCIe allows the PRI to be done in a vendor (virtio) specific way, so you want to break this rule? Or you want to blacklist ATS/PRI for virtio?
> > > > > > > > > >
> > > > > > > > > > I was aware of only the PCI-SIG way of PRI.
> > > > > > > > > > Do you have a reference to the ECN that enables a vendor-specific way of PRI? I would like to read it.
> > > > > > > > >
> > > > > > > > > I mean it doesn't forbid us to build a virtio-specific interface for I/O page fault report and recovery.
> > > > > > > >
> > > > > > > > So the PRI of PCI does not allow it. It is an ODP kind of technique you meant above.
> > > > > > > > Yes, one can build it.
> > > > > > > > Ok, unrelated to device migration, so I will park this good discussion for later.
> > > > > > >
> > > > > > > That's fine.
> > > > > > > >
> > > > > > > > This will be very good to eliminate IOMMU PRI limitations.
> > > > > > >
> > > > > > > Probably.
> > > > > > > > > >
> > > > > > > > > > PRI will directly go to the guest driver, and the guest would interact with the IOMMU to service the paging request through IOMMU APIs.
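The PRI handling path agreed on above (device -> RC -> host IOMMU -> hypervisor -> vIOMMU -> guest) implies the hypervisor re-queues each page request on the emulated vIOMMU rather than passing the hardware request through untouched. A minimal model of that re-queuing step is sketched below; all types and names are invented for illustration, not taken from any spec or implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation of a PCIe Page Request as seen by software. */
struct page_request {
    uint32_t pasid;      /* PASID carried in the request, if any */
    bool     has_pasid;
    uint64_t address;    /* faulting IOVA (page aligned) */
};

/* The hypervisor builds the guest-visible request from the hardware one,
 * translating the physical PASID to the guest-visible one when the two
 * differ. Passing a NULL translator models the passthrough (identity) case
 * where vPASID == pPASID. */
static struct page_request forward_to_viommu(struct page_request hw_req,
                                             uint32_t (*ppasid_to_vpasid)(uint32_t))
{
    struct page_request guest_req = hw_req;
    if (hw_req.has_pasid && ppasid_to_vpasid)
        guest_req.pasid = ppasid_to_vpasid(hw_req.pasid);
    return guest_req;
}
```

This is why the thread notes that PASID complicates things: with a non-identity vPASID/pPASID mapping, the translation hook cannot be skipped, so the request cannot go to the guest directly.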
> > > > > > > > >
> > > > > > > > > With PASID, it can't go directly.
> > > > > > > >
> > > > > > > > When the request consists of a PASID in it, it can.
> > > > > > > > But again, these PCI-SIG extensions of PASID are not related to device migration, so I am deferring it.
> > > > > > > > > >
> > > > > > > > > > PRI in a vendor-specific way needs a separate discussion. It is not related to live migration.
> > > > > > > > >
> > > > > > > > > PRI itself is not related. But the point is, you can't simply pass through ATS/PRI now.
> > > > > > > >
> > > > > > > > Ah ok, the whole 4K PCI config space where the ATS/PRI capabilities are located is trapped+emulated by the hypervisor.
> > > > > > > > So? So do we start emulating virtio interfaces too for passthrough?
> > > > > > > > No.
> > > > > > > > Can one still continue to trap+emulate?
> > > > > > > > Sure, why not?
> > > > > > >
> > > > > > > Then let's not limit your proposal to be used by "passthrough" only?
> > > > > >
> > > > > > One can possibly build some variant of the existing virtio member device using the same owner and member scheme.
> > > > >
> > > > > It's not about the member/owner, it's about e.g. whether the hypervisor can trap and emulate.
> > > > >
> > > > > I've pointed out that what you invent here is actually a partial new transport; for example, a hypervisor can trap and use things like device context in the PF to bypass the registers in the VF. This is the idea of transport commands/q.
> > > >
> > > > I will not mix in transport commands, which are mainly useful for actual device operation, for SIOV only, for backward compatibility, and that too optionally.
> > > > One may still choose to have virtio common and device config in MMIO, of course at a lower scale.
> > > >
> > > > Anyway, mixing migration context with the actual SIOV-specific thing is not correct, as device context is read/write incremental values.
> > >
> > > SIOV is transport-level stuff; the transport virtqueue is designed in a way that is general enough to cover it. Let's not shift concepts.
> >
> > Such a TVQ is only for backward-compatible vPCI composition.
> > For ground-up work such a TVQ must not be done through the owner device.
>
> That's the idea actually.
>
> > Each SIOV device is to have its own channel to communicate directly with the device.
>
> One thing that you ignore is that the hypervisor can use what you invented as a transport for the VF, no?
>
> > No. By design,
>
> It works like: the hypervisor traps the virtio config, forwards it to the admin virtqueue, and starts the device via the device context.

It needs more granular support than the management framework of device context.

> > It is not a good idea to overload management commands with actual run-time guest commands.
> > The device context reads/writes are largely for incremental updates.
>
> It doesn't matter if it is incremental or not but

It does, because you want different functionality only for the purpose of backward compatibility. That too only if the device does not offer them as a portion of the MMIO BAR.

> 1) the function is there
> 2) hypervisor can use that function if they want and virtio (spec) can't forbid that

It is not about forbidding or supporting. It is about what functionality to use for the management plane and the guest plane. Both have different needs.

> > The VF driver has its own direct channel via its own BAR to talk to the device.
> > So there is no need to transport via the PF.
> > For SIOV, for backward-compat vPCI composition, it may be needed.
>
> Hard to say if that can be memory mapped as well on the BAR of the PF.
> We have seen one device supporting it outside of virtio.
>
> > For scale anyway, one needs to use the device's own cvq for complex configuration.
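The "hypervisor traps the virtio config and forwards it to the admin virtqueue" usage mentioned in the exchange above can be sketched as the hypervisor turning a trapped config-space write into a command for the owner's admin queue. The opcode and command layout below are invented for illustration; they are not taken from the admin command set in the spec or this series.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode: proxy a config-space write to a member device. */
#define ADMIN_CMD_DEV_CFG_WRITE 0x8000

struct admin_cmd {
    uint16_t opcode;
    uint16_t member_id;   /* VF/SIOV member the access targets */
    uint32_t offset;      /* offset within the member's config space */
    uint32_t len;         /* access width in bytes */
    uint8_t  data[8];     /* value being written */
};

/* Build the admin command the hypervisor would queue on the owner device
 * after trapping a guest write of `len` bytes at `offset`. */
static struct admin_cmd make_cfg_write(uint16_t member_id, uint32_t offset,
                                       const void *val, uint32_t len)
{
    struct admin_cmd cmd = {
        .opcode = ADMIN_CMD_DEV_CFG_WRITE,
        .member_id = member_id,
        .offset = offset,
        .len = len,
    };
    memcpy(cmd.data, val, len);
    return cmd;
}
```

The disagreement in the thread is precisely about whether such guest-triggered, run-time proxying belongs on the same owner-device admin queue that carries management-plane migration commands.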
> That's the idea, but I meant your current proposal overlaps those functions.

Not really. One can have simple virtio config space access read/write functionality in addition to what is done here. And that is still fine. One is doing proxying for the guest. The management plane is doing more than just register proxying.

> > > > > >
> > > > > > If for that some admin commands are missing, maybe one can add them.
> > > > >
> > > > > I would then build the device context commands on top of the transport commands/q, then it would be complete.
> > > >
> > > > No need to step on the toes of use cases, as they are different...
> > > > > > >
> > > > > > > I've shown you that
> > > > > > >
> > > > > > > 1) you can't easily say you can pass through all the virtio facilities
> > > > > > > 2) how ambiguous terminology like "passthrough" is
> > > > > >
> > > > > > It is not; it is well defined in v3, v2.
> > > > > > One can continue to argue and keep defining the variant and still call it data path acceleration and then claim it as passthrough ...
> > > > > > But I won't debate this anymore as it's just non-technical aspects of least interest.
> > > > >
> > > > > You use this terminology in the spec, which is all about technical matters, and you think how to define it is non-technical. This is self-contradictory. If you fail, it probably means it's ambiguous.
> > > > > Let's not use that terminology.
> > > >
> > > > What it means is described in the theory of operation.
> > > > > >
> > > > > > We have technical tasks and more improved specs to update going forward.
> > > > >
> > > > > It's a burden to do the synchronization.
> > > >
> > > > We have discussed this.
> > > > In the current proposal the member device is not bifurcated,
> > >
> > > It is. Part of the functions is carried via the PCI interface, some is carried via the owner.
> > > You end up with two drivers to drive the devices.
> >
> > Nope.
> > All admin work of device migration is carried out via the owner device.
> > All guest-triggered work is carried out using the VF itself.
>
> Guests don't (or can't) care about how the hypervisor is structured.

For passthrough mode, it just cannot be structured inside the VF.

> So we're discussing the view of the device; member devices need to serve:
>
> 1) requests from the transport (it's the guest in your context)
> 2) requests from the owner

Doing #2 of the owner on the member device functionality does not work when the hypervisor does not have access to the member device.