virtio-dev message



Subject: RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, September 20, 2023 7:46 PM
> 
> > Details of his position in my view:
> >
> > 1. Device migration must be done through the VF itself, by suspending both
> > the specific vqs and the VF device.
> > 2. When device migration is done using #1, it must be done using a mediation
> > approach in the hypervisor.
> >
> > 3. When migration is done using inband mediation, it is more secure than the AQ
> > approach
> > (as opposed to the AQ of the owner device, which enables/disables SR-IOV).
> >
> > 4. AQ is not secure.
> > But,
> > 5. AQ and admin commands can be built on top of his proposal #1, even if AQ
> > is less secure. Opposing statements...
> >
> > 6. Dirty page tracking and inflight descriptor tracking are to be done in his v1, but
> > he does not want to review such coverage in [1].
> >
> > 8. Since his series does not cover any device context migration and
> > does not say anything about it, I deduce that he plans to use the cvq for setting
> > up RSS and other fields, using the inband CVQ of the VF.
> > This further limits the solution to only the net device, ignoring the other
> > 20+ device types, not all of which may have a CVQ.
> >
> > 9. Trapping and emulation of the following objects in the hypervisor is secure:
> > AQ, CVQ, virtio config space, PCI FLR flow; but when the AQ of the PF does a far
> > smaller part of that work, AQ is not secure.
> >
> > 10. Any traps proposed in #9 mostly do not work with future TDISP, as TDISP does
> > not bifurcate the device, so ignore them for now to promote inband migration.
> >
> > 11. He does not show interest in collaboration (even after being asked a few times)
> > to see if we can produce common commands that may work both for
> > passthrough (without mediation) and with mediation for the nested case.
> >
> > 12. Somehow register access on a single physical card for the PFs and VFs gives
> > a better QoS guarantee than a virtqueue, as registers can scale infinitely no matter
> > how many VFs or VQs there are, because it is per VF.
> >
> > [1]
> > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
> 
> 
> OK so with this summary in mind, can you find any advantages to
> inband+mediation that are real or do you just see disadvantages? And
> it's a tricky question because I can see some advantages ;)

Inband + mediation may be useful for the nested case.

In attempting inband + mediation, many critical pieces are let go.
It may be fine to let them go in some cases, but not for passthrough.

The fundamental advantages of the owner-based approach that I see are:
1. Nesting use cases usually involve a large number of VMs hosted in one VM.
For this purpose, it is better to hand over a PF to the level 0 VM that hosts the VFs and level 1 VMs, and avoid two-level device nesting.

2. Supports P2P natively.

3. A single non-replicated resource (AQ) manages the infrequent work of device migration.
No need to replicate AQs across thousands of VFs that rarely do migration work.
Overall, this gains system, device, and memory efficiency.

5. Passthrough simply does not work at all without an owner device.
This is because, without one, dirty page tracking, device context management, CVQ, FLR, config space, device status, and MSI-X config must all be trapped.
Many systems prefer to avoid this involvement of the hypervisor even if the hypervisor is trusted (to avoid moving parts).
New-generation TEE and TPM devices are on the horizon, and they would not want these things trapped either.
The security audit surface is very large for them.

6. Any new basic functionality added to the device always requires software updates at several layers in the mediation entities.

7. TDISP is inherently covered. Without an owner device, TDISP is broken, as the device cannot be bifurcated.

To me, #2, #5, and #7 are critical pieces that device migration must support/work with.
The rest are of secondary importance.
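
As a rough illustration of the memory argument in #3, here is a minimal sizing sketch. It assumes the migration queue is a split virtqueue (descriptor table of 16 bytes per entry, available ring of 6 + 2*N bytes, used ring of 6 + 8*N bytes, alignment padding ignored). The queue depth, the VF count, and the function names are hypothetical and only for illustration; real devices may use packed queues or different depths.

```python
# Hypothetical sizing sketch: one owner-device AQ versus one migration
# queue replicated per VF. Queue depth and VF count are assumptions.

DESC_SIZE = 16  # bytes per split-virtqueue descriptor


def split_vq_bytes(queue_size: int) -> int:
    """Bytes for one split virtqueue, ignoring alignment padding."""
    desc_table = DESC_SIZE * queue_size
    avail_ring = 6 + 2 * queue_size  # flags, idx, ring[], used_event
    used_ring = 6 + 8 * queue_size   # flags, idx, 8-byte elements, avail_event
    return desc_table + avail_ring + used_ring


def migration_queue_bytes(num_vfs: int, queue_size: int, per_vf: bool) -> int:
    """Total ring memory when the queue is replicated per VF or kept single."""
    queues = num_vfs if per_vf else 1
    return queues * split_vq_bytes(queue_size)


if __name__ == "__main__":
    vfs, depth = 1000, 64  # illustrative numbers only
    print("single AQ on owner:", migration_queue_bytes(vfs, depth, per_vf=False))
    print("one queue per VF:  ", migration_queue_bytes(vfs, depth, per_vf=True))
```

With these assumed numbers the ring memory grows linearly with the VF count in the replicated case, while the owner-based case stays constant. This covers only the ring memory, not device-side state.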

