Subject: Re: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration

On 10/19/2023 5:01 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, October 19, 2023 1:45 PM

On 10/18/2023 5:48 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 2:13 PM

On 10/18/2023 3:20 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 12:22 PM

On 10/18/2023 2:41 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 12:06 PM

On 10/18/2023 1:02 PM, Parav Pandit wrote:
From: virtio-comment@lists.oasis-open.org
<virtio-comment@lists.oasis-open.org> On Behalf Of Zhu,
Sent: Monday, October 16, 2023 3:18 PM

On 10/13/2023 7:54 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Friday, October 13, 2023 3:14 PM
How do you transfer the ownership?
An additional ownership delegation by a new admin
If you think this can work, do you want to cook a patch to
implement this before submitting this live migration series?
I answered this already above.
talk is cheap, show me your patch
Huh. We presented the infrastructure that migrates 30+ devices,
covering device context ideas from Oracle,
covering P2P, and supporting device_reset, FLR, and dirty page tracking.

Please have some respect for other members who covered more ground than
your series.
What more? Apply the same nested concept on the member device
Michael suggested, it is nested virtualization maintain exact
So a VF is mapped as PF to the L1 guest.
L1 guest can enable SR-IOV on it, and map one VF to L2 guest.

This nested work can be extended in future, once first level
nesting is
Answer all questions above, if you think a management VF can
work, please show me your patch.
The idea evolves from technical debate rather than pointing fingers
like your
I think a positive discussion with Michael and a pointer to
the paper from
Jason gave a good direction of doing _right_ nesting that
follows two
a. efficiency property
b. equivalence property

(c. resource control is natural already)

Both apply at VMM and at VM level enabling recursive
virtualization, by
having VF that can act as PF inside the guest.
[1] https://dl.acm.org/doi/pdf/10.1145/361011.361073
Please just show me your patch resolving these opens; how about
starting from defining the virtio-fs device context and your management VF?
As answered, the device context infrastructure is done; per-device
specific device context will be defined incrementally.
I will not be including virtio-fs in this series. It will be done
incrementally in the future, utilizing the infrastructure built in this series.
Done? How do you conclude this? You just tell me what the full
set of virtio-fs device context is now and how to migrate it.

You can't? You refuse or you don't? Do you expect the HW designers
to figure it out by themselves?
I won't be able to tell now as I don't think it is necessary for this series.
If one out of 30 devices cannot migrate because an unimaginable
amount of complexity has been placed there, maybe one will not
implement it as a member device.
From experience of migratable complex GPU devices, RDMA devices
(having a hundred thousand stateful QPs), my understanding is that the
complex state of virtio-fs can be defined and is migratable.
The mlx5 driver consists of 150,000 lines of code, and that device
has complex state.
So I am optimistic that virtio-fs can be migratable too.
It does not have to be limited by my limited creativity of 2023.
Maybe I am wrong; in that case one will not implement passthrough
Your series wants to migrate device context, but doesn't define
device context; does this sound reasonable?
Generic device context is defined at [1], as is the infrastructure;
defining the device context can then be done in parallel by multiple
people after the work of [1].
Context for each device type will be defined incrementally after this work.

This should not come after the work; you should define them before you use
them in this series.

I don't agree to boil the ocean in this patch series.
No practical spec development community does it.
As long as we feel comfortable that the device context framework is extensible, it
is fine.
If virtio-fs seems very hard, maybe one will come up with a new lightweight FS
device. I really don't know.
So you want to migrate device context, but refuse to define it?
And you need to prove why admin vq is better than the registers solution
if you want a merge.
Michael already responded on the practical aspects.
Since you may claim I didn't answer, below are the technical details.

Why admin commands and aq is better is because of below reasons in my
Functionally better:
1. When the live migration registers are located on the VF itself, the VMM does
not have control of them.
These registers are reset on FLR and device reset because they are virtio
registers of the device.
Hence, the VMM loses the state for the job that the VMM was supposed to do.
Therefore, passthrough mode cannot depend on these registers.

2. Any bulk data transfer of device context and dirty page tracking requires
Hence that DMA must happen on a device which is different from the VF itself.
If it is on the VF itself, there are two problems.
2.a. VF device reset and FLR will clear them, and the device context is lost.

2.b. The DMA occurs at the PCI RID level.
The IOMMU cannot bifurcate the DMA of one RID into two different address spaces,
of guest and hypervisor.
This requires PASID support.
Using PASID has the following problems.
2.b.1 PASID is typically not used by kernel software. It is only meant for
user processes.
Hence, reserving a PASID for kernel work won't be acceptable upstream.
2.b.2 If this is somehow done, then when the VF itself supports PASID, it now
requires vPASID support.
This is again not where the industry is going in other forums that I am part of.
Hence, it will be a failure for virtio. Hence, I do not recommend the vPASID route.
2.b.3 One of the widely used CPUs seems to have dropped the support due to
a limitation of an instruction around PASID.
So it cannot be used there; this further limits virtio passthrough users.
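
[Editor's illustration, not part of the original message.] Point 2.b can be pictured with a toy lookup table. This is only a sketch under assumptions: the RID and PASID values, the address-space names, and both helper functions are hypothetical and do not correspond to any real IOMMU interface.

```python
# Toy translation tables; RIDs, PASIDs, and address-space names are made up.
GUEST_AS = "guest_address_space"
HYPERVISOR_AS = "hypervisor_address_space"

VF_RID = 0x0101  # hypothetical requester ID of the passthrough VF
PF_RID = 0x0100  # hypothetical requester ID of the owner PF


def translate_by_rid(table: dict, rid: int) -> str:
    """Without PASID, the requester ID alone selects the address space."""
    return table[rid]


def translate_by_rid_pasid(table: dict, rid: int, pasid: int) -> str:
    """With PASID, the (RID, PASID) pair selects the address space."""
    return table[(rid, pasid)]


# Migration DMA issued by the VF itself: one RID, so only one address space.
rid_only = {VF_RID: GUEST_AS}
guest_dma = translate_by_rid(rid_only, VF_RID)
migration_dma_on_vf = translate_by_rid(rid_only, VF_RID)  # lands in guest space too

# Migration DMA issued by a different device (the owner): a different RID.
rid_split = {VF_RID: GUEST_AS, PF_RID: HYPERVISOR_AS}
migration_dma_on_pf = translate_by_rid(rid_split, PF_RID)

# Keeping both streams on the VF would need a PASID to tell them apart.
with_pasid = {(VF_RID, 0): GUEST_AS, (VF_RID, 1): HYPERVISOR_AS}
migration_dma_pasid = translate_by_rid_pasid(with_pasid, VF_RID, 1)
```

In this model the only ways to reach the hypervisor address space are a second RID or a PASID, which is the trade-off the numbered points argue about.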

Even if 2.b.2 and 2.b.3 are somehow overcome in theory, #1 and 2.a are
functional problems.
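
[Editor's illustration, not part of the original message.] The functional argument in #1 and 2.a can be sketched as a toy model. All class, field, and method names below are hypothetical; this is not the virtio register layout or the admin command format, only a picture of where the state lives when a reset happens.

```python
from dataclasses import dataclass, field


@dataclass
class MemberVF:
    """Toy VF: everything stored on the function itself is cleared by reset."""
    vf_id: int
    regs: dict = field(default_factory=dict)  # hypothetical on-VF migration registers

    def function_level_reset(self) -> None:
        # FLR (or a virtio device reset) returns the function's registers to defaults.
        self.regs.clear()


@dataclass
class OwnerPF:
    """Toy owner device: per-member migration state is kept outside the VF."""
    member_state: dict = field(default_factory=dict)

    def admin_cmd_set_mode(self, vf_id: int, mode: str) -> None:
        # Stand-in for an admin command the VMM issues through the owner device.
        self.member_state[vf_id] = mode


vf = MemberVF(vf_id=1)
pf = OwnerPF()

vf.regs["migration_mode"] = "stop"       # design A: VMM writes a register on the VF
pf.admin_cmd_set_mode(vf.vf_id, "stop")  # design B: VMM goes through the owner device

vf.function_level_reset()                # guest resets its passthrough VF

state_on_vf = vf.regs.get("migration_mode")   # lost with the reset
state_on_pf = pf.member_state.get(vf.vf_id)   # still held by the owner
```

After the reset, `state_on_vf` is `None` while `state_on_pf` is still `"stop"`, which is the loss of VMM state that the message describes.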
Scale wise better:
3. Admin commands and the admin vq are used _only_ when one does a device
migration command.
One does not migrate VMs every few msec.
Hence such functionality is better done in a way that is efficient for
performance, but without consuming on-chip memory.
Admin commands and the admin vq satisfy those.

4. Once the software matures further, admin commands would prefer a
completion interrupt instead of polling.
How to get a notification/interrupt? Well, virtqueue defines this already.
Should we replicate that in some PF registers?
It can be. But once you put all the functionalities of admin commands and aq
in registers, the whole thing becomes yet another register_q.
5. Can these registers be placed in the PF to overcome #1 and #2 for
In theory yes.
In practice, no, as there are many commands that flow, which need to scale
to a reasonable number of VFs.
Admin commands over admin vq provide this generic facility.

6. Most modern devices that attempt to scale cut down their register
footprint; registers are used only for main bootstrap, init time config work.
Even in virtio spec, one can read:
"Device configuration space is generally used for rarely changing or
initialization-time parameters."
Adding some additional registers to a PF device config space for non init time
parameters does not make sense.
7. Additionally, nested virtualization should be done by truly nesting the
device at the right abstraction point of the owner-member relationship.
This follows the two principles of (a) efficiency and (b) equivalence from the
paper Jason pointed to.
And when we ask for a nested VF extension, we will get our guidance from PCI-SIG on
why it should be done, if it matches the rest of the ecosystem components
that support/don't support the nesting.
If they are true, shall we refactor virtio-pci common cfg functionalities to use
admin vq?
For non-backward compatible SIOV device of the future, yes, virtio-pci common config (non init registers) should be moved to a vq, located on the member device directly.
Oh, really? Quite interesting, do you want to move all config space fields in VF to admin vq? Have a plan?
