Subject: Re: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration

How do you transfer the ownership?
An additional ownership delegation by a new admin
If you think this can work, do you want to cook a patch
to implement this before submitting this live migration
I answered this already above.
talk is cheap, show me your patch
Huh. We presented the infrastructure that migrates 30+ device types,
covering device context ideas from Oracle,
covering P2P, and supporting device_reset, FLR, and dirty page tracking.

Please have some respect for other members who covered more ground than
your series.
What more? Apply the same nested concept on the member device as
Michael suggested, it is nested virtualization maintain exact
So a VF is mapped as PF to the L1 guest.
L1 guest can enable SR-IOV on it, and map one VF to L2 guest.
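
The nesting described above can be sketched as a toy model in Python. This is only an illustration, not anything defined by the spec or the patch series: the `Function` class, its `enable_sriov` method, and the naming scheme are hypothetical, and the sketch only shows the recursive owner-member shape in which a host VF is presented as a PF to the L1 guest, which then creates its own VF for the L2 guest.

```python
class Function:
    """Toy PCI function that can own member functions (hypothetical model)."""

    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner      # the function that created this one, if any
        self.members = []       # functions created by enabling SR-IOV here

    def enable_sriov(self, num_vfs):
        # Create num_vfs member functions owned by this function.
        self.members = [
            Function(f"{self.name}.vf{i}", owner=self) for i in range(num_vfs)
        ]
        return self.members

    def depth(self):
        # Number of owner hops up to the top-level (host) function.
        return 0 if self.owner is None else self.owner.depth() + 1


host_pf = Function("pf")
l1_dev = host_pf.enable_sriov(2)[0]  # host VF, seen as a PF by the L1 guest
l2_dev = l1_dev.enable_sriov(1)[0]   # L1 enables SR-IOV on it, maps a VF to L2
```

The same owner-member relationship repeats at each level, which is the point of the equivalence argument: the L1 guest manages its member exactly the way the host manages the L1 device.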

This nested work can be extended in future, once first level nesting is
Answer all questions above, if you think a management VF
can work, please show me your patch.
The idea evolves from technical debate rather than pointing fingers like your
I think a positive discussion with Michael and a pointer to the paper from
Jason gave a good direction of doing _right_ nesting that follows two
a. efficiency property
b. equivalence property

(c. resource control is natural already)

Both apply at the VMM and at the VM level, enabling recursive virtualization
by having a VF that can act as a PF inside the guest.
[1] https://dl.acm.org/doi/pdf/10.1145/361011.361073
Please just show me your patch resolving these opens. How
about starting from defining the virtio-fs device context and your
management VF?
As answered, the device context infrastructure is done; per device
specific device-context will be defined incrementally.
I will not be including virtio-fs in this series. It will be done
incrementally in future utilizing the infrastructure built in this series.
Done? How do you conclude this? You just tell me what is the
full set of virtio-fs device context now and how to migrate them.

You can't? You refuse or you don't? Do you expect the HW
designer to figure it out by themselves?
I won't be able to tell now as I don't think it is necessary for this series.
If one out of 30 devices cannot migrate because an unimaginable amount of
complexity has been placed there, maybe one will not implement
it as a member device.
From experience of migratable complex GPU devices and RDMA devices
(having hundreds of thousands of stateful QPs), my understanding is that the
complex state of virtio-fs can be defined and migrated.
The mlx5 driver consists of 150,000 lines of code and that device has
complex state.
So I am optimistic that virtio-fs can be migratable too.
It does not have to be limited by my limited creativity of 2023.
Maybe I am wrong; in that case one will not implement passthrough virtio-fs
Your series wants to migrate device context, but doesn't define
device context. Does this sound reasonable?
Device generic context is defined at [1] and also the infrastructure for
defining the device context in parallel by multiple people can be done post
the work of [1].
Per each device type context will be defined incrementally post this
This is not post of the work, you should define them before you use
them in this series.

I don't agree to boil the ocean in this patch series.
No practical spec development community does it.
As long as we feel comfortable that the device context framework is
extensible, it is fine.
If virtio-fs seems very hard, maybe one will come up with a new lightweight
FS device. I really don't know.
So you want to migrate device context, but refuse to define them?
And you need to prove why admin vq is better than the registers
solution if you want a merge.
Michael already responded on the practical aspects.
Since you may claim I didn't answer, below are the technical details.

Why admin commands and aq is better is because of below reasons in
Functionally better:
1. When the live migration registers are located on the VF itself, the VMM
does not have control of them.
These registers are reset on FLR and device reset because they are
registers of the device.
Hence, the VMM loses the state for the job that the VMM was supposed to do.
Therefore, passthrough mode cannot depend on these registers.

2. Any bulk data transfer of device context and dirty page tracking
Hence those DMA must happen to the device which is different than VF
If it is on the VF itself, it has two problems.
2.a. VF device reset and FLR will clear them, and device context is lost.

2.b. The DMA occurs at the PCI RID level.
The IOMMU cannot bifurcate the DMA of one RID into two different address
spaces of guest and hypervisor.
This requires PASID support.
Using PASID has the following problems.
2.b.1 PASID is typically not used by kernel software. It is only meant for
user processes.
Hence reserving a PASID for kernel work won't be acceptable upstream.
2.b.2 Somehow if this is done, when the VF itself supports PASID, it
now needs vPASID support.
This is again not where the industry is going in other forums that I am part of.
Hence, it will be a failure for virtio. Hence, I do not recommend the vPASID route.
2.b.3 One of the widely used CPUs seems to have dropped the support due to
a limitation of an instruction around PASID.
So it cannot be used there; this further limits virtio passthrough users.

Even if somehow 2.b.2 and 2.b.3 are overcome in theory, #1 and 2.a are
functional problems.
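
To make #1 and 2.a concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not spec behaviour: the class names `MemberVF` and `OwnerPF`, the `admin_cmd_set`/`admin_cmd_get` helpers, and the `dirty_tracking` key are all hypothetical. It only illustrates that state kept in the member's own registers is wiped by an FLR, while state the owner keeps on the member's behalf survives.

```python
class MemberVF:
    """Toy member device: everything it holds is cleared by FLR."""

    def __init__(self, vf_id):
        self.vf_id = vf_id
        self.regs = {}  # hypothetical register file on the VF itself

    def flr(self):
        # A function level reset returns the function's own registers
        # to their defaults, modelled here as clearing them.
        self.regs.clear()


class OwnerPF:
    """Toy owner device: keeps per-member migration state outside the member."""

    def __init__(self):
        self.member_state = {}  # keyed by VF id, untouched by a member FLR

    def admin_cmd_set(self, vf_id, key, value):
        self.member_state.setdefault(vf_id, {})[key] = value

    def admin_cmd_get(self, vf_id, key):
        return self.member_state.get(vf_id, {}).get(key)


vf = MemberVF(vf_id=1)
pf = OwnerPF()

# Option A: migration control lives in registers on the VF.
vf.regs["dirty_tracking"] = True
# Option B: migration control is issued through the owner.
pf.admin_cmd_set(vf.vf_id, "dirty_tracking", True)

vf.flr()  # guest-triggered reset of the passthrough VF

state_on_vf = vf.regs.get("dirty_tracking")                 # lost with the reset
state_on_pf = pf.admin_cmd_get(vf.vf_id, "dirty_tracking")  # still present
```

In this model the guest can trigger `flr()` at any time, so anything the VMM parked in `vf.regs` is gone, while the owner-side record is unaffected.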
Scale wise better:
3. Admin commands and admin vq are used _only_ when one issues a device
migration command.
One does not migrate VMs every few msec.
Hence such functionality is better done in a way that is efficient for
performance, but without consuming on-chip memory.
Admin commands and admin vq satisfy those.

4. Once the software matures further, admin commands would prefer a
completion interrupt instead of polling.
How to get a notification/interrupt? Well, virtqueue defines this already.
Should we replicate that in some PF registers?
It can be. But once you put all the functionalities of admin commands and aq
in registers, the whole thing becomes yet another register_q.
5. Can these registers be placed in the PF to overcome #1 and #2 for
In theory yes.
In practice, no, as there are many commands that flow, which need to scale
to a reasonable number of VFs.
Admin commands over admin vq provide this generic facility.

6. Most modern devices that attempt to scale cut down their footprint;
registers are used only for main bootstrap and init-time config work.
Even in the virtio spec, one can read:
"Device configuration space is generally used for rarely changing or
initialization-time parameters."
Adding some additional registers to a PF device config space for non-init-time
parameters does not make sense.
7. Additionally, nested virtualization should be done by truly nesting the
device at the right abstraction point of the owner-member relationship.
This follows the two principles of (a) efficiency and (b) equivalence that
the paper Jason pointed to describes.
And we ask for nested VF extension we will get our guidance from
why it should be done if it is matching with rest of the ecosystem
components that support/don't support the nesting.
If they are true, shall we refactor virtio-pci common cfg
functionalities to use admin vq?
For non-backward compatible SIOV device of the future, yes, virtio-pci
common config (non init registers) should be moved to a vq, located on the
member device directly.
Oh, really? Quite interesting, do you want to move all config space fields in VF
to admin vq? Have a plan?
Not in my plan for spec 1.4 time frame.
I do not want to divert the discussion, would like to focus on device migration phases.
Let's please discuss in some other dedicated thread.
OK, but don't say admin vq is better than registers.

