virtio-comment message

Subject: Re: [virtio-comment] Re: [PATCH v1 3/8] device-context: Define the device context fields for device migration

From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
To: Parav Pandit <parav@nvidia.com>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>
Date: Thu, 19 Oct 2023 17:16:31 +0800



On 10/19/2023 5:13 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, October 19, 2023 2:40 PM

On 10/19/2023 5:01 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Thursday, October 19, 2023 1:45 PM

On 10/18/2023 5:48 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 2:13 PM

On 10/18/2023 3:20 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 12:22 PM

On 10/18/2023 2:41 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, October 18, 2023 12:06 PM

On 10/18/2023 1:02 PM, Parav Pandit wrote:

From: virtio-comment@lists.oasis-open.org
<virtio-comment@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
Sent: Monday, October 16, 2023 3:18 PM

On 10/13/2023 7:54 PM, Parav Pandit wrote:

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Friday, October 13, 2023 3:14 PM

How do you transfer the ownership?

An additional ownership delegation by a new admin command.

if you think this can work, do you want to cook a patch to implement
this before submitting this live migration series?

I answered this already above.

talk is cheap, show me your patch

Huh. We presented the infrastructure that migrates 30+ device types,
covering device context ideas from Oracle.
Covering P2P, supporting device_reset, FLR, dirty page tracking.

Please have some respect for other members who covered more ground than
your series.

What more? Apply the same nested concept on the member device as
Michael suggested; it is nested virtualization that maintains the exact
same semantics.

So a VF is mapped as PF to the L1 guest.
L1 guest can enable SR-IOV on it, and map one VF to L2 guest.

This nested work can be extended in future, once first level nesting is
covered.

Answer all questions above, if you think a management VF
can work, please show me your patch.

The idea evolves from technical debate, rather than from pointing
fingers as in your comment.

I think a positive discussion with Michael and a pointer to the paper
from Jason gave a good direction of doing _right_ nesting that follows
two principles.

a. efficiency property
b. equivalence property

(c. resource control is natural already)

Both apply at VMM and at VM level enabling recursive virtualization, by
having VF that can act as PF inside the guest.

[1] https://dl.acm.org/doi/pdf/10.1145/361011.361073

Please just show me your patch resolving these opens, how about
starting from defining virtio-fs device context and your management VF?

As answered, device context infrastructure is done; per-device specific
device context will be defined incrementally.

I will not be including virtio-fs in this series. It will be done
incrementally in future utilizing the infrastructure built in this series.
Done? How do you conclude this? You just tell me what is the
full set of virtio-fs device context now and how to migrate them.

You can't? you refuse or you don't? Do you expect the HW
designer to figure out by themselves?

I won't be able to tell now, as I don't think it is necessary for this series.
If one out of 30 devices cannot migrate because an unimaginable amount
of complexity has been placed there, maybe one will not implement it as
a member device.

From experience of migratable complex GPU devices and RDMA devices
(having hundreds of thousands of stateful QPs), my understanding is
that the complex state of virtio-fs can be defined and migrated.

The mlx5 driver consists of 150,000 lines of code, and that device is
migratable with complex state.

So I am optimistic that virtio-fs can be migratable too.
It does not have to be limited by my limited creativity of 2023.
Maybe I am wrong; in that case one will not implement a passthrough
virtio-fs device.
your series wants to migrate device context, but doesn't define
device context, does this sound reasonable?

Device generic context is defined at [1] and also the infrastructure
for defining the device context in parallel by multiple people can be
done post the work of [1].

Per each device type context will be defined incrementally post this work.

[1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00190.html

This is not post of the work, you should define them before you use
them in this series.

I don't agree to cook the ocean in this patch series.
No practical spec devel community does it.
As long as we feel comfortable that the device context framework is
extendible, it is fine.

If virtio-fs seems very hard, maybe one will come up with a new
lightweight FS device. I really don't know.
so you want to migrate device context, but refuse to define them?

And you need to prove why admin vq are better than registers
solution if you want a merge.

Michael already responded on the practical aspects.
Since you may claim I didn't answer, below are the technical details.

Why admin commands and aq are better is because of the below reasons,
in my view:

Functionally better:
1. When the live migration registers are located on the VF itself, the
VMM does not have control of them.

These registers are reset on FLR and device reset, because they are
virtio registers of the device.

Hence, the VMM loses the state for the job that the VMM was supposed to do.
Therefore, passthrough mode cannot depend on these registers.

2. Any bulk data transfer of device context and dirty page tracking
requires DMA.

Hence that DMA must happen on a device that is different from the VF
itself.

If it is on the VF itself, it has two problems.
2.a. VF device reset and FLR will clear them, and device context is lost.

2.b. the DMA occurs at the PCI RID level.
The IOMMU cannot bifurcate the DMA of one RID into two different address
spaces, those of the guest and the hypervisor.

This requires PASID support.
Using PASID has the following problems.
2.b.1 PASID is typically not used by the kernel software. It is only
meant for the user processes.

Hence, reserving a PASID for kernel work won't be acceptable to the
upstream kernel.

2.b.2 If this is somehow done, then when the VF itself supports PASID,
it now requires vPASID support.

This is again not where the industry is going in the other forums that
I am part of.

Hence, it will be a failure for virtio. Hence, I do not recommend the
vPASID route.

2.b.3 One of the widely used CPUs seems to have dropped the support due
to a limitation of an instruction around PASID.

So it cannot be used there; this further limits virtio passthrough users.

Even if 2.b.2 and 2.b.3 are somehow overcome in theory, #1 and 2.a are
functional problems.

Scale wise better:
3. Admin command and admin vq are used _only_ when one issues a device
migration command.

One does not migrate VMs every few msec.
Hence such functionality is better done in a way that is efficient for
performance, but without consuming on-chip memory.

Admin command and admin vq satisfy those.

4. Once the software matures further, admin commands would prefer a
completion interrupt instead of polling.

How to get notification/interrupt? Well, virtqueue defines this already.
Should we replicate that in some PF registers?
It can be. But once you put all the functionalities of admin command
and aq in registers, the whole thing becomes yet another register_q.

5. Can these registers be placed in the PF to overcome #1 and #2 for
passthrough?

In theory yes.
In practice, no, as there are many commands that flow, which need to
scale to a reasonable number of VFs.

Admin commands over admin vq provide this generic facility.

6. Most modern devices that attempt to scale cut down their register
footprint; registers are used only for main bootstrap, init time config work.

Even in virtio spec, one can read:
"Device configuration space is generally used for rarely changing or
initialization-time parameters."

Adding some additional registers to a PF device config space for non
init time parameters does not make sense.

7. Additionally, nested virtualization should be done by truly nesting
the device at the right abstraction point of the owner-member relationship.

This follows the two principles of (a) efficiency and (b) equivalence
that Jason's paper pointed out.

And if we ask for a nested VF extension, we will get our guidance from
PCI-SIG on why it should be done, and on whether it matches the rest of
the ecosystem components that support/don't support the nesting.
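
Points 1 and 2.a above can be illustrated with a toy model. This is
only a minimal sketch and is not taken from the patch series: the class
names, the "migration_mode" register, the "stop" value and
admin_cmd_set_mode() are all hypothetical stand-ins, not virtio
registers or admin command opcodes. It only shows the claimed
difference in lifetime between state kept in the VF's own registers and
state kept on the owner (PF) side.

```python
from dataclasses import dataclass, field


@dataclass
class MemberDevice:
    """Hypothetical VF model: its own registers are cleared by FLR."""
    registers: dict = field(default_factory=dict)

    def function_level_reset(self) -> None:
        # FLR / device reset returns the function's own registers to defaults.
        self.registers.clear()


@dataclass
class OwnerDevice:
    """Hypothetical PF model: keeps per-member migration state on its side."""
    member_state: dict = field(default_factory=dict)

    def admin_cmd_set_mode(self, vf_id: int, mode: str) -> None:
        # Stand-in for an admin command; not a real virtio opcode.
        self.member_state[vf_id] = mode


vf = MemberDevice()
pf = OwnerDevice()

# Option A: migration control lives in a register on the VF itself.
vf.registers["migration_mode"] = "stop"
# Option B: migration control is issued through the owner device.
pf.admin_cmd_set_mode(vf_id=1, mode="stop")

# The guest driver resets the VF while migration is in progress.
vf.function_level_reset()

print(vf.registers.get("migration_mode"))  # state on the VF is gone
print(pf.member_state.get(1))              # state on the owner survives
```

In this model the reset is triggered by the guest, which is the case
the passthrough argument is concerned with: the VMM cannot prevent it,
so anything it stored in the VF's registers is lost.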
If they are true, shall we refactor virtio-pci common cfg
functionalities to use admin vq?

For non-backward compatible SIOV device of the future, yes, virtio-pci
common config (non init registers) should be moved to a vq, located on the
member device directly.
Oh, really? Quite interesting, do you want to move all config space fields in VF
to admin vq? Have a plan?

Not in my plan for spec 1.4 time frame.
I do not want to divert the discussion; I would like to focus on device migration phases.
Let's please discuss in some other dedicated thread.

OK, but don't say admin vq is better than registers.
