Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE




On 9/11/2023 3:30 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, September 11, 2023 12:48 PM

On 9/11/2023 3:07 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, September 11, 2023 12:28 PM
I don't see in his proposal how all the supported features and functionality are achieved.
I will include the in-flight descriptor tracker and dirty-page tracking in V2; is anything else missed?
It can migrate the device itself; why don't you think so? Can you name some issues we can work on for improvement?
I would like to see a proposal similar to [1] that can work without mediation, in case you want to combine the two use cases under one.
Else, I don't see a need to merge the two.

Dirty page tracking, peer-to-peer, downtime, no-mediation, and FLRs are all covered in [1] for passthrough cases.
We are introducing basic facilities; feel free to re-use them in the admin vq solution.
Basic facilities are added in [1] for passthrough devices.
You can leverage them in your v2 for supporting p2p devices, dirty page tracking, passthrough support, shorter downtime and more.
Basic facilities had better not depend on others, but the admin vq can re-use the basic facilities.

For P2P, what if the devices are placed in different IOMMU groups?

If you want to implement LM by admin vq, the facilities in my series can be re-used. E.g., forward your suspend to the SUSPEND bit.
Just VQ suspend is not enough...
This series contains device SUSPEND and the queue state accessor.
MST required in-flight descriptor tracking, which will be included in the next version.
For passthrough more than that is needed.
Dirty page tracking will be addressed too; what else should we work on?
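For reference, here is a minimal sketch of how a driver could use the facilities named above (device SUSPEND plus the queue state accessor) to save device state. The SUSPEND bit value and the virtio_* helpers are illustrative assumptions, not the spec or patch wording:

/* Illustrative sketch only: the SUSPEND bit value, the per-queue state
 * accessors and the virtio_* helpers are assumptions, not spec text. */
#include <stdint.h>
#include <errno.h>

#define VIRTIO_CONFIG_S_SUSPEND  0x40   /* assumed bit in the device status field */

struct virtio_dev;                       /* opaque transport handle */

/* Hypothetical wrappers around PCI common configuration accesses. */
extern uint8_t  virtio_get_status(struct virtio_dev *vdev);
extern void     virtio_set_status(struct virtio_dev *vdev, uint8_t status);
extern int      virtio_wait_status_bit(struct virtio_dev *vdev, uint8_t bit);
extern void     virtio_select_queue(struct virtio_dev *vdev, uint16_t idx);
extern uint16_t virtio_get_queue_avail_state(struct virtio_dev *vdev);
extern uint16_t virtio_get_queue_used_state(struct virtio_dev *vdev);

struct vq_state {
    uint16_t avail_idx;   /* next descriptor index the device would fetch */
    uint16_t used_idx;    /* next used index the device would write */
};

/* Suspend the device, then snapshot each queue's state for migration. */
static int save_queue_states(struct virtio_dev *vdev,
                             struct vq_state *out, uint16_t nvqs)
{
    uint8_t status = virtio_get_status(vdev);

    virtio_set_status(vdev, status | VIRTIO_CONFIG_S_SUSPEND);
    if (!virtio_wait_status_bit(vdev, VIRTIO_CONFIG_S_SUSPEND))
        return -ETIMEDOUT;              /* device never acknowledged SUSPEND */

    for (uint16_t i = 0; i < nvqs; i++) {
        virtio_select_queue(vdev, i);
        out[i].avail_idx = virtio_get_queue_avail_state(vdev);
        out[i].used_idx  = virtio_get_queue_used_state(vdev);
    }
    return 0;
}

An admin-vq based flow could drive the same sequence from the owner device instead of the guest driver, which is the re-use point being made above.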
The admin queue of the member device is migrated like any other queue using [1] above.
2) won't work in a nested environment, or we need complicated SR-IOV emulation in order for it to work

Poking at the device from the driver to migrate it is not going to work if the driver lives within the guest.
This is by design, to allow live migration to work in the nested layer.
And it's the way we've handled CPU and MMU. Is anything different for virtio here?
Nested and non-nested use cases likely cannot be addressed by a single solution/interface.

I think Ling Shan's proposal addressed them both.

I don't see how all the above points are covered.
Why?


And how do you migrate nested VMs by admin vq?

Hypervisor = level 1.
VM = level 2.
Nested VM = level 3.
The level-2 VM takes care of migrating the level-3 composed device using its sw composition, or maybe using some kind of mediation as you proposed.
So, the nested VM is not aware of the admin vq, or does not have access to the admin vq, right?
Right. It is not aware.

How many admin vqs, and how much bandwidth, are reserved for migrating all the VMs?

It does not matter, because the number of AQs is configurable and the device and driver can decide how many to use.
I am not sure which BW you are talking about.
There are many bandwidths one can regulate: at the network level, PCI level, VM level, etc.
It matters because of QoS, and the downtime must converge.
QoS is such a broad term that it is hard to debate unless you get to a specific point.
E.g., there can be hundreds or thousands of VMs; how many admin vqs are required to serve them during LM? To converge, there must be no timeout.
E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the number in the HW implementation, and how does the driver get informed?
Usually just one AQ is enough, as proposal [1] is built around inherent downtime reduction.
You can ask a similar question for RSS: how does a hw device know how many RSS queues are needed? :)
The device exposes the number of supported AQs that the driver is free to use.
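As a rough illustration of that discovery point: assuming the device advertises a supported admin VQ count somewhere in its configuration (the field accessor and the sizing policy below are hypothetical), the driver could size its AQ usage like this:

/* Illustrative sketch only: the advertised AQ-count accessor and the
 * sizing policy are assumptions, not anything defined by the spec. */
#include <stdint.h>

struct virtio_pf;        /* opaque owner (PF) device handle */

/* Hypothetical accessor for an advertised "number of supported admin VQs". */
extern uint16_t virtio_pf_get_admin_queue_num(struct virtio_pf *pf);

/*
 * Decide how many admin VQs to actually use when migrating num_members
 * member devices: one AQ is usually enough, but a driver may spread
 * commands over more queues, capped by what the device supports.
 */
static uint16_t pick_admin_queue_count(struct virtio_pf *pf, uint32_t num_members)
{
    uint16_t supported = virtio_pf_get_admin_queue_num(pf);
    uint16_t wanted = 1;                      /* default: a single AQ */

    /* Example policy: one extra AQ per 256 members migrated in parallel. */
    wanted += (uint16_t)(num_members / 256);

    return wanted < supported ? wanted : supported;
}

The policy here is purely an example; the point is only that the device advertises a maximum and the driver chooses within it.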
RSS is not a must for the transition, though there may be performance overhead.
But if the host cannot finish live migration in the due time, then it is a failed LM.

Most sane sysadmins do not migrate 1000 VMs at the same time, for obvious reasons.
But when such requirements arise, a device may support it.
Just like how a net device can support from 1 to 32K txqueues at the spec level.
The orchestration layer may do that for a host upgrade or power-saving.
And the VMs may be required to migrate together, for example:
a cluster of VMs in the same subnet.

Let's not introduce new fragility.

Remember, a CSP migrates all VMs on a host for power-saving or upgrade.
I am not sure why the migration reason has any influence on the design.
Because this design is for live migration.
The CSPs that we have discussed care more about performance, and hence prefer passthrough instead of mediation, and don't seem to be doing any nesting.
The CPU doesn't support 3 levels of page table nesting either.
I agree that there could be other users who care for nested functionality.

Anyway, nesting and non-nesting are two different requirements.
The LM facility should serve both.
I don't see how the PCI spec lets you do it.
The PCI community already handed this over to the SR-PCIM interface, outside of the PCI spec domain.
Hence, it is done over the admin queue for passthrough devices.

If you can explain how your proposal addresses passthrough support without mediation and also does DMA, I am very interested to learn that.
Do you mean nested? Why can't this series support nested?

And it does not serve bare-metal live migration either.
A bare-metal migration seems a distant theory, as one needs a side CPU and memory accessor apart from the device accessor.
But if that somehow comes to exist, there will be a similar admin device to migrate it; maybe TDDISP will own this whole piece one day.
Bare-metal live migration requires other components like firmware, OS and partitioning; that's why device live migration should not be a blocker.




