
virtio-comment message



Subject: Re: [virtio-comment] RE: [PATCH V2 2/6] virtio: introduce SUSPEND bit in device status




On 11/6/2023 6:52 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 6, 2023 2:51 PM

On 11/6/2023 12:07 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, November 6, 2023 9:00 AM

On 11/3/2023 11:54 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 3, 2023 8:25 PM

On 11/3/2023 7:35 PM, Parav Pandit wrote:
From: Zhu Lingshan <lingshan.zhu@intel.com>
Sent: Friday, November 3, 2023 4:05 PM

This patch introduces a new status bit in the device status: SUSPEND.

This SUSPEND bit can be used by the driver to suspend a device,
in order to stabilize the device states and virtqueue states.

Its main use case is live migration.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
You constantly complained that whatever was proposed using admin
commands method in [1] must work for passthrough and non-passthrough.
And halfway in the discussion you propose a method after learning all the
limitations of in-band, you propose a solution only works for
non-passthrough mode.
You asked someone to have comprehensive proposal and when it comes to
you following it, you just don't.
not sure what you are talking about.
And have most shallow commit message to not even mention it.

Please be consistent in design approach.
And if you don't want to be, stop asking others.
this SUSPEND/RESUME doesn't change since the RFC series, how can it
not be inconsistent???
This is not the way TC collaboration works.
I probably shouldn't even expect this from you.
Your proposal does not cover both the use cases of passthrough and
non-passthrough.
Yet you kept demanding them for others.
This is just wrong.

I am aware that both models have technical pros and cons.
Why this doesn't work? the device status byte has been working for
many years, and do you know when guest freeze, the hypervisor owns the
device????
When the guest is not frozen and during the pre-copy phase, hypervisor needs
to access the device (context, dirty pages).
How does it work if the guest owns the device?
Have you seen PASID there?
PASID does not help because as explained virtio common config space and device specific config space is owned by the guest driver.

Secondly PASID space is also owned by the guest driver.
hypervisor sets a PASID to isolate the cap.

[1]
https://lists.oasis-open.org/archives/virtio-comment/202310/msg00472.html
Please don't be so emotional and please be professional.

Why this solution can not work for pass-through? Do you know the
device ownership will be transferred to the hypervisor when guest
suspended in live migration?
I explained 5 reasons why it does not work in previous reply.

As the word indicates "live migration", the hypervisor needs to access the
device when it is "live" (not just after).
Hence, passthrough mode must be able to capture the state of the device and
dirty pages database when it's live.
(and after the source is suspended).
No, the hypervisor should only collect dirty pages when the device alive.
It is needed during both the times.
When the device and guest is live during pre-copy phase.
And after the device is frozen, to get the final round of pages.
With PASID, dirty page tracking facility can be isolated from the guest, means
the hypervisor owns this facility. So the hypervisor can collect the dirty pages.

When the device suspended, it should report the last round of dirty pages
through dirty page tracking facility as expected.

This can work, right?
Unfortunately no, as non atomic bitmap cannot reside in the host memory,
as explained before, PCI and CPU supports atomic read/write. Please refer to PCI spec and CPU ISA.
And whatever is in the device gets reset on device reset and/or FLR. So the dirty map detail is lost.
Similarly the device context is also lost on these two events triggered by guest.
we explained before, when reset, the device should clear everything.

As you can see, the dirty page tracking facility has a PASID for
isolation. But still, the question is, we should better use platform
dirty page tracking

Nothing to do with PASID, as PASID is owned by the guest.
It looks you don't know how PASID work.
Host can setup PASID to isolate some facilities, right?
There are few limitations with PASID.
a. All platforms do not have PASID and
As we have explained for many times, this is a basic facility,
and the implementation is transport-specific.

We gave an example of a PCI implementation, and PCI supports PASID, right?
b. I explained above PASID do not work always as PASID only bifurcates DMA not the device _functionality_.
With a PASID, a cap can be considered to be placed in another logical address space, which is not accessible to the guest.
c. PASID to be available to guest as_is what is present on the device
host hypervisor sets the PASID, transparent to the guest.

Then suspend the device after guest freeze, to stabilize the device
status, then read the status.

How can you say this does not work???
I explained above.
see above


