virtio-comment message


Subject: RE: [virtio-comment] RE: [PATCH V2 2/6] virtio: introduce SUSPEND bit in device status


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, November 6, 2023 2:51 PM
> 
> On 11/6/2023 12:07 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, November 6, 2023 9:00 AM
> >>
> >> On 11/3/2023 11:54 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Friday, November 3, 2023 8:25 PM
> >>>>
> >>>> On 11/3/2023 7:35 PM, Parav Pandit wrote:
> >>>>>> From: Zhu Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Friday, November 3, 2023 4:05 PM
> >>>>>>
> >>>>>> This patch introduces a new status bit in the device status: SUSPEND.
> >>>>>>
> >>>>>> This SUSPEND bit can be used by the driver to suspend a device,
> >>>>>> in order to stabilize the device states and virtqueue states.
> >>>>>>
> >>>>>> Its main use case is live migration.
> >>>>>>
> >>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>>> You constantly complained that whatever was proposed using admin
> >>>> commands method in [1] must work for passthrough and non-passthrough.
> >>>>> And halfway in the discussion you propose a method after learning
> >>>>> all the
> >>>> limitations of in-band, you propose a solution only works for
> >>>> non-passthrough mode.
> >>>>> You asked someone to have comprehensive proposal and when it comes
> >>>>> to
> >>>> you following it, you just don't.
> >>>> not sure what you are talking about.
> >>>>> And have most shallow commit message to not even mention it.
> >>>>>
> >>>>> Please be consistent in design approach.
> >>>>> And if you don't want to be, stop asking others.
> >>>> this SUSPEND/RESUME doesn't change since the RFC series, how can it
> >>>> not be inconsistent???
> >>>>> This is not the way TC collaboration works.
> >>>>> I probably shouldn't even expect this from you.
> >>> Your proposal does not cover both the use cases of passthrough and
> >>> non-
> >> passthrough.
> >>> Yet you kept demanding them for others.
> >>> This is just wrong.
> >>>
> >>> I am aware that both models have technical pros and cons.
> >> Why this doesn't work? the device status byte has been working for
> >> many years, and do you know when guest freeze, the hypervisor owns the
> device????
> > When the guest is not frozen and during the pre-copy phase, hypervisor needs
> to access the device (context, dirty pages).
> > How does it work if the guest owns the device?
> Have you seen PASID there?
PASID does not help because, as explained, the virtio common config space and the device-specific config space are owned by the guest driver.

Secondly, the PASID space is also owned by the guest driver.

> >
> >>>>> [1]
> >>>>> https://lists.oasis-open.org/archives/virtio-comment/202310/msg00472.html
> >>>> Please don't be so emotional and please be professional.
> >>>>
> >>>> Why this solution can not work for pass-through? Do you know the
> >>>> device ownership will be transferred to the hypervisor when guest
> >>>> suspended in live migration?
> >>> I explained 5 reasons why it does not work in previous reply.
> >>>
> >>> As the word indicates "live migration", the hypervisor needs to
> >>> access the
> >> device when it is "live" (not just after).
> >>> Hence, passthrough mode must be able to capture the state of the
> >>> device and
> >> dirty pages database when its live.
> >>> (and after the source is suspended).
> >> No, the hypervisor should only collect dirty pages when the device alive.
> > It is needed during both the times.
> > When the device and guest is live during pre-copy phase.
> > And after the device is frozen, to get the final round of pages.
> With PASID, dirty page tracking facility can be isolated from the guest, means
> the hypervisor owns this facility. So the hypervisor can collect the dirty pages.
> 
> When the device suspended, it should report the last round of dirty pages
> through dirty page tracking facility as expected.
> 
> This can work, right?
Unfortunately no. A non-atomic bitmap cannot reside in host memory, and whatever is kept in the device is cleared on device reset and/or FLR, so the dirty map detail is lost.
Similarly, the device context is also lost on these two events, both of which are triggered by the guest.
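
As a toy illustration of the reset problem described above, here is a minimal Python sketch. All names (ToyDevice, dma_write, read_dirty) are hypothetical and are not taken from the virtio spec or from any patch in this thread; it only assumes that the dirty map lives in device-internal state and that a guest-triggered reset or FLR clears that state.

```python
class ToyDevice:
    """Toy model: the dirty-page map is kept in device-internal state."""

    def __init__(self):
        # Pages the device has written since tracking started (assumed in-device).
        self.dirty = set()

    def dma_write(self, page):
        # The device writes guest memory and records the page as dirty.
        self.dirty.add(page)

    def reset(self):
        # Guest-triggered device reset or FLR: device-internal state is cleared,
        # including the dirty map (assumption of this sketch).
        self.dirty.clear()

    def read_dirty(self):
        # What the hypervisor would see if it queried the device now.
        return sorted(self.dirty)


dev = ToyDevice()
dev.dma_write(2)
dev.dma_write(5)
before = dev.read_dirty()   # dirty pages recorded while the guest is live
dev.reset()                 # guest resets the device before the hypervisor reads
after = dev.read_dirty()    # the record of pages 2 and 5 is gone
```

In this model the hypervisor that reads the map only after the guest's reset never learns that pages 2 and 5 were written, which is the loss described above.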

> >
> >> As you can see, the dirty page tracking facility has a PASID for
> >> isolation. But still, the question is, we should better use platform
> >> dirty page tracking
> >>
> > Nothing to do with PASID, as PASID is owned by the guest.
> It looks you don't know how PASID work.

> 
> Host can setup PASID to isolate some facilities, right?
There are a few limitations with PASID.
a. Not all platforms have PASID.
b. As I explained above, PASID does not always work, because PASID only bifurcates DMA, not the device _functionality_.
c. PASID needs to be available to the guest as is, exactly as it is present on the device.

> >
> >> Then suspend the device after guest freeze, to stabilize the device
> >> status, then read the status.
> >>
> >> How can you say this does not work???
> > I explained above.
> see above


