virtio-comment message



Subject: Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands


On Wed, Nov 1, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, November 1, 2023 6:00 AM
> >
> > On Tue, Oct 31, 2023 at 11:27 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, October 31, 2023 7:13 AM
> > > >
> > > > On Mon, Oct 30, 2023 at 9:21 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > >
> > > > > During a device migration flow (typically in a precopy phase of
> > > > > the live migration), a device may write to the guest memory. Some
> > > > > iommu/hypervisor may not be able to track these written pages.
> > > > > > These pages need to be migrated from source to destination hypervisor.
> > > > >
> > > > > > A device which writes to these pages provides the page address
> > > > > > records to the owner device. The owner device starts write
> > > > > recording for the device and queries all the page addresses
> > > > > written by the device.
> > > > >
> > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
> > > > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > > > Signed-off-by: Satananda Burla <sburla@marvell.com>
> > > > > ---
> > > > > changelog:
> > > > > v1->v2:
> > > > > - addressed comments from Michael
> > > > > - replaced iova with physical address
> > > > > ---
> > > > >  admin-cmds-device-migration.tex | 15 +++++++++++++++
> > > > >  1 file changed, 15 insertions(+)
> > > > >
> > > > > > diff --git a/admin-cmds-device-migration.tex b/admin-cmds-device-migration.tex
> > > > > > index ed911e4..2e32f2c 100644
> > > > > --- a/admin-cmds-device-migration.tex
> > > > > +++ b/admin-cmds-device-migration.tex
> > > > > @@ -95,6 +95,21 @@ \subsubsection{Device
> > > > > Migration}\label{sec:Basic Facilities of a Virtio Device /  The
> > > > > owner driver can discard any partially read or written device
> > > > > context when  any of the device migration flow
> > > > should be aborted.
> > > > >
> > > > > +During the device migration flow, a passthrough device may write
> > > > > +data to the guest virtual machine's memory, a source hypervisor
> > > > > +needs to keep track of these written memory to migrate such
> > > > > > +memory to destination hypervisor.
> > > > > +Some systems may not be able to keep track of such memory write
> > > > > +addresses at hypervisor level. In such a scenario, a device
> > > > > +records and reports these written memory addresses to the owner
> > > > > +device. The owner driver enables write recording for one or more
> > > > > +physical address ranges per device during device migration flow.
> > > > > > +The owner driver periodically queries these written physical address records from the device.
> > > >
> > > > I wonder how PA works in this case. Device uses untranslated
> > > > requests so it can only see IOVA. We can't mandate ATS anyhow.
> > > Michael suggested keeping the language uniform as PA, since this is ultimately
> > > what the guest driver supplies as a physical address during vq creation and
> > > when posting buffers.
> >
> > This seems to need some work. And, can you show me how it can work?
> >
> > 1) e.g if GAW is 48 bit, is the hypervisor expected to do a bisection of the whole
> > range?
> > 2) does the device need to reserve sufficient internal resources for logging the
> > dirty page and why (not)?
> No. Only when dirty page logging starts will the device reserve enough resources.

If the GAW is 48 bits, how large would that reservation be then? And what
happens if we're trying to migrate more than one device?
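
For a sense of scale, here is a back-of-the-envelope sketch. It assumes a
flat bitmap with one bit per 4 KiB page covering the whole guest address
width. That layout is only an assumption for illustration: the patch does
not define the record format, and a device that only tracks the ranges the
owner driver enables would need less. The function name is hypothetical.

```python
def dirty_bitmap_bytes(address_width_bits: int, page_size_bytes: int = 4096) -> int:
    """Bytes needed for a one-bit-per-page bitmap over the full address width.

    Assumption: flat bitmap, one bit per page; not taken from the patch.
    """
    pages = (1 << address_width_bits) // page_size_bytes
    return pages // 8  # 8 pages tracked per byte


GIB = 1 << 30

per_device = dirty_bitmap_bytes(48)
print(f"one device:   {per_device // GIB} GiB")
print(f"four devices: {4 * per_device // GIB} GiB")
```

With a 48-bit GAW that is 2^36 pages, i.e. 8 GiB of bitmap per device
under this assumption, and the cost scales linearly with the number of
devices being migrated.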

>
> > 3) DMA is part of the transport, it's natural to do logging there, why duplicate
> > efforts in the virtio layer?
> He he, you have a funny comment.
> When an abstract facility is added to virtio, you say to do it in the transport.

So it's not done in the general facility but tied to the admin part.
And we all know dirty page tracking is a challenge, and Eugenio has a
good summary of the pros/cons. Revisiting those docs makes me think
virtio is not a good place for doing that, for many reasons:

1) as stated, the platform will evolve to be able to track dirty pages;
in fact, this is already supported by a lot of major IOMMU vendors
2) you can't assume virtio is the only device that can be used by the
guest; having dirty page tracking implemented in each type of
device is unrealistic
3) inventing it in the virtio layer means it will be deprecated in the
future for sure, as the platform will provide much richer features for
logging, e.g. it can do it per PASID etc. I don't see any reason virtio
needs to compete with the features that will be provided by the platform
4) if the platform support is missing, we can use software or leverage
the transport for assistance, like PRI

> When one does something in the transport, you say this is transport specific, do something generic.
>
> Here the device being tracked is a virtio device.
> PCI-SIG has already said that the PCIM interface is outside its scope.
> Hence, this is done in the virtio layer here in an abstract way.

You will end up in a competition with the platform/transport one,
and that competition will fail.

>
> > I can't see how it can compete with the functionality
> > that is provided by the platform. And what's more, we can't assume virtio is the
> > only device that is used by the guest.
> >
> You raised this before and it was answered.
> Not all platforms support dirty page tracking effectively.
> This is an optional facility that reduces the migration downtime significantly.

I can hardly believe the downtime is determined by the speed of
logging dirty pages...

> So until platform supports it, it is supported by virtio.

Some platforms already support that.

Thanks

>
> > Thanks
>


