virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: RE: [PATCH v3 6/8] admin: Add theory of operation for write recording commands


> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, November 6, 2023 12:04 PM
> 
> On Thu, Nov 2, 2023 at 2:10 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, November 2, 2023 9:54 AM
> > >
> > > On Wed, Nov 1, 2023 at 11:02 AM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Wednesday, November 1, 2023 6:00 AM
> > > > >
> > > > > On Tue, Oct 31, 2023 at 11:27 AM Parav Pandit <parav@nvidia.com>
> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > Sent: Tuesday, October 31, 2023 7:13 AM
> > > > > > >
> > > > > > > On Mon, Oct 30, 2023 at 9:21 PM Parav Pandit
> > > > > > > <parav@nvidia.com>
> > > wrote:
> > > > > > > >
> > > > > > > > During a device migration flow (typically in a precopy
> > > > > > > > phase of the live migration), a device may write to the
> > > > > > > > guest memory. Some iommu/hypervisor may not be able to
> > > > > > > > track these
> > > written pages.
> > > > > > > > These pages to be migrated from source to destination hypervisor.
> > > > > > > >
> > > > > > > > A device which writes to these pages, provides the page
> > > > > > > > address record of the to the owner device. The owner
> > > > > > > > device starts write recording for the device and queries
> > > > > > > > all the page addresses written by the device.
> > > > > > > >
> > > > > > > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/176
> > > > > > > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > > > > > > Signed-off-by: Satananda Burla <sburla@marvell.com>
> > > > > > > > ---
> > > > > > > > changelog:
> > > > > > > > v1->v2:
> > > > > > > > - addressed comments from Michael
> > > > > > > > - replaced iova with physical address
> > > > > > > > ---
> > > > > > > >  admin-cmds-device-migration.tex | 15 +++++++++++++++
> > > > > > > >  1 file changed, 15 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/admin-cmds-device-migration.tex
> > > > > > > > b/admin-cmds-device-migration.tex index ed911e4..2e32f2c
> > > > > > > > 100644
> > > > > > > > --- a/admin-cmds-device-migration.tex
> > > > > > > > +++ b/admin-cmds-device-migration.tex
> > > > > > > > @@ -95,6 +95,21 @@ \subsubsection{Device
> > > > > > > > Migration}\label{sec:Basic Facilities of a Virtio Device /
> > > > > > > > The owner driver can discard any partially read or written
> > > > > > > > device context when  any of the device migration flow
> > > > > > > should be aborted.
> > > > > > > >
> > > > > > > > +During the device migration flow, a passthrough device
> > > > > > > > +may write data to the guest virtual machine's memory, a
> > > > > > > > +source hypervisor needs to keep track of these written
> > > > > > > > +memory to migrate such memory to destination
> > > > > > > hypervisor.
> > > > > > > > +Some systems may not be able to keep track of such memory
> > > > > > > > +write addresses at hypervisor level. In such a scenario,
> > > > > > > > +a device records and reports these written memory
> > > > > > > > +addresses to the owner device. The owner driver enables
> > > > > > > > +write recording for one or more physical address ranges
> > > > > > > > +per device during device
> > > migration flow.
> > > > > > > > +The owner driver periodically queries these written
> > > > > > > > +physical address
> > > > > records from the device.
> > > > > > >
> > > > > > > I wonder how PA works in this case. Device uses untranslated
> > > > > > > requests so it can only see IOVA. We can't mandate ATS anyhow.
> > > > > > Michael suggested to keep the language uniform as PA as this
> > > > > > is ultimately
> > > > > what the guest driver is supplying during vq creation and in
> > > > > posting buffers as physical address.
> > > > >
> > > > > This seems to need some work. And, can you show me how it can work?
> > > > >
> > > > > 1) e.g if GAW is 48 bit, is the hypervisor expected to do a
> > > > > bisection of the whole range?
> > > > > 2) does the device need to reserve sufficient internal resources
> > > > > for logging the dirty page and why (not)?
> > > > No when dirty page logging starts, only at that time, device will
> > > > reserve
> > > enough resources.
> > >
> > > GAW is 48bit, how large would it have then?
> > Dirty page tracking is not dependent on the size of the GAW.
> > It is function of address ranges for the amount of guest memory regardless of
> GAW.
> 
> The problem is, e.g when vIOMMU is enabled, you can't know which IOVA is
> actually used by guests. And even for the case when vIOMMU is not enabled,
> the guest may have several TBs. Is it easy to reserve sufficient resources by the
> device itself?
> 
When page tracking is enabled per device, the device knows about the range and can reserve the corresponding resources.
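To make the resource argument concrete, here is a hedged back-of-envelope sketch of what per-range reservation could look like with one dirty bit per 4 KiB page. The function name and the bitmap scheme are illustrative assumptions, not taken from the spec or any implementation; the point is only that the cost scales with the tracked range, not with the guest address width (GAW).

```python
# Hypothetical sketch: per-range resource sizing for write recording.
# Assumes one dirty bit per 4 KiB page; names are illustrative only.

PAGE_SIZE = 4096

def record_bitmap_bytes(range_len: int) -> int:
    """Bytes of bitmap a device would reserve to record writes
    for one physical address range of range_len bytes."""
    pages = (range_len + PAGE_SIZE - 1) // PAGE_SIZE  # round up to pages
    return (pages + 7) // 8                           # one bit per page

# 16 GiB of tracked guest memory needs only 512 KiB of bitmap,
# regardless of whether the GAW is 48 bits.
print(record_bitmap_bytes(16 << 30))
```

Under these assumptions even a multi-TB guest stays in the tens of megabytes of device-side state, which is the shape of the claim above.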

> Host should always have more resources than device, in that sense there could
> be several methods that tries to utilize host memory instead of the one in the
> device. I think we've discussed this when going through the doc prepared by
> Eugenio.
> 
> >
> > > What happens if we're trying to migrate more than 1 device?
> > >
> > That is perfectly fine.
> > Each device is updating its log of pages it wrote.
> > The hypervisor is collecting their sum.
> 
> See above.
> 
> >
> > > >
> > > > > 3) DMA is part of the transport, it's natural to do logging
> > > > > there, why duplicate efforts in the virtio layer?
> > > > He he, you have funny comment.
> > > > When an abstract facility is added to virtio you say to do in transport.
> > >
> > > So it's not done in the general facility but tied to the admin part.
> > > And we all know dirty page tracking is a challenge and Eugenio has a
> > > good summary of pros/cons. A revisit of those docs make me think
> > > virtio is not the good place for doing that for may reasons:
> > >
> > > 1) as stated, platform will evolve to be able to tracking dirty
> > > pages, actually, it has been supported by a lot of major IOMMU
> > > vendors
> >
> > This is optional facility in virtio.
> > Can you please point to the references? I don't see it in the common Linux
> kernel support for it.
> 
> Note that when IOMMUFD is being proposed, dirty page tracking is one of the
> major considerations.
> 
> This is one recent proposal:
> 
> https://www.spinics.net/lists/kvm/msg330894.html
> 
Sure, so if the platform supports it, it can be used from the platform.
If it does not, the device supplies it.

> > Instead Linux kernel choose to extend to the devices.
> 
> Well, as I stated, tracking dirty pages is challenging if you want to do it on a
> device, and you can't simply invent dirty page tracking for each type of the
> devices.
> 
It is not invented.
It is a generic framework for all virtio device types, as proposed here.
Keep in mind that it is already optional in the v3 series.

> > At least not seen to arrive this in any near term in start of 2024 which is
> where users must use this.
> >
> > > 2) you can't assume virtio is the only device that can be used by
> > > the guest, having dirty pages tracking to be implemented in each
> > > type of device is unrealistic
> > Of course, there is no such assumption made. Where did you see a text that
> made such assumption?
> 
> So what happens if you have a guest with virtio and other devices assigned?
> 
What happens? Each device type would do its own dirty page tracking.
And if not all devices have support, the hypervisor knows to fall back to the platform IOMMU or its own tracking.
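The flow being argued here, each device reporting its own log, the hypervisor taking their union, and falling back when any device lacks support, can be sketched as below. All names (`supports_write_recording`, `written_pages`) are hypothetical placeholders, not spec fields.

```python
# Hedged sketch of the described flow: per-device dirty logs are
# union-merged by the hypervisor; if any device cannot report,
# the hypervisor falls back to platform-level tracking.
# All field names are hypothetical, not from the virtio spec.

def collect_dirty_pages(devices):
    """Union of page addresses written by all migrating devices,
    or None if any device cannot report them (signal to fall back)."""
    if not all(dev.get("supports_write_recording") for dev in devices):
        return None  # fall back to platform IOMMU / hypervisor tracking
    dirty = set()
    for dev in devices:
        dirty |= set(dev.get("written_pages", []))
    return dirty

devs = [
    {"supports_write_recording": True, "written_pages": [0x1000, 0x2000]},
    {"supports_write_recording": True, "written_pages": [0x2000, 0x3000]},
]
print(sorted(collect_dirty_pages(devs)))  # [4096, 8192, 12288]
```

Migrating more than one device is then just a larger union; pages dirtied by several devices are counted once.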

> > Each virtio and non virtio devices who wants to report their dirty page report,
> will do their way.
> >
> > > 3) inventing it in the virtio layer will be deprecated in the future
> > > for sure, as platform will provide much rich features for logging
> > > e.g it can do it per PASID etc, I don't see any reason virtio need
> > > to compete with the features that will be provided by the platform
> > Can you bring the cpu vendors and committement to virtio tc with timelines
> so that virtio TC can omit?
> 
> Why do we need to bring CPU vendors in the virtio TC? Virtio needs to be built
> on top of transport or platform. There's no need to duplicate their job.
> Especially considering that virtio can't do better than them.
> 
I wanted to see a strong commitment from the CPU vendors to support dirty page tracking.
And the work seems to have started for some platforms.
Without such a platform commitment, virtio skipping it as well would not work.

> > i.e. in first year of 2024?
> 
> Why does it matter in 2024?
Because users need to use it now.

> 
> > If not, we are better off to offer this, and when/if platform support is, sure,
> this feature can be disabled/not used/not enabled.
> >
> > > 4) if the platform support is missing, we can use software or
> > > leverage transport for assistance like PRI
> > All of these are in theory.
> > Our experiment shows PRI performance is 21x slower than page fault rate
> done by the cpu.
> > It simply does not even pass a simple 10Gbps test.
> 
> If you stick to the wire speed during migration, it can converge.
Do you have perf data for this?
In the internal tests we don't see this happening.
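The convergence dispute above has a standard back-of-envelope model: pre-copy shrinks the remaining dirty set each round only while the page-dirty rate stays below the copy rate. The sketch below uses purely illustrative numbers, not anyone's measured data, just to show the shape of the argument.

```python
# Illustrative model of pre-copy convergence (assumed numbers, not data):
# each round copies the remaining dirty set while the guest re-dirties
# pages at dirty_rate; convergence requires dirty_rate < copy_rate.

def precopy_rounds(mem_bytes, dirty_rate, copy_rate, max_rounds=30):
    """Rounds until the dirty set is small enough for a short stop-copy,
    or None if pre-copy does not converge."""
    remaining = mem_bytes
    for r in range(1, max_rounds + 1):
        seconds = remaining / copy_rate                       # copy time
        remaining = min(mem_bytes, dirty_rate * seconds)      # re-dirtied
        if remaining < copy_rate * 0.1:                       # ~100 ms left
            return r
        if dirty_rate >= copy_rate:
            return None                                       # never shrinks
    return None

# 16 GB guest, 1 GB/s dirty rate, 10 GB/s copy rate: converges quickly.
print(precopy_rounds(16e9, 1e9, 10e9))   # 2
# Dirty rate at wire speed: it never converges.
print(precopy_rounds(16e9, 10e9, 10e9))  # None
```

Whether real workloads sit above or below that ratio is exactly what the internal-test disagreement here is about.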

> 
> > There is no requirement for mandating PRI either.
> > So it is unusable.
> 
> It's not about mandating, it's about doing things in the correct layer. If PRI is
> slow, PCI can evolve for sure.
You should try.
In its current state, it is mandating.
And if you think PRI is the only way, then you should propose in the dirty page tracking series you listed above to not do dirty page tracking, but rather depend on PRI, right?

> 
> >
> > >
> > > > When one does something in transport, you say, this is transport
> > > > specific, do
> > > some generic.
> > > >
> > > > Here the device is being tracked is virtio device.
> > > > PCI-SIG has told already that PCIM interface is outside the scope of it.
> > > > Hence, this is done in virtio layer here in abstract way.
> > >
> > > You will end up with a competition with the platform/transport one
> > > that will fail.
> > >
> > I don't see a reason. There is no competition.
> > Platform always have a choice to not use device side page tracking when it is
> supported.
> 
> Platform provides a lot of other functionalities for dirty logging:
> e.g per PASID, granular, etc. So you want to duplicate them again in the virtio? If
> not, why choose this way?
> 
It is optional, for the platforms that do not have it.

> >
> > > >
> > > > > I can't see how it can compete with the functionality that is
> > > > > provided by the platform. And what's more, we can't assume
> > > > > virtio is the only device that is used by the guest.
> > > > >
> > > > You raised this before and it was answered.
> > > > Not all platform support dirty page tracking effectively.
> > > > This is optional facility that speed up the migration down time
> significantly.
> > >
> > > I can hardly believe the downtime is determined by the speed of
> > > logging dirty pages...
> > >
> > Without dirty page tracking, in pre-copy, all the memory must be migrated
> again, which takes significantly longer time.
> 
> It's about w/ and w/o dirty page tracking, it's not about the speed of dirty page
> tracking.
> 
> >
> > > > So until platform supports it, it is supported by virtio.
> > >
> > > Some platforms already support that.
> > >
> > And some done.
> > Hence whichever wants to use by platform, will use by platform.
> > Whichever prefer to use from device will use by the device.
> 
> I don't think so, for example virtio has a hard time when it doesn't rely on the
> platform (e.g IOMMU). We don't want to repeat that tragedy.
Too general a statement, and I don't think it is applicable here.

But let's see the support from the platform.

