virtio-comment message



Subject: RE: [PATCH v3 6/8] admin: Add theory of operation for write recording commands


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, November 8, 2023 9:59 AM
> 
> On Tue, Nov 7, 2023 at 3:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Nov 07, 2023 at 12:04:29PM +0800, Jason Wang wrote:
> > > > > > Each virtio and non-virtio device that wants to report its
> > > > > > dirty pages will do so in its own way.
> > > > > >
> > > > > > > 3) inventing it in the virtio layer will be deprecated in
> > > > > > > the future for sure, as the platform will provide much richer
> > > > > > > features for logging, e.g. it can do it per PASID etc.; I don't
> > > > > > > see any reason for virtio to compete with the features that
> > > > > > > will be provided by the platform
> > > > > > Can you bring the cpu vendors and their commitment to the virtio TC
> > > > > > with timelines so that the virtio TC can omit it?
> > > > >
> > > > > Why do we need to bring CPU vendors into the virtio TC? Virtio
> > > > > needs to be built on top of the transport or platform. There's no
> > > > > need to duplicate their job, especially considering that virtio
> > > > > can't do better than them.
> > > > >
> > > > I wanted to see a strong commitment from the cpu vendors to support
> > > > dirty page tracking.
> > >
> > > The RFC for IOMMUFD support goes back to early 2022. Intel, AMD and
> > > ARM all support it now.
> > >
> > > > And the work seems to have started for some platforms.
> > >
> > > Let me quote from the above link:
> > >
> > > """
> > > Today, AMD Milan (or more recent) supports it while ARM SMMUv3.2
> > > alongside VT-D rev3.x also do support.
> > > """
> > >
> > > > Without such a commitment from the platforms, virtio skipping it as well would not work.
> > >
> > > Is the above sufficient? I'm a little more familiar with vtd;
> > > the hw feature has been there for years.
> >
> >
> > Repeating myself - I'm not sure that will work well for all workloads.
> 
> I think this comment applies to this proposal as well.
> 
> > Definitely KVM did not scan PTEs. It used page faults with a bit per
> > page and later, as VM size grew, switched to PLM.  This interface is
> > analogous to PLM,
> 
> I think you meant PML actually. And it doesn't work like PML. To behave like
> PML it needs:
> 
> 1) log buffers organized as a queue with indices
> 2) the device to suspend (as a #vmexit does in PML) if it runs out of buffers
> 3) the device to send a notification to the driver if it runs out of buffers
> 
> I don't see any of the above in this proposal. If we did that, it would be less
> problematic than what is being proposed here.
> 
In this proposal, the scheme is slightly different from PML.
The log buffer is a write record held by the device, which keeps recording writes into it.
The owner driver queries the recorded pages.
Internally, the device can implement PML or any other scheme it finds suitable.
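
To make that concrete, below is a rough sketch in C of the query model described
above. The structure and function names (wr_query_req, admin_cmd_write_records_query,
mark_page_dirty) are purely illustrative and are not the structures or commands
defined by this patch series.

/* Illustrative only: all names below are hypothetical. */
#include <stdint.h>

struct wr_query_req {
        uint64_t iova_start;    /* start of the IOVA range being queried */
        uint64_t iova_len;      /* length of the range */
};

struct wr_query_resp {
        uint32_t num_pages;     /* number of recorded pages returned */
        uint64_t page_addr[];   /* pages written since the previous query */
};

/* Hypothetical helpers standing in for the admin command transport and
 * for the hypervisor's dirty-page bookkeeping. */
extern struct wr_query_resp *admin_cmd_write_records_query(const struct wr_query_req *req);
extern void mark_page_dirty(uint64_t addr);

/* Owner driver side: pull the device's write records and merge them into
 * the migration dirty set.  How the device records writes internally
 * (PML-like queue, bitmap, ...) is not visible here. */
static void sync_write_records(void)
{
        struct wr_query_req req = { .iova_start = 0, .iova_len = ~0ULL };
        struct wr_query_resp *resp = admin_cmd_write_records_query(&req);

        for (uint32_t i = 0; i < resp->num_pages; i++)
                mark_page_dirty(resp->page_addr[i]);
}

The point is that only the query interface is visible to the owner driver;
the recording mechanism stays internal to the device.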

> Even if we manage to do that, it doesn't mean we won't have issues.
> 
> 1) For many reasons the device can neither see nor log via GPA, so this requires
> a traversal of the vIOMMU mapping tables by the hypervisor afterwards; that would
> be expensive and would need synchronization with guest modifications of the IO
> page table, which looks very hard.
> 2) There are a lot of special or reserved IOVA ranges (for example the interrupt
> areas on x86) that need special care, which is architectural and beyond the scope
> or knowledge of the virtio device, but not of the platform IOMMU.
> Things would be more complicated when SVA is enabled. And there could be
> other architecture-specific knowledge (e.g.
> PAGE_SIZE) that might be needed. There's no easy way to deal with those cases.
> 

Current and future iommufd and OS interfaces can likely support this already.
In the current proposal, multiple IOVA ranges are supplied to the device, and the reserved ranges are not part of them, as sketched below.
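
As a rough illustration (an assumed helper, not taken from the spec or from any
kernel code), the hypervisor can carve the reserved regions out of the usable
IOVA space before write recording is started, so the device never has to know
about them:

#include <stdint.h>
#include <stddef.h>

struct iova_range {
        uint64_t start;
        uint64_t end;           /* inclusive */
};

/*
 * Split 'full' into the sub-ranges that exclude the reserved regions.
 * 'rsvd' is assumed sorted, non-overlapping, contained in 'full' and not
 * ending at UINT64_MAX (for example, the x86 interrupt range reported by
 * the platform IOMMU).  Returns the number of ranges written to 'out';
 * these are the ranges that would be handed to the device when write
 * recording is started.
 */
static size_t exclude_reserved(struct iova_range full,
                               const struct iova_range *rsvd, size_t nr_rsvd,
                               struct iova_range *out, size_t max_out)
{
        size_t n = 0;
        uint64_t cur = full.start;

        for (size_t i = 0; i < nr_rsvd && n < max_out; i++) {
                if (rsvd[i].start > cur)
                        out[n++] = (struct iova_range){ cur, rsvd[i].start - 1 };
                cur = rsvd[i].end + 1;
        }
        if (cur <= full.end && n < max_out)
                out[n++] = (struct iova_range){ cur, full.end };
        return n;
}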

> We wouldn't need to care about all of them if it is done at the platform IOMMU
> level.
> 
I agree that when the platform IOMMU supports it, and if it works better, it should be the hypervisor's first choice.
Mainly because the D bit is already in the page table entry, rather than a special PML queue or a racy bitmap like what was proposed in other series.
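
For comparison, a conceptual sketch of that platform-IOMMU path (the PTE layout
and bit position are illustrative, not any specific IOMMU's format): the dirty
information lives in the leaf page-table entries themselves, so harvesting is an
atomic test-and-clear walk rather than draining a queue or a shared bitmap.

#include <stdint.h>
#include <stdbool.h>
#include <stdatomic.h>

#define PTE_DIRTY (1ULL << 9)   /* illustrative dirty-bit position */

struct leaf_pte {
        _Atomic uint64_t val;   /* one leaf IOMMU page-table entry */
};

/* Returns true if the page mapped by this PTE was written since the last
 * scan, and clears the bit atomically so the next scan starts clean. */
static bool test_and_clear_dirty(struct leaf_pte *pte)
{
        uint64_t old = atomic_fetch_and(&pte->val, ~PTE_DIRTY);
        return old & PTE_DIRTY;
}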

