virtio-comment message



Subject: Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands


On Thu, Nov 09, 2023 at 06:26:44AM +0000, Parav Pandit wrote:
> 
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, November 8, 2023 9:59 AM
> > 
> > On Tue, Nov 7, 2023 at 3:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Nov 07, 2023 at 12:04:29PM +0800, Jason Wang wrote:
> > > > > > > Each virtio and non-virtio device that wants to report its
> > > > > > > dirty pages will do so in its own way.
> > > > > > >
> > > > > > > > 3) inventing it in the virtio layer will be deprecated in
> > > > > > > > the future for sure, as platform will provide much rich
> > > > > > > > features for logging e.g it can do it per PASID etc, I don't
> > > > > > > > see any reason virtio need to compete with the features that
> > > > > > > > will be provided by the platform
> > > > > > > Can you bring the CPU vendors and their commitment to the
> > > > > > > virtio TC, with timelines, so that the virtio TC can omit this?
> > > > > >
> > > > > > Why do we need to bring CPU vendors into the virtio TC? Virtio
> > > > > > needs to be built on top of a transport or platform. There's no
> > > > > > need to duplicate their job, especially considering that virtio
> > > > > > can't do better than them.
> > > > > >
> > > > > I wanted to see a strong commitment from the CPU vendors to
> > > > > support dirty page tracking.
> > > >
> > > > The RFC of IOMMUFD support dates back to early 2022. Intel, AMD and
> > > > ARM all support it now.
> > > >
> > > > > And the work seems to have started for some platforms.
> > > >
> > > > Let me quote from the above link:
> > > >
> > > > """
> > > > Today, AMD Milan (or more recent) supports it while ARM SMMUv3.2
> > > > alongside VT-D rev3.x also do support.
> > > > """
> > > >
> > > > > Without such a platform commitment, virtio skipping it would not work either.
> > > >
> > > > Is the above sufficient? I'm a little more familiar with VT-d;
> > > > the hw feature has been there for years.
> > >
> > >
> > > Repeating myself - I'm not sure that will work well for all workloads.
> > 
> > I think this comment applies to this proposal as well.
> > 
> > > Definitely KVM did
> > > not scan PTEs. It used pagefaults with bit per page and later as VM
> > > size grew switched to PLM.  This interface is analogous to PLM,
> > 
> > I think you meant PML actually. And it doesn't work like PML. To behave like
> > PML it needs to
> > 
> > 1) log buffers organized as a queue with indices
> > 2) device needs to suspend (as a #vmexit in PML) if it runs out of the buffers
> > 3) device needs to send a notification to the driver if it runs out of the buffers
> > 
> > I don't see any of the above in this proposal. If we do that it would be less
> > problematic than what is being proposed here.
> > 
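For illustration, a PML-style interface along the lines of 1)-3) above
might look roughly like the sketch below. Every name in it is invented;
none of this is in the proposal.

#include <stdint.h>

#define DIRTY_LOG_RING_SIZE 512

/* 1) log buffers organized as a queue with indices. */
struct dirty_log_ring {
        uint64_t page[DIRTY_LOG_RING_SIZE]; /* device-written addresses */
        uint16_t producer;                  /* advanced by the device */
        uint16_t consumer;                  /* advanced by the driver */
};

/* 2)+3): when the ring is full the device suspends the offending
 * operation (the PML #vmexit analogue) and notifies the driver,
 * which drains records and lets the device resume. */
static inline int dirty_log_full(const struct dirty_log_ring *r)
{
        return (uint16_t)(r->producer - r->consumer) ==
               DIRTY_LOG_RING_SIZE;
}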
> In this proposal, it's slightly different from PML.
> The log buffer is a write record kept by the device, which keeps recording.
> And the owner driver queries the recorded pages.
> The device internally can do PML or a different implementation, as it finds suitable.
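A rough sketch of that query model, with invented names; the point is
just that the device-internal mechanism stays hidden behind the query:

#include <stdint.h>

struct write_record {
        uint64_t iova;   /* start of the written range */
        uint64_t len;    /* length in bytes */
};

/* Owner driver pulls whatever the device has recorded so far and
 * the device retires what was reported; how recording works
 * internally (PML or otherwise) never shows up here. Returns the
 * number of records placed in recs. */
int admin_read_write_records(struct write_record *recs,
                             unsigned int max_recs);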

I personally like that this detail is hidden inside the device.
One important piece of functionality that PML has and this does not
is the ability to interrupt the host, e.g. if the device is running
low on space to record this info. Want to add it in some way?
E.g. a special command that is only used if the device is low
on buffers.
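Something along these lines, say; the names are made up, it is just
the shape of the idea:

#include <stdint.h>

/* Driver arms a threshold once; the device notifies when the
 * number of free record slots drops below it, so the driver can
 * drain before the device runs out entirely. */
struct write_records_low_watermark {
        uint64_t min_free_records;  /* notify when free slots < this */
};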


> > Even if we manage to do that, it doesn't mean we won't have issues.
> > 
> > 1) For many reasons the device can neither see nor log via GPA, so this
> > requires a traversal of the vIOMMU mapping tables by the hypervisor
> > afterwards; that would be expensive and would need synchronization with
> > guest modifications of the IO page table, which looks very hard.
> > 2) There are a lot of special or reserved IOVA ranges (for example the
> > interrupt areas in x86) that need special care; this is architectural
> > and within the knowledge of the platform IOMMU, not the virtio device.
> > Things would be more complicated when SVA is enabled. And there could be
> > other architecture-specific knowledge (e.g. PAGE_SIZE) that might be
> > needed. There's no easy way to deal with those cases.
> > 
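To make 1) above concrete, the hypervisor-side walk would look
something like the sketch below (invented helpers, reusing the
write_record sketch from earlier); the translate step is exactly what
races with guest map/unmap:

#include <stdint.h>
#include <stddef.h>

struct write_record {                /* as in the earlier sketch */
        uint64_t iova;
        uint64_t len;
};

struct viommu;                       /* opaque, invented */
int viommu_translate(struct viommu *v, uint64_t iova, uint64_t *gpa);
void mark_gpa_dirty(uint64_t gpa);

static void records_to_gpa_bitmap(struct viommu *v,
                                  const struct write_record *recs,
                                  size_t n)
{
        for (size_t i = 0; i < n; i++) {
                uint64_t gpa;
                /* Racy: the guest may remap recs[i].iova between the
                 * device logging it and this walk; serializing the
                 * two is the hard part. */
                if (viommu_translate(v, recs[i].iova, &gpa) == 0)
                        mark_gpa_dirty(gpa);
        }
}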
> 
> Current and future iommufd and OS interfaces can likely support this already.
> In the current proposal, multiple ranges are supplied to the device, and the reserved ranges are not part of them.
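For illustration only, the supplied ranges could be imagined as
something like this (invented names, not the actual command layout
from the series):

#include <stdint.h>

struct write_record_range {
        uint64_t start;   /* IOVA */
        uint64_t len;
};

/* The hypervisor simply leaves reserved IOVA windows (interrupt
 * ranges etc.) out of the list it hands to the device. */
struct start_write_recording {
        uint32_t num_ranges;
        uint32_t reserved;
        struct write_record_range ranges[];  /* num_ranges entries */
};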
> 
> > We wouldn't need to care about any of them if it is done at the
> > platform IOMMU level.
> > 
> I agree that when the platform IOMMU has support, and if it is better, using it should be the hypervisor's first priority.
> Mainly because the D bit of the page is already there, rather than a special PML queue or a racy bitmap like what was proposed in the other series.

BTW your bitmap is also racy if there's a vIOMMU, unless the hypervisor
is very careful to empty the bitmap when mappings change.
You should document this requirement.
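Sketched with invented names, the kind of hook this implies on the
hypervisor side:

#include <stdint.h>

void read_and_clear_device_dirty(uint64_t iova, uint64_t len);

/* On every vIOMMU unmap, device-side dirty state for the range
 * must be folded into the migration bitmap before the mapping
 * goes away; otherwise a stale bit can later be charged to
 * whatever gets mapped at the same IOVA. */
static void viommu_unmap_notify(uint64_t iova, uint64_t len)
{
        read_and_clear_device_dirty(iova, len);
        /* ... then tear down the mapping ... */
}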


-- 
MST


