virtio-comment message



Subject: Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands


On Wed, Nov 22, 2023 at 12:25 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, November 21, 2023 9:52 AM
> >
> > On Thu, Nov 16, 2023 at 2:49 PM Michael S. Tsirkin <mst@redhat.com>
> > wrote:
> > >
> > > On Thu, Nov 16, 2023 at 12:24:27PM +0800, Jason Wang wrote:
> > > > On Thu, Nov 16, 2023 at 1:37 AM Parav Pandit <parav@nvidia.com>
> > wrote:
> > > > >
> > > > >
> > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > Sent: Monday, November 13, 2023 9:11 AM
> > > > > >
> > > > > > On Fri, Nov 10, 2023 at 2:46 PM Parav Pandit <parav@nvidia.com>
> > wrote:
> > > > > > >
> > > > > > > Hi Michael,
> > > > > > >
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Thursday, November 9, 2023 1:29 PM
> > > > > > >
> > > > > > > [..]
> > > > > > > > > Besides the issue of performance, it's also racy, assuming
> > > > > > > > > we are logging
> > > > > > > > IOVA.
> > > > > > > > >
> > > > > > > > > 0) device log IOVA
> > > > > > > > > 1) hypervisor fetches IOVA from log buffer
> > > > > > > > > 2) guest map IOVA to a new GPA
> > > > > > > > > 3) hypervisor traverse guest table to get IOVA to new GPA
> > > > > > > > >
> > > > > > > > > Then we lost the old GPA.
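
To make that race concrete, here is a self-contained toy simulation of the interleaving above; all names and constants are made up for illustration and are not from the spec or any driver. The logged IOVA is translated only after the guest has switched the mapping, so the dirty bit lands on the new GPA and the write to the old GPA is silently lost:

/* Illustrative only: simulate the IOVA-logging race described above. */
#include <stdio.h>
#include <stdint.h>

#define GIOVA   0x1000u
#define GPA_OLD 0xa000u
#define GPA_NEW 0xb000u

int main(void)
{
        uint64_t viommu_entry = GPA_OLD;   /* guest I/O page table: GIOVA -> GPA_OLD */
        int dirty_old = 0, dirty_new = 0;  /* hypervisor dirty bitmap                */

        /* 0) device DMA-writes through GIOVA (backed by GPA_OLD) and logs the IOVA */
        uint64_t logged_iova = GIOVA;

        /* 1) hypervisor fetches the IOVA from the log buffer ...                   */
        uint64_t iova = logged_iova;

        /* 2) ... but before it translates, the guest remaps GIOVA to GPA_NEW       */
        viommu_entry = GPA_NEW;

        /* 3) hypervisor walks the guest I/O page table only now                    */
        uint64_t gpa = (iova == GIOVA) ? viommu_entry : 0;
        if (gpa == GPA_OLD)
                dirty_old = 1;
        else if (gpa == GPA_NEW)
                dirty_new = 1;

        /* The page that was actually written (GPA_OLD) is never marked dirty. */
        printf("dirty GPA_OLD=%d dirty GPA_NEW=%d\n", dirty_old, dirty_new);
        return 0;
}
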
> > > > > > > >
> > > > > > > > Interesting and a good point. And by the way e.g. vhost has
> > > > > > > > the same issue.  You need to flush dirty tracking info when
> > > > > > > > changing the mappings somehow.  Parav what's the plan for
> > > > > > > > this? Should be addressed in
> > > > > > the spec too.
> > > > > > > >
> > > > > > > As you listed the flush is needed for vhost or device-based DPT.
> > > > > >
> > > > > > What does DPT mean? Device Page Table? Let's not invent
> > > > > > terminology which is not known by others please.
> > > > > >
> > > > > Sorry for using the acronym. I meant dirty page tracking.
> > > > >
> > > > > > We have discussed it many times. You can't just depend on ATS or
> > > > > > reinventing wheels in virtio.
> > > > > The dependency is on the iommu which would have the mapping of
> > GIOVA to GPA like any sw implementation.
> > > > > No dependency on ATS.
> > > > >
> > > > > >
> > > > > > What's more, please try not to give me the impression that the
> > > > > > proposal is optimized for a specific vendor (like device IOMMU stuffs).
> > > > > >
> > > > > You should stop calling this specific vendor thing.
> > > >
> > > > Well, as you have explained, the confusion came from "DPT" ...
> > > >
> > > > > One can equally say that suspend bit proposal is for the sw_vendor
> > device who is forcing virtio hw device to only implement ioqueues + PASID +
> > non_unified interface for PF, VF, SIOVs + non_TDISP based devices.
> > > > >
> > > > > > > The necessary plumbing is already covered for this in the
> > > > > > > query (read and
> > > > > > clear) command of this v3 proposal.
> > > > > >
> > > > > > The issue is logging via IOVA ... I don't see how "read and clear" can
> > help.
> > > > > >
> > > > > Read and clear helps ensure that all the dirty pages are reported,
> > > > > hence there is no mapping/unmapping race.
> > > >
> > > > Reported as IOVA ...
> > > >
> > > > > As everything is reported.
> > > > >
> > > > > > > It is listed in Device Write Records Read Command.
> > > > > >
> > > > > > Please explain how your proposal can solve the above race.
> > > > > >
> > > > > In the below manner:
> > > > > 1. guest has GIOVA to GPA_1 mapping
> > > > > 2. RX packets occurred to GIOVA
> > > > > 3. device reported dirty page log for GIOVA (hypervisor is yet to read)
> > > > > 4. guest requested mapping change from GIOVA to GPA_2
> > > > > 4.1 during this, the IOTLB is invalidated and the dirty page report is
> > > > > queried, ensuring it can change the mapping
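
For reference, here is a minimal, self-contained sketch of what step 4.1 could look like on the hypervisor side; every name below is hypothetical (neither the spec nor Linux defines them). The key ordering is that the write records are read-and-cleared while the old GIOVA -> GPA_1 translation is still known, and only then is the old mapping dropped:

/* Illustrative only: drain write records before dropping the old mapping. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define GIOVA 0x1000u
#define GPA_1 0xa000u

struct dev {
        uint64_t pending_iova;          /* one outstanding write record */
        bool     has_record;
};

static uint64_t viommu_entry = GPA_1;   /* current GIOVA translation */

/* "Device Write Records Read" style query: read and clear one record. */
static bool read_and_clear_record(struct dev *d, uint64_t *iova)
{
        if (!d->has_record)
                return false;
        *iova = d->pending_iova;
        d->has_record = false;
        return true;
}

static void mark_page_dirty(uint64_t gpa)
{
        printf("dirty GPA 0x%llx\n", (unsigned long long)gpa);
}

/* Hypervisor-side handling of the guest's IOTLB invalidation (step 4.1). */
static void viommu_unmap(struct dev *d, uint64_t giova)
{
        uint64_t iova;

        /* Query the device while the old translation is still known. */
        while (read_and_clear_record(d, &iova)) {
                if (iova == giova)                      /* translate with the old table */
                        mark_page_dirty(viommu_entry);  /* resolves to GPA_1            */
        }

        viommu_entry = 0;   /* only now drop GIOVA -> GPA_1; guest may map GPA_2 */
}

int main(void)
{
        /* Steps 1-3 above: mapping exists and the device logged a write to GIOVA. */
        struct dev d = { .pending_iova = GIOVA, .has_record = true };

        viommu_unmap(&d, GIOVA);        /* step 4: guest changes the mapping */
        return 0;
}
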
> > > >
> > > > It requires
> > > >
> > > > 1) hypervisor traps IOTLB invalidation, which doesn't work when
> > > > nesting could be offloaded (IOMMUFD has started the work to support
> > > > nesting)
> > > > 2) query the device about the dirty page on each IOTLB invalidation which:
> > > > 2.1) A huge round trip: guest IOTLB invalidation -> trapped by
> > > > hypervisor -> start the query from the device -> device return ->
> > > > hypervisor reports IOTLB invalidation is done -> let guest run. Have
> > > > you benchmarked the RTT in this case? There are just too many places
> > > > that cause the delay in the middle.
> > >
> > > To be fair invalidations are already expensive e.g. with vhost iotlb
> > > it requires a slow system call.
> > > This will make them *even more* expensive.
> >
> > Yes, a slow syscall plus a virtqueue query RTT.
> >
> Only in the viommu case.

What's worse, modern IOMMU drivers tend to batch invalidations into a
per-domain invalidation. I can easily imagine how slow that is.

For vhost, it's still one syscall. But your proposal needs to do a
query of all the possible IOVA ranges, which is horribly slow as it
takes several query requests or PCI transactions, so guest lockups
would not be rare.
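
As a back-of-the-envelope illustration (the range count and the per-query round-trip time below are made-up numbers, and the loop is hypothetical rather than any spec interface), a per-domain invalidation degenerates into one write-records query per tracked IOVA range, and the guest waits on all of them:

/* Illustrative only: cost of a batched (whole-domain) invalidation. */
#include <stdio.h>

#define NR_RANGES    1024UL  /* mapped IOVA ranges in the domain (assumed) */
#define QUERY_RTT_US 50UL    /* assumed admin-vq/PCI round trip per query  */

int main(void)
{
        unsigned long total_us = 0;

        for (unsigned long i = 0; i < NR_RANGES; i++) {
                /* one "Device Write Records Read" command per range */
                total_us += QUERY_RTT_US;
        }

        /* The guest spins on invalidation completion for all of this. */
        printf("domain flush stalls the guest for ~%lu us\n", total_us);
        return 0;
}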

> Without this, it is not applicable.

The list is a place to discuss possible issues, no?

>
> > We need some benchmarks. It looks to me like the invalidation is currently done via a
> > queue-based interface in VT-d. So the guest may need to spin, which may trigger
> > a lockup in the guest.
> >
>
> > >
> > > Problem for some but not all workloads.  Again I agree motivation,
> > > tradeoffs and comparison with both dirty tracking by iommu and shadow
> > > vq approaches really should be included.
> >
> Dirty tracking by the iommu is to be considered.
> Shadow vq is not in my scope and it does not fit the basic requirements as explained before.

I don't see a good explanation other than "I met a bug in Linus' tree,
it's unstable so I won't test anymore".

Thanks



> So it is a different discussion.
>
> > +1
> >
> > >
> > >
> > > > 2.2) Guest-triggerable behaviour: a malicious guest can simply do
> > > > endless IOTLB invalidations to DoS e.g. the admin virtqueue
> > >
> > > I'm not sure how much to worry about it - just don't allow more than
> > > one in flight per VM.
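
A sketch of that mitigation could be as simple as a per-VM flag (hypothetical; a real hypervisor would block or coalesce further invalidations rather than reject them), so a guest spamming invalidations can never queue more than one device query at a time:

/* Illustrative only: cap dirty-page queries at one in flight per VM. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct vm {
        atomic_flag query_in_flight;    /* set while a query is outstanding */
};

static bool try_start_query(struct vm *vm)
{
        /* true if we won the slot; false if a query is already pending */
        return !atomic_flag_test_and_set(&vm->query_in_flight);
}

static void finish_query(struct vm *vm)
{
        atomic_flag_clear(&vm->query_in_flight);
}

int main(void)
{
        struct vm vm = { .query_in_flight = ATOMIC_FLAG_INIT };

        printf("first query admitted:  %d\n", try_start_query(&vm));  /* 1 */
        printf("second query admitted: %d\n", try_start_query(&vm));  /* 0 */
        finish_query(&vm);
        printf("after completion:      %d\n", try_start_query(&vm));  /* 1 */
        return 0;
}
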
> >
> > That's fine but it may need a note.
> >
> > Thanks
> >
> >
> > >
> > >
> > >
> > > > >
> > > > > > >
> > > > > > > When the page write record is fully read, it is flushed.
> > > > > > > How/when to use it is, I think, hypervisor specific, so we are
> > > > > > > probably better off not documenting those details.
> > > > > >
> > > > > > Well, as the author of this proposal, at least you need to know
> > > > > > how a hypervisor can work with your proposal, no?
> > > > > >
> > > > > Likely yes, but it is not the scope of the spec to list those paths etc.
> > > >
> > > > Fine, but as a reviewer I need to know if it can work with a hypervisor well.
> > > >
> > > > >
> > > > > > > May be such read is needed in some other path too depending on
> > > > > > > how the hypervisor implemented it.
> > > > > >
> > > > > > What do you mean by "May be ... some other path" here? You're
> > > > > > inventing a mechanism that you don't know how a hypervisor can use?
> > > > >
> > > > > No. I meant the hypervisor may have more operations than
> > > > > map/unmap/flush where it may need to implement it.
> > > > > Someone may call it set_map(), some may say dma_map()...
> > > >
> > > > Ok.
> > > >
> > > > Thanks
> > >
>


