[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [PATCH v3 6/8] admin: Add theory of operation for write recording commands
On Thu, Nov 16, 2023 at 2:49âPM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Nov 16, 2023 at 12:24:27PM +0800, Jason Wang wrote: > > On Thu, Nov 16, 2023 at 1:37âAM Parav Pandit <parav@nvidia.com> wrote: > > > > > > > > > > From: Jason Wang <jasowang@redhat.com> > > > > Sent: Monday, November 13, 2023 9:11 AM > > > > > > > > On Fri, Nov 10, 2023 at 2:46âPM Parav Pandit <parav@nvidia.com> wrote: > > > > > > > > > > Hi Michael, > > > > > > > > > > > From: Michael S. Tsirkin <mst@redhat.com> > > > > > > Sent: Thursday, November 9, 2023 1:29 PM > > > > > > > > > > [..] > > > > > > > Besides the issue of performance, it's also racy, assuming we are > > > > > > > logging > > > > > > IOVA. > > > > > > > > > > > > > > 0) device log IOVA > > > > > > > 1) hypervisor fetches IOVA from log buffer > > > > > > > 2) guest map IOVA to a new GPA > > > > > > > 3) hypervisor traverse guest table to get IOVA to new GPA > > > > > > > > > > > > > > Then we lost the old GPA. > > > > > > > > > > > > Interesting and a good point. And by the way e.g. vhost has the same > > > > > > issue. You need to flush dirty tracking info when changing the > > > > > > mappings somehow. Parav what's the plan for this? Should be addressed in > > > > the spec too. > > > > > > > > > > > As you listed the flush is needed for vhost or device-based DPT. > > > > > > > > What does DPT mean? Device Page Table? Let's not invent terminology which is > > > > not known by others please. > > > > > > > Sorry for using the acronym. I meant dirty page tracking. > > > > > > > We have discussed it many times. You can't just depend on ATS or reinventing > > > > wheels in virtio. > > > The dependency is on the iommu which would have the mapping of GIOVA to GPA like any sw implementation. > > > No dependency on ATS. > > > > > > > > > > > What's more, please try not to give me the impression that the proposal is > > > > optimized for a specific vendor (like device IOMMU stuffs). > > > > > > > You should stop calling this specific vendor thing. > > > > Well, as you have explained, the confusion came from "DPT" ... > > > > > One can equally say that suspend bit proposal is for the sw_vendor device who is forcing virtio hw device to only implement ioqueues + PASID + non_unified interface for PF, VF, SIOVs + non_TDISP based devices. > > > > > > > > The necessary plumbing is already covered for this in the query (read and > > > > clear) command of this v3 proposal. > > > > > > > > The issue is logging via IOVA ... I don't see how "read and clear" can help. > > > > > > > Read and clear helps that ensures that all the dirty pages are reported, hence there is no mapping/unmapping race. > > > > Reported as IOVA ... > > > > > As everything is reported. > > > > > > > > It is listed in Device Write Records Read Command. > > > > > > > > Please explain how your proposal can solve the above race. > > > > > > > In below manner. > > > 1. guest has GIOVA to GPA_1 mapping > > > 2. RX packets occurred to GIOVA > > > 3. device reported dirty page log for GIOVA (hypervisor is yet to read) > > > 4. guest requested mapping change from GIOVA to GPA_2 > > > 4.1 During this IOTLB is invalidated and dirty page report is queried ensuring, it can change the mapping > > > > It requires > > > > 1) hypervisor traps IOTLB invalidation, which doesn't work when > > nesting could be offloaded (IOMMUFD has started the work to support > > nesting) > > 2) query the device about the dirty page on each IOTLB invalidation which: > > 2.1) A huge round trip: guest IOTLB invalidation -> trapped by > > hypervisor -> start the query from the device -> device return -> > > hypervisor reports IOTLB invalidation is done -> let guest run. Have > > you benchmarked the RTT in this case? There are just too many places > > that cause the delay in the middle. > > To be fair invalidations are already expensive e.g. with vhost iotlb > it requires a slow system call. > This will make them *even more* expensive. Yes, a slow syscall plus a virtqueue query RTT. Need some benchmark. It looks to me currently the invalidation is done via a queued based interface in vtd. So guests may need to spin where it may trigger a lockup in the guest. > > Problem for some but not all workloads. Again I agree motivation, > tradeoffs and comparison with both dirty tracking by iommu and shadow vq > approaches really should be included. +1 > > > > 2.2) Guest triggerable behaviour, malicious guest can simply do > > endless IOTLB invalidation to DOS the e.g admin virtqueue > > I'm not sure how much to worry about it - just don't allow more > than one in flight per VM. That's fine but it may need a note. Thanks > > > > > > > > > > > > > > > > When the page write record is fully read, it is flushed. > > > > > How/when to use, I think its hypervisor specific, so we probably better off not > > > > documenting those details. > > > > > > > > Well, as the author of this proposal, at least you need to know how a hypervisor > > > > can work with your proposal, no? > > > > > > > Likely yes, but it is not the scope of the spec to list those paths etc. > > > > Fine, but as a reviewer I need to know if it can work with a hypervisor well. > > > > > > > > > > May be such read is needed in some other path too depending on how > > > > hypervisor implemented. > > > > > > > > What do you mean by "May be ... some other path" here? You're inventing a > > > > mechanism that you don't know how a hypervisor can use? > > > > > > No. I meant hypervisor may have more operations that map/unmap/flush where it may need to implement it. > > > Some one may call it set_map(), some may say dma_map()... > > > > Ok. > > > > Thanks >
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]