Subject: Re: [virtio-dev] Dirty Page Tracking (DPT)


On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote:
> 
> On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
> > On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote:
> > > I understand that DPT isn't really at the forefront of the vDPA framework, but
> > > wanted to understand if there are any initial thoughts on how this would work...
> > And judging by the next few paragraphs, you are actually
> > talking about vhost-pci, right?
> > 
> > > In the migration framework, in its simplest form, (I gather) it's QEMU via KVM
> > > that is reading the dirty page table, converting bits to page numbers, then
> > > flushing the remote VM / copying local page(s) -> remote VM, etc.
> > > 
> > > While this is fine for a VM (say VM1) dirtying its own memory, where the accesses
> > > are trapped in the kernel and the log is updated, I'm not sure
> > > what happens in the situation of vhost, where a remote VM (say VM2) is dirtying
> > > up VM1's memory since it can directly access it, during packet reception for
> > > example.
> > > Whatever technique is employed to catch this, how would it differ from a HW
> > > based virtio device doing DMA directly into a VM's DDR, with respect to DPT? Is QEMU
> > > going to have a second place to query the dirty logs - i.e. the vDPA layer?
> > I don't think anyone has a good handle on vhost-pci migration yet.
> > But I think a reasonable way to handle that would be to
> > activate dirty tracking in VM2's QEMU.
> > 
> > And then VM2's QEMU would periodically copy the bits to the log - does
> > this sound right?
> > 
> > > Further, I heard about a SW-based DPT within the vDPA framework for those
> > > devices that do not (yet) support DPT inherently in HW. How is this envisioned
> > > to work?
> > What I am aware of is simply switching to a software virtio
> > for the duration of migration. The software can be pretty simple
> > since the formats match: just copy available entries to the device ring,
> > and for used entries, on seeing a used ring entry, mark the page
> > dirty and then copy the used entry to the guest ring.
> 
> 
> That looks more heavyweight than e.g. just relaying the used ring (as DPDK did),
> I believe?

That works for the used ring but not for the packed ring.
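
For concreteness, a minimal sketch of the used side of such a relay for a
split ring (the types and helpers here are illustrative, not an existing
API; a real relay also needs memory barriers, the available-side copy and
notification handling):

#include <stdint.h>

#define PAGE_SHIFT 12

struct vring_used_elem { uint32_t id; uint32_t len; };

/* Hypothetical relay state: the used ring the device writes, the used
 * ring the guest sees, and a byte-per-page dirty log shared with QEMU. */
struct relay {
    struct vring_used_elem *dev_used;     /* filled in by the device */
    uint16_t *dev_used_idx;
    struct vring_used_elem *guest_used;   /* consumed by the guest */
    uint16_t *guest_used_idx;
    uint16_t last_used;                   /* last entry already relayed */
    uint16_t num;                         /* ring size */
    uint8_t *dirty_log;                   /* one byte per guest page */
    /* returns the guest-physical address behind descriptor 'id' */
    uint64_t (*desc_gpa)(struct relay *r, uint32_t id);
};

static void log_dirty(struct relay *r, uint64_t gpa, uint32_t len)
{
    if (!len)
        return;
    for (uint64_t p = gpa >> PAGE_SHIFT;
         p <= (gpa + len - 1) >> PAGE_SHIFT; p++)
        r->dirty_log[p] = 1;
}

/* Copy used entries from the device ring to the guest ring, marking
 * the pages the device wrote as dirty along the way. */
static void relay_used(struct relay *r)
{
    while (r->last_used != *r->dev_used_idx) {
        struct vring_used_elem e = r->dev_used[r->last_used % r->num];

        log_dirty(r, r->desc_gpa(r, e.id), e.len);
        r->guest_used[*r->guest_used_idx % r->num] = e;
        (*r->guest_used_idx)++;
        r->last_used++;
    }
}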

> 
> > 
> > 
> > Another approach that I proposed, and that was prototyped at some point by
> > Alex Duyck, is the guest driver touching the page in question before
> > processing it within the guest, e.g. by an atomic xor with 0.
> > Sounds attractive but didn't perform all that well.
> 
> 
> Intel posted an i40e software solution that traps queue tail/head writes. But
> I'm not sure it's good enough.
> 
> https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/


Logging at DMA unmap time seems more generic to me. But again I suspect
the main issue is the same - it's handled on the data path,
blocking packet RX until the dirty tracking is done.

Hardware solutions, by comparison, queue the writes and make
progress; the dirty pages are handled by the migration CPU.
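
For comparison, a sketch of what logging at DMA unmap time could look
like (the hook and field names are hypothetical, and the IOVA is assumed
to equal the guest physical address):

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical per-device tracking state. */
struct dma_tracker {
    bool log_enabled;       /* set for the duration of migration */
    uint8_t *dirty_log;     /* one byte per guest page, shared with QEMU */
};

/* Called on the unmap path for a buffer the device may have written.
 * Since RX buffers are only unmapped when handed back to the driver,
 * the logging cost sits on the data path, as noted above. */
static void track_dma_unmap(struct dma_tracker *t,
                            uint64_t iova, uint64_t len, bool device_wrote)
{
    if (!t->log_enabled || !device_wrote || !len)
        return;

    for (uint64_t p = iova >> PAGE_SHIFT;
         p <= (iova + len - 1) >> PAGE_SHIFT; p++)
        t->dirty_log[p] = 1;
}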


> 
> > 
> > 
> > > Finally, for those HW vendors that do support DPT in HW, a mapping of a bit ->
> > > page isn't really an option, since no one wants to do a byte-wide
> > > read-modify-write across the PCI bus; mapping a whole byte to a page is
> > > likely more desirable - the HW can just do non-posted writes to the dirty page
> > > table. If byte-wise, then the QEMU/vDPA layer has to either fix up the mapping
> > > (from byte -> bit) or have the capability to handle the granularity difference.
> > > 
> > > Thoughts?
> > > 
> > > Rob Miller
> > > rob.miller@broadcom.com
> > > (919)721-3339
> > If using an IOMMU, DPT can also be done using either PRI or the dirty bit in
> > a PTE. PRI is an interrupt, so it can kick off a thread to set bits in
> > the log I guess, but if it's the dirty bit then I don't think there's an
> > interrupt, and a polling thread does not sound attractive.  I guess
> > we'll need a new interface to notify vDPA that QEMU is looking for dirty
> > logs, and then vDPA can send them to QEMU in some way.  It will probably be
> > good enough to support vendor-specific logging interfaces, too.  I don't
> > actually have hardware which supports either, so actually coding it up is
> > not yet practical.
> 
> 
> Yes, both PRI and the PTE dirty bit require special hardware support. We can
> extend the vDPA API to support both. For page faults, probably just an IOMMU page
> fault handler.
> 
> 
> > 
> > Further, at my KVM Forum presentation I proposed a virtio-specific
> > page fault handling interface.  If there's a wish to standardize and
> > implement that, let me know and I will try to write this up in a more
> > formal way.
> 
> 
> Besides page faults, if we want virtio to be more like vhost, we also need to
> formalize device state fetching, e.g. per-vq indices etc.
> 
> Thanks

Yes, that would clearly be in scope for the spec.  I would not even start
with a guest/host interface.  I would start by just listing, for each
device, what state needs to be migrated. And it would also be useful to
list, for each device, how to make two devices migration-compatible.
We can do that in a non-normative section.
Again, the big blocker here is lack of manpower.
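
As a very rough illustration of what such a list would have to cover
(field names here are illustrative only, not a proposed format), the
per-virtqueue part alone already includes something like:

#include <stdint.h>

/* Illustrative per-virtqueue state a migration list would need to name. */
struct vq_migration_state {
    uint16_t num;                  /* negotiated queue size */
    uint16_t avail_idx;            /* last available index seen by the device */
    uint16_t used_idx;             /* last used index written by the device */
    uint8_t  enabled;
    /* The packed ring additionally needs the wrap counters. */
    uint8_t  driver_wrap_counter;
    uint8_t  device_wrap_counter;
};

/* Plus per-device state, with a device-specific tail (e.g. for net:
 * MAC, MTU, RSS and offload configuration). */
struct device_migration_state {
    uint64_t features;             /* negotiated feature bits */
    uint8_t  status;               /* device status field */
    uint16_t num_queues;
    struct vq_migration_state vq[];
};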

-- 
MST


