virtio-dev message



Subject: Re: [virtio-dev] Dirty Page Tracking (DPT)



On 2020/3/9 6:13 PM, Michael S. Tsirkin wrote:
On Mon, Mar 09, 2020 at 04:50:43PM +0800, Jason Wang wrote:
On 2020/3/9 3:38 PM, Michael S. Tsirkin wrote:
On Fri, Mar 06, 2020 at 10:40:13AM -0500, Rob Miller wrote:
I understand that DPT isn't really on the forefront of the vDPA framework, but
I wanted to understand if there are any initial thoughts on how this would work...
And judging by the next few chapters, you are actually
talking about vhost pci, right?

In the migration framework, in its simplest form, (I gather) it's QEMU via KVM
that is reading the dirty page table, converting bits to page numbers, then
flushing the remote VM / copying local page(s) -> remote VM, etc.

While this is fine for a VM (say VM1) dirtying its own memory, where the accesses
are trapped in the kernel and the log is updated, I'm not sure
what happens in the situation of vhost, where a remote VM (say VM2) is dirtying
up VM1's memory, since it can directly access it, during packet reception for
example.
Whatever technique is employed to catch this, how would it differ from a HW-based
virtio device doing DMA directly into a VM's DDR, with respect to DPT? Is QEMU
going to have a 2nd place to query the dirty logs - ie: the vDPA layer?
I don't think anyone has a good handle on vhost-pci migration yet.
But I think a reasonable way to handle that would be to
activate dirty tracking in VM2's QEMU.

And then VM2's QEMU would periodically copy the bits to the log - does
this sound right?

Further, I heard about a SW-based DPT within the vDPA framework for those
devices that do not (yet) support DPT inherently in HW. How is this envisioned
to work?
What I am aware of is simply switching to a software virtio
for the duration of migration. The software can be pretty simple
since the formats match: just copy available entries to the device ring,
and for used entries, on seeing a used ring entry, mark the page
dirty and then copy the used entry to the guest ring.
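
For illustration, a minimal sketch of such a relay for the split ring.
log_dirty_page() and the bitmap here are hypothetical, not existing
QEMU/vhost code:

#include <stdint.h>
#include <linux/virtio_ring.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define BITS_PER_LONG (8 * sizeof(unsigned long))

static unsigned long dirty_bitmap[1024]; /* sketch-sized dirty log */

static void log_dirty_page(uint64_t gpa)
{
	uint64_t pfn = gpa >> PAGE_SHIFT;
	dirty_bitmap[pfn / BITS_PER_LONG] |= 1UL << (pfn % BITS_PER_LONG);
}

/*
 * Copy new used entries from the device's shadow ring to the guest's
 * ring. Any device-writable buffer in the completed chain may have
 * been written, so log every page it covers before publishing.
 */
static void relay_used(struct vring_used *dev_used,
		       struct vring_used *guest_used,
		       struct vring_desc *desc, uint16_t num,
		       uint16_t *last_used)
{
	while (*last_used != dev_used->idx) {
		struct vring_used_elem *e = &dev_used->ring[*last_used % num];
		uint16_t i = e->id;

		for (;;) {
			if (desc[i].flags & VRING_DESC_F_WRITE) {
				uint64_t gpa = desc[i].addr & ~(PAGE_SIZE - 1);

				for (; gpa < desc[i].addr + desc[i].len;
				     gpa += PAGE_SIZE)
					log_dirty_page(gpa);
			}
			if (!(desc[i].flags & VRING_DESC_F_NEXT))
				break;
			i = desc[i].next;
		}

		guest_used->ring[*last_used % num] = *e;
		(*last_used)++;
		/* publish the index only after the entry is visible */
		__atomic_store_n(&guest_used->idx, *last_used,
				 __ATOMIC_RELEASE);
	}
}

(Relaying the available entries is a plain copy in the other direction,
with no logging needed.)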

That looks more heavyweight than e.g. just relaying the used ring (as DPDK did),
I believe?
That works for the used ring but not for the packed ring.


For the packed ring, can we relay the descriptor ring?




Another approach that I proposed, and that was prototyped at some point by
Alex Duyck, is the guest driver touching the page in question before
processing it within the guest, e.g. by an atomic XOR with 0.
Sounds attractive but didn't perform all that well.
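
For reference, the "touch before use" trick amounts to something like
this in the guest driver (a sketch only; touch_pages() is a made-up
name):

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

/*
 * Atomically XOR the first byte of each page of a receive buffer
 * with 0 before the guest processes it. The data is unchanged, but
 * the store makes the page dirty from KVM's point of view, covering
 * the device's earlier DMA write.
 */
static void touch_pages(void *buf, size_t len)
{
	uintptr_t p   = (uintptr_t)buf & ~(PAGE_SIZE - 1);
	uintptr_t end = (uintptr_t)buf + len;

	for (; p < end; p += PAGE_SIZE)
		__atomic_fetch_xor((volatile uint8_t *)p, 0,
				   __ATOMIC_RELAXED);
}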

Intel posted an i40e software solution that traps queue tail/head writes. But
I'm not sure it's good enough.

https://lore.kernel.org/kvm/20191206082232.GH31791@joy-OptiPlex-7040/

Logging at DMA unmap time seems more generic to me. But again I suspect
the main issue is the same - it's handled on the data path,
blocking packet RX until the dirty tracking is done.
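
A rough sketch of what unmap-time logging could look like (all names
here are hypothetical):

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

static void log_dirty_page(uint64_t gpa);             /* as before */
static void iommu_unmap_range(uint64_t iova, uint64_t size);

/*
 * On unmap of a device-writable mapping, conservatively log every
 * page it covered: the device may have written any of them while the
 * mapping existed. Coarse, but it needs no device cooperation.
 */
static void dpt_dma_unmap(uint64_t iova, uint64_t size, bool dev_writable)
{
	if (dev_writable)
		for (uint64_t gpa = iova & ~(PAGE_SIZE - 1);
		     gpa < iova + size; gpa += PAGE_SIZE)
			log_dirty_page(gpa);
	iommu_unmap_range(iova, size);
}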

Hardware solutions, by comparison, queue the writes and make
progress; the dirty pages are handled by the migration CPU.



Finally, for those HW vendors that do support DPT in HW, a mapping of a bit ->
page isn't really an option, since no one wants to do a byte-wide
read-modify-write across the PCI bus; mapping a whole byte to a page is
likely more desirable - the HW can just do non-posted writes to the dirty page
table. If byte-wise, then the QEMU/vDPA layer has to either fix up the mapping
(from byte -> bit) or have the capability to handle the granularity difference.
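
Converting such a byte-per-page log into the bit-per-page bitmap that
QEMU expects could be as simple as the following sketch (names are
made up):

#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/*
 * Fold the hardware's byte-per-page dirty log into a bit-per-page
 * bitmap. The atomic exchange clears each byte as it is read, so the
 * device can mark the page again while we scan.
 */
static void bytemap_to_bitmap(uint8_t *bytemap, unsigned long *bitmap,
			      size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		if (__atomic_exchange_n(&bytemap[i], 0, __ATOMIC_RELAXED))
			bitmap[i / BITS_PER_LONG] |=
				1UL << (i % BITS_PER_LONG);
}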

Thoughts?

Rob Miller
rob.miller@broadcom.com
(919)721-3339
If using an IOMMU, DPT can also be done using either PRI or the dirty bit in
a PTE. PRI is an interrupt, so it can kick off a thread to set bits in
the log I guess, but if it's the dirty bit then I don't think there's an
interrupt, and a polling thread does not sound attractive.  I guess
we'll need a new interface to notify vDPA that QEMU is looking for dirty
logs, and then vDPA can send them to QEMU in some way.  It will probably be
good enough to support vendor-specific logging interfaces, too.  I don't
actually have hardware which supports either, so actually coding it up is
not yet practical.
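
As a pure strawman, the interface could look something like this (none
of these ops exist in the vDPA API today):

#include <stdint.h>

struct vdpa_device;

struct vdpa_dirty_log_ops {
	/* start/stop whatever tracking the vendor has: PRI, the PTE
	 * dirty bit, or an on-device log */
	int (*log_start)(struct vdpa_device *dev);
	int (*log_stop)(struct vdpa_device *dev);
	/* fill the caller's bit-per-page bitmap for [iova, iova + size)
	 * and clear the device-side log; called from the migration
	 * iteration loop */
	int (*log_sync)(struct vdpa_device *dev, uint64_t iova,
			uint64_t size, unsigned long *bitmap);
};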

Yes, both PRI and the PTE dirty bit require special hardware support. We can
extend the vDPA API to support both. For page faults, probably just an IOMMU
page fault handler.


Further, at my KVM Forum presentation I proposed a virtio-specific
page fault handling interface.  If there's a wish to standardize and
implement that, let me know and I will try to write it up in a more
formal way.

Besides page faults, if we want virtio to be more like vhost, we also need to
formalize the fetching of device state, e.g. the per-vq indices etc.
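
As an illustration only, the kind of per-virtqueue state that would
need enumerating looks roughly like this (field names are made up):

#include <stdint.h>

struct vq_migration_state {
	uint16_t avail_idx;    /* next available entry the device reads */
	uint16_t used_idx;     /* next used entry the device publishes */
	uint64_t desc_addr;    /* descriptor area guest address */
	uint64_t driver_addr;  /* driver (available) area guest address */
	uint64_t device_addr;  /* device (used) area guest address */
	uint8_t  enabled;      /* is the queue enabled? */
	/* the packed ring additionally needs its wrap counters */
	uint8_t  avail_wrap_counter;
	uint8_t  used_wrap_counter;
};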

Thanks
Yes, that would clearly be in scope for the spec.  I would not even start
with a guest/host interface.  I would start by just listing, for each
device, what the state that needs to be migrated is. And it would
also be useful to list, for each device, how to make two devices
compatible migration-wise.  We can do that in a non-normative section.
Again, the big blocker here is lack of manpower.


Yes.

Thanks




