Subject: Re: RFC: virtio-hostmem (+ Continuation of discussion from [virtio-dev] Memory sharing device)


Missed this:

> \item Large bidirectional transfers are possible without scatterlists, because
>   the memory is always physically contiguous.

Michael: It might get fragmented though. I think it would be up to
host to try and make sure it's not too fragmented, right?

Response:

Yes, it's definitely possible to get fragmented, and it is up to the host to make sure it's not too fragmented in the end.

This is more likely to be achieved if the device is typically used such that host allocations are page-granularity and long-lived. For use cases such as codecs and graphics, allocations will tend to be larger than a page and have that long lifetime (at least if we consider the lifetime of a guest process using the device "long" enough).
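To make that concrete, here is a minimal host-side sketch of backing a region at page granularity. This is my own illustration, not from the spec draft; the function name hostmem_alloc_backing is hypothetical.

/* Minimal host-side sketch, assuming the backend rounds every request up to
 * page granularity (nothing here comes from the draft spec): backing regions
 * with page-aligned, long-lived allocations keeps the host mapping space from
 * fragmenting into sub-page slivers. */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static void *hostmem_alloc_backing(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t rounded = (size + (size_t)page - 1) & ~((size_t)page - 1);

    /* Anonymous, page-aligned host memory; the backend would later map this
     * at the guest physical address chosen for the region. */
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}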


On Sun, Feb 24, 2019 at 1:18 PM Frank Yang <lfy@google.com> wrote:
virtio-hostmem is a proposed way to share host memory to the guest and communicate notifications. One potential use case is to have userspace drivers for virtual machines.

The latest version of the spec proposal can be found at


The revision history so far:

https://github.com/741g/virtio-spec/commit/206b9386d76f2ce18000dfc2b218375e423ac8e0 - renamed to virtio-hostmem and removed dependence on host callbacks

This first RFC email includes replies to comments from mst@redhat.com:

 > \item Guest allocates into the PCI region via config virtqueue messages.

Michael: OK so who allocates memory out of the PCI region?
Response:

Allocation will be split by guest address space versus host address space.

Guest address space: The guest driver determines the offset into the BAR at which to allocate the new region. The implementation of the allocator itself may live on the host (while the guest triggers such allocations via the config virtqueue messages), but ownership of region offsets and sizes stays in the guest. This allows easy use of existing guest ref-counting mechanisms, such as the last close() calling release() to clean up the memory regions in the guest.

Host address space: The backing of such memory regions is considered completely optional. The host may service a guest region with memory of its choice, depending on how the device is used. This servicing may happen at any time after the guest sends the message to create a memory region, but before the guest destroys it. In the meantime, some examples of how the host may respond to the allocation request (a hypothetical message layout is sketched after this list):
  • The host does not back the region at all and a page fault happens.
  • The host has already allocated host memory of some kind (from some source: vkMapMemory, malloc(), mmap(), etc.) and maps a page-aligned host pointer to the guest physical address corresponding to the region.
  • The host has already set up an MMIO region (such as via the MemoryRegion API in QEMU) and maps that MMIO region to the guest physical address, allowing MMIO callbacks to fire on reads and writes to that memory region.
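As a rough illustration of the guest-owned allocation described above, a "create region" config-virtqueue message might carry something like the layout below. The struct and field names are my own guesses for illustration, not taken from the draft spec.

/* Hypothetical layout (not from the draft spec) of a config-virtqueue
 * message for creating a shared-memory region. The guest-owned allocator
 * picks `offset` within the device BAR; the host decides later how (or
 * whether) to back it, as in the options listed above. */
#include <stdint.h>

struct virtio_hostmem_create_region {
    uint64_t offset;      /* guest-chosen offset into the shared-memory BAR */
    uint64_t size;        /* requested region size, page-aligned */
    uint32_t instance_id; /* host-side context the region belongs to */
    uint32_t flags;       /* e.g. read-only vs read-write; placeholder */
};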
> \item Guest: After a packet of compressed video stream is downloaded to the
>   buffer, another message, like a doorbell, is sent on the ping virtqueue to
>   consume existing compressed data. The ping message's offset field is
>   set to the proper offset into the shared-mem object.

Michael: BTW is this terminology e.g. "download", "ping message" standard somewhere?
Response:

Conceptually, it has a lot in common with "virtqueue notification" or "doorbell register". We should settle on more standard terminology; what about "notification"?
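For illustration, a fixed-size layout for such a notification ("ping") message might look like the following. The struct and field names are hypothetical, not from the draft.

/* Hypothetical fixed-size notification ("ping") message: the guest places one
 * of these on the notification virtqueue to tell the host that data at
 * `offset` within a previously created region is ready. */
#include <stdint.h>

struct virtio_hostmem_notify {
    uint64_t region_id;  /* which shared-memory region this refers to */
    uint64_t offset;     /* offset into the region where the data starts */
    uint64_t size;       /* number of valid bytes at that offset */
    uint64_t metadata;   /* device-specific hint, e.g. "compressed frame" */
};

Since there is no variable-length payload, every notification occupies the same descriptor size, which is also what makes the fixed-size answer below possible.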

 > \item Large bidirectional transfers are possible with zero copy.

Michael: However just to make sure, sending small amounts of data
is slower since you get to do all the mmap dance.

Response:

Yes, it will be very slow if the user chooses to perform an mmap for each transfer. However, we expect that users who want to perform frequent transfers of small amounts of data, such as in the sensors / codec use cases, will mmap once at instance creation with a single message to create a memory region; after that, every transfer needs only the notification message, while the existing mmap'ed region is reused. We expect the regions to remain fairly stable over the life of the instance in most cases; the guest userspace will also mmap() once to get direct access to the host memory, then reuse it many times while sending traffic.
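A rough guest userspace sketch of that "map once, notify many times" pattern might look like the following. The device node path and ioctl number are placeholders of my own, not defined by the spec.

/* Hypothetical guest userspace flow: map the shared region once, then reuse
 * it for many small transfers, sending only a cheap notification each time. */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define HOSTMEM_IOC_NOTIFY 0          /* placeholder ioctl number */

struct notify_args { uint64_t offset; uint64_t size; };

int send_many_small(const void *pkt, size_t len, int iterations)
{
    int fd = open("/dev/virtio-hostmem0", O_RDWR);   /* placeholder node */
    if (fd < 0)
        return -1;

    /* One mmap for the lifetime of the instance. */
    size_t region_size = 1u << 20;
    uint8_t *region = mmap(NULL, region_size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) {
        close(fd);
        return -1;
    }

    for (int i = 0; i < iterations; i++) {
        memcpy(region, pkt, len);                          /* small payload */
        struct notify_args args = { .offset = 0, .size = len };
        ioctl(fd, HOSTMEM_IOC_NOTIFY, &args);              /* doorbell only */
    }

    munmap(region, region_size);
    close(fd);
    return 0;
}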

> \item It is not necessary to use socket datagrams or data streams to
>   communicate the ping messages; they can be raw structs fresh off the
>   virtqueue.

Michael: OK and ping messages are all fixed size?
Response:

Yes, all ping messages are fixed size.

Michael: OK I still owe you that write-up about vhost pci. will try to complete
that early next week. But generally if I got it right that the host
allocates buffers then what you describe does seem to fit a bit better
with the vhost pci host/guest interface idea.

One question that was asked about vhost pci is whether it is in fact
necessary to share a device between multiple applications.
Or is it enough to just have one id per device?

Response:
Yes, looking forward! I'm kind of getting some rough idea now of what you may be referring to with vhost pci, perhaps if we can use a shared memory channel like Chrome to drive vhost or something. I'll wait for the full response before designing more into this area though :)

For now, it's not necessary to share the device between multiple VMs, but it is necessary to share between multiple guest processes, so multiple instance ids need to be supported for each device id.

It is also possible to share one instance id across guest processes. In the codec example, the codec may run in a separate guest process from the guest process that consumes the data, so to prevent copies, ideally both would have a view of the same host memory. Similar things show up when running Vulkan or gralloc/dmabuf-like mechanisms; in recent versions of gralloc, for example, one process allocates the memory while other processes share that memory by mapping it directly.

However, those are between guest processes. For inter-VM communication, I am still a bit tentative, but it shows that instance ids fundamentally reflect a host-side context and its resources. Two VMs could map the same host memory in principle (though I have not tried it with KVM; I'm not sure whether things explode if a user memory region is set for the same host memory across two VMs), and if it makes sense for them to communicate over that memory, then it makes sense for the instance id to be shared across the two VMs as well.

Anyway, thanks for the feedback!

Best,

Frank

