RFC v2: virtio-hostmem: static, guest-owned memory regions

+Christopher Dall who has tried to standardize goldfish before.

Link: https://github.com/741g/virtio-spec/blob/67602f232386a1782a35b9cb41087586ac3d19e2/virtio-hostmem.tex

This is the next version of virtio-hostmem that makes the following revisions:

A. Guest ownership of host memory + guaranteed backing.

This is accomplished by having a static set of sub-device-specific physical regions that are realized once

on guest bootup, even if the hostmem device is not plugged in.

- The host is assumed to always back those regions in some way, even if it is just mapping the same page over and over to save memory.

- The host may change mappings at runtime in a driver/device implementation-specific way.

- Do not allow alloc/free of physical ranges dynamically. Setup happens once and is permanent, even if the PCI device is not plugged in.

- The security model is pushed to guest-specific layers like SELinux; it is possible (and this is useful) for a physical page to be shared across guest processes, and it is up to the guest's current security model to keep malicious apps from gaining access.

- The physical ranges are also configured on startup so that they never overlap and are never shared between two sub-device types.

In summary this greatly simplifies allocation of physical memory while retaining flexibility via sub-allocation into those regions from the sub-device-specific implementation.

It also fits the actual use cases much better, where a large arena is allocated up front once;

the new spec simply makes this required.
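
As a rough illustration of the static layout described above, here is a small Python sketch of a startup-time check that a set of sub-device regions never overlaps. The Region type, the sub-device names, and the addresses are all hypothetical; none of them come from the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    subdevice: str  # hypothetical sub-device type name
    base: int       # guest physical base address
    size: int       # size in bytes

    @property
    def end(self) -> int:
        return self.base + self.size


def validate_layout(regions):
    """Return the regions sorted by base, or raise ValueError on overlap."""
    ordered = sorted(regions, key=lambda r: r.base)
    for r in ordered:
        if r.size <= 0:
            raise ValueError(f"{r.subdevice}: region size must be positive")
    # After sorting, only neighbouring regions can overlap.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.base < prev.end:
            raise ValueError(f"{prev.subdevice} overlaps {cur.subdevice}")
    return tuple(ordered)


# Hypothetical layout, fixed once at guest bootup and never changed.
LAYOUT = validate_layout([
    Region("subdev-a", 0x1_0000_0000, 64 << 20),
    Region("subdev-b", 0x1_0400_0000, 16 << 20),
])
```

Because the table is built once and is immutable afterwards, there is no dynamic alloc/free path to get wrong; any finer-grained allocation happens inside a region, in the sub-device-specific code.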

More specifically, "Guest ownership of host memory + guaranteed backing" addresses the following comments in the following ways:

From Roman Kiryanov:

"Should we note here what happens if a guest releases (a user process

dies) the region without asking the host to un-back it?"

Nothing will happen as we will not need to ask the host to un-back anything;

we always guarantee the physical region is accessible.

The physical region is also suballocated across guest user processes

according to the driver implementation.
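
To make the "nothing will happen" point concrete, here is a minimal Python sketch of a guest-side sub-allocator over one always-backed region. The SubAllocator class, its first-fit policy, and the per-pid bookkeeping are assumptions for illustration only; the actual driver implementation may look quite different.

```python
class SubAllocator:
    """First-fit sub-allocator over one fixed, always-backed region."""

    def __init__(self, base, size):
        self.free = [(base, size)]  # sorted list of (start, length) holes
        self.owned = {}             # pid -> list of (start, length) blocks

    def alloc(self, pid, length):
        """Return the start of a block owned by pid, or None if full."""
        if length <= 0:
            raise ValueError("length must be positive")
        for i, (start, hole) in enumerate(self.free):
            if hole >= length:
                if hole == length:
                    del self.free[i]
                else:
                    self.free[i] = (start + length, hole - length)
                self.owned.setdefault(pid, []).append((start, length))
                return start
        return None

    def release_all(self, pid):
        """Process exit: guest-side bookkeeping only, no host round trip."""
        self.free.extend(self.owned.pop(pid, []))
        self.free.sort()
        merged = []
        for start, length in self.free:
            # Coalesce holes that touch so large allocations stay possible.
            if merged and merged[-1][0] + merged[-1][1] == start:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((start, length))
        self.free = merged
```

Note that release_all never talks to the host: the underlying physical range stays backed, so a dying process only changes which guest process may be handed the range next.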

From Dr. David Alan Gilbert:

"Note that a mapping missing on the host won't necessarily turn into a
page fault in the guest; on qemu for example, if you have a memory
region like this where the guest accesses an area with no mapping, I
think we hit a kvm error."

With up-front pre-backed host memory, this is not a concern anymore;

there will be no page faults or kvm errors.

Also, I'm thinking about how this proposal can better fit with your current

host memory regions scheme. Ideally, I'd like to propose this device

in terms of your current proposal, with as many precise, shared definitions as possible.

From Michael S. Tsirkin:

"Virtio uses the terms "available buffer notification" and
"used buffer notification". If this follows vhost-pci design
then available buffer notification is sent host to guest,
and used buffer notification is sent guest to host.
Virtio is the reverse."

The spec adds language where

the memory transaction associated with each notification message

is considered completed by the guest on available,

and completed by the host on used.

"Especially in this case, this needs some security model enforced by

guest kernel."

We will rely on SELinux in the Linux case.

"Right. Details on how memory is allocated in the proposed scheme are

scant but above I think shows that it can't all be up to guest."

By moving all the memory regions to be initialized once on startup and

enforcing a guarantee that the physical regions are usable as RAM,

the host must back them in some way.

We effectively now have an invariant that the host must service all such physical ranges as RAM.

It can lazily allocate/change the allocation, but there should not be any situation where

the ranges are unmapped.

B. Redesigned virtqueue layout to have one virtqueue per instance

and renamed it to the "notification" virtqueue.

Also added definitions of the guarantees about memory transactions

named in each virtqueue upon available and used.

Addresses comment:

From Michael S. Tsirkin:

The spec adds language where

the memory transaction associated with each notification message

is considered completed by the guest on available,

and completed by the host on used.
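
The ordering rule above can be modelled as a tiny state machine. This Python sketch is only a model of the stated guarantee, not the wire format: TxState, NotificationQueue, and the transaction ids are hypothetical names.

```python
from enum import Enum


class TxState(Enum):
    GUEST_ACTIVE = "guest may still be accessing the range"
    GUEST_DONE = "available: guest side of the transaction is complete"
    HOST_DONE = "used: host side of the transaction is complete"


class NotificationQueue:
    """Model of one notification virtqueue for one instance."""

    def __init__(self):
        self._state = {}

    def begin(self, tx_id):
        if tx_id in self._state:
            raise ValueError(f"transaction {tx_id} already exists")
        self._state[tx_id] = TxState.GUEST_ACTIVE

    def make_available(self, tx_id):
        # Guest promises it has finished its memory accesses for tx_id.
        self._expect(tx_id, TxState.GUEST_ACTIVE)
        self._state[tx_id] = TxState.GUEST_DONE

    def mark_used(self, tx_id):
        # Host promises it has finished its memory accesses for tx_id.
        self._expect(tx_id, TxState.GUEST_DONE)
        self._state[tx_id] = TxState.HOST_DONE

    def state(self, tx_id):
        return self._state[tx_id]

    def _expect(self, tx_id, wanted):
        actual = self._state.get(tx_id)
        if actual is not wanted:
            raise RuntimeError(f"transaction {tx_id}: expected {wanted}, got {actual}")
```

In this model a transaction can only be marked used after it has been made available, which is the property the new spec language is meant to pin down.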

Not addressed yet:

"This might work. Note that host page size might be different.

If it's bigger host needs to be careful about allocating
full host pages anyway."

This is not addressed yet due to the need to tackle more fundamental issues first,

but I imagine we should be using the lowest common denominator of guest/host page size

to ensure safe access, and any pre-existing device configuration would round up to that.
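
One possible reading of the above, sketched in Python: take the larger of the two page sizes as the mapping granularity (for power-of-two page sizes this is also their least common multiple) and round every configured size up to it. This interpretation and the helper names are my assumptions, not something the spec currently states.

```python
def safe_granularity(guest_page_size, host_page_size):
    """Granularity that is a whole number of pages on both sides."""
    for size in (guest_page_size, host_page_size):
        if size <= 0 or (size & (size - 1)) != 0:
            raise ValueError(f"page size {size} is not a power of two")
    return max(guest_page_size, host_page_size)


def round_up(value, granularity):
    """Round value up to the next multiple of granularity."""
    return (value + granularity - 1) // granularity * granularity


# Example: 4 KiB guest pages on a host with 64 KiB pages.
gran = safe_granularity(4096, 65536)
region_size = round_up(0x5000, gran)
```

With these example values a 0x5000-byte region would be rounded up to one full 64 KiB host page, so the host never has to map a partial page into the guest.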

Frank

virtio-comment message