Subject: Re: Enabling hypervisor agnosticism for VirtIO backends


On Wed, Aug 04, 2021 at 10:04:30AM +0100, Alex Bennée wrote:
> Hi,
> 
> One of the goals of Project Stratos is to enable hypervisor agnostic
> backends so we can enable as much re-use of code as possible and avoid
> repeating ourselves. This is the flip side of the front end where
> multiple front-end implementations are required - one per OS, assuming
> you don't just want Linux guests. The resultant guests are trivially
> movable between hypervisors modulo any abstracted paravirt type
> interfaces.
> 
> In my original thumbnail sketch of a solution I envisioned vhost-user
> daemons running in a broadly POSIX like environment. The interface to
> the daemon is fairly simple requiring only some mapped memory and some
> sort of signalling for events (on Linux this is eventfd). The idea was a
> stub binary would be responsible for any hypervisor specific setup and
> then launch a common binary to deal with the actual virtqueue requests
> themselves.
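
As a rough illustration of that stub/common split, assuming a POSIX-like
BE environment: the helper function, daemon name and flags below are all
made up for the sketch.

  // Hypothetical stub: do the hypervisor specific setup, then launch a
  // common, hypervisor agnostic backend with the resulting descriptors.
  use std::os::unix::io::RawFd;
  use std::process::Command;

  // Placeholder for whatever is needed to make the FE guest's memory and
  // event channel available to this VM (hypercalls, device nodes, ...).
  fn setup_hypervisor_mappings() -> (RawFd, RawFd) {
      unimplemented!("hypervisor specific")
  }

  fn main() {
      let (mem_fd, kick_fd) = setup_hypervisor_mappings();
      // The common binary only needs to mmap the memory and wait on the
      // event fd, so it stays hypervisor agnostic. The fds are assumed to
      // be inheritable by the child process.
      let _status = Command::new("virtio-common-daemon")
          .arg(format!("--mem-fd={}", mem_fd))
          .arg(format!("--kick-fd={}", kick_fd))
          .status()
          .expect("failed to launch common backend");
  }
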
> 
> Since that original sketch we've seen an expansion in the sort of ways
> backends could be created. There is interest in encapsulating backends
> in RTOSes or unikernels for solutions like SCMI. The interest in Rust
> has prompted ideas of using trait interfaces to abstract the differences
> away, as well as the idea of bare-metal Rust backends.
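
To make the trait idea concrete, here is a minimal sketch (all trait and
method names are invented for illustration, not an existing crate API) of
how the hypervisor specific plumbing could sit behind a trait while the
virtqueue processing loop is shared:

  // Hypothetical trait hiding the hypervisor specific parts: how FE memory
  // becomes visible and how events are received and sent.
  pub trait HypervisorBackend {
      /// Make the FE guest's virtqueue memory visible to this backend.
      fn map_guest_memory(&mut self) -> std::io::Result<()>;
      /// Borrow the mapped region for descriptor processing.
      fn guest_memory(&self) -> &[u8];
      /// Block until the FE kicks the queue.
      fn wait_for_kick(&mut self) -> std::io::Result<()>;
      /// Signal the FE that used buffers are available.
      fn send_interrupt(&mut self) -> std::io::Result<()>;
  }

  // The generic part only talks to the trait, so the same loop can be
  // reused across hypervisors (or built for bare metal / unikernels).
  pub fn run_backend<H: HypervisorBackend>(hv: &mut H) -> std::io::Result<()> {
      hv.map_guest_memory()?;
      loop {
          hv.wait_for_kick()?;
          let _ring = hv.guest_memory(); // parse available descriptors here
          hv.send_interrupt()?;
      }
  }
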
> 
> We have a card (STR-12) called "Hypercall Standardisation" which
> calls for a description of the APIs needed from the hypervisor side to
> support VirtIO guests and their backends. However we are some way off
> from that at the moment as I think we need to at least demonstrate one
> portable backend before we start codifying requirements. To that end I
> want to think about what we need for a backend to function.
> 
> Configuration
> =============
> 
> In the type-2 setup this is typically fairly simple because the host
> system can orchestrate the various modules that make up the complete
> system. In the type-1 case (or even type-2 with delegated service VMs)
> we need some sort of mechanism to inform the backend VM of key
> details about the system:
> 
>   - where virtqueue memory is in its address space
>   - how it's going to receive (interrupt) and trigger (kick) events
>   - what (if any) resources the backend needs to connect to
> 
> Obviously you can sidestep configuration issues by having static
> configurations and baking the assumptions into your guest images; however,
> this isn't scalable in the long term. The obvious solution seems to be
> extending a subset of Device Tree data to user space but perhaps there
> are other approaches?
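
Whatever channel that information ends up arriving over (an extended
Device Tree node, a config blob, something else), the payload itself is
small. A rough sketch of the fields involved, with all names invented for
illustration:

  // Hypothetical description of what a BE guest needs to be told before it
  // can serve virtio requests.
  pub struct BackendConfig {
      /// Where the FE's virtqueue memory appears in the BE's address space.
      pub virtqueue_base: u64,
      pub virtqueue_size: u64,
      /// How the BE is notified of kicks (e.g. an interrupt line).
      pub kick_irq: u32,
      /// How the BE signals used buffers back to the FE (e.g. a doorbell).
      pub notify_doorbell: u64,
      /// Any resources the backend must attach to (block image, tap device, ...).
      pub resources: Vec<String>,
  }
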
> 
> Before any virtio transactions can take place the appropriate memory
> mappings need to be made between the FE guest and the BE guest.
> Currently the whole of the FE guest's address space needs to be visible
> to whatever is serving the virtio requests. I can envision 3 approaches:
> 
>  * BE guest boots with memory already mapped
> 
>  This would entail the guest OS knowing which parts of its Guest Physical
>  Address space are already taken up and avoiding clashes. I would assume
>  in this case you would want a standard interface to userspace to then
>  make that address space visible to the backend daemon.
> 
>  * BE guest boots with a hypervisor handle to memory
> 
>  The BE guest is then free to map the FE's memory to where it wants in
>  the BE's guest physical address space. Activating the mapping will
>  require some sort of hypercall to the hypervisor. I can see two options
>  at this point:
> 
>   - expose the handle to userspace for daemon/helper to trigger the
>     mapping via existing hypercall interfaces. If using a helper you
>     would have a hypervisor specific one to avoid the daemon having to
>     care too much about the details, or push that complexity into a
>     compile-time option for the daemon, which would result in different
>     binaries albeit from a common source base.
> 
>   - expose a new kernel ABI to abstract the hypercall differences away
>     in the guest kernel. In this case the userspace would essentially
>     ask for an abstract "map guest N memory to userspace ptr" and let
>     the kernel deal with the different hypercall interfaces. This of
>     course assumes the majority of BE guests would be Linux kernels and
>     leaves the bare-metal/unikernel approaches to their own devices.
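
Either option could sit behind a common mapping abstraction so the daemon
itself doesn't need to care which mechanism is in use. A hypothetical
sketch; neither the trait nor the kernel interface it alludes to exists
today:

  use std::io;

  // Hypothetical interface the backend daemon codes against; which
  // implementation gets built or loaded is a packaging decision.
  pub trait GuestMemoryMap {
      /// Map `len` bytes of FE guest memory at `guest_addr` into this
      /// process and return a pointer to the mapping.
      fn map(&mut self, guest_addr: u64, len: usize) -> io::Result<*mut u8>;
      fn unmap(&mut self, ptr: *mut u8, len: usize) -> io::Result<()>;
  }

  // Option 1: a hypervisor specific helper or build issues the existing
  // hypercall (or talks to the hypervisor's device node) directly.
  pub struct DirectHypercallMap { /* hypervisor handle received at boot */ }

  // Option 2: a generic build relies on a new, hypervisor agnostic kernel
  // ABI ("map guest N memory to a userspace ptr") and lets the guest
  // kernel pick the right hypercall.
  pub struct KernelAbiMap { /* fd for a hypothetical /dev/guest-mem node */ }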

VIRTIO typically uses the vring memory layout but doesn't need to. The
VIRTIO device model deals with virtqueues. The shared memory vring
layout is part of the VIRTIO transport (PCI, MMIO, and CCW use vrings).
Alternative transports with other virtqueue representations are possible
(e.g. VIRTIO-over-TCP). They don't need to involve a BE mapping shared
memory and processing a vring owned by the FE.

For example, there could be BE hypercalls to pop virtqueue elements,
push virtqueue elements, and to access buffers (basically DMA
read/write). The FE could either be a traditional virtio-mmio/pci device
with a vring or use FE hypercalls to add available elements to a
virtqueue and get used elements.
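
To sketch what such a hypervisor-mediated interface might look like on the
BE side (all types and method names are invented, not an existing API):

  use std::io;

  /// A virtqueue element handed over by the hypervisor, identifying a
  /// descriptor chain without exposing FE memory directly.
  pub struct QueueElement {
      pub index: u16,
      pub read_len: u32,
      pub write_len: u32,
  }

  pub trait MediatedVirtqueue {
      /// Pop the next available element, if any.
      fn pop(&mut self, queue: u16) -> io::Result<Option<QueueElement>>;
      /// Copy data out of the element's readable buffers (a DMA read).
      fn dma_read(&mut self, elem: &QueueElement, offset: u64,
                  buf: &mut [u8]) -> io::Result<()>;
      /// Copy data into the element's writable buffers (a DMA write).
      fn dma_write(&mut self, elem: &QueueElement, offset: u64,
                   buf: &[u8]) -> io::Result<()>;
      /// Return the element to the FE as used, reporting bytes written.
      fn push(&mut self, queue: u16, elem: QueueElement, written: u32)
              -> io::Result<()>;
  }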

I don't know the goals of Project Stratos or whether this helps, but it
might allow other architectures that have different security,
complexity, etc. properties.

Stefan
