Subject: Re: [virtio-dev] Next VirtIO device for Project Stratos?
On Tue, Sep 06, 2022 at 06:33:36PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Sat, Sep 03, 2022 at 07:43:08AM +0000, Alyssa Ross wrote:
> > > Hi Alex and everyone else, just catching up on some mail and wanted to
> > > clarify some things:
> > >
> > > Alex Bennée <alex.bennee@linaro.org> writes:
> > >
> > > > This email is driven by a brainstorming session at a recent sprint
> > > > where we considered what VirtIO devices we should look at implementing
> > > > next. I ended up going through all the assigned device IDs hunting for
> > > > missing spec discussion and existing drivers so I'd welcome feedback
> > > > from anybody actively using them - especially as my suppositions about
> > > > device types I'm not familiar with may be way off!
> > > >
> > > > [...snip...]
> > > >
> > > > GPU device / 16
> > > > ---------------
> > > >
> > > > This is now a fairly mature part of the spec and has implementations in
> > > > the kernel, QEMU and a vhost-user backend. However, as is commensurate
> > > > with the complexity of GPUs, there is ongoing development moving from the
> > > > VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to
> > > > make some things easier.
> > > >
> > > > A potential area of interest here is working out what the differences
> > > > are in use cases between virtio-gpu and virtio-wayland. virtio-wayland
> > > > is currently a ChromeOS-only invention so hasn't seen any upstreaming or
> > > > specification work, but it may make more sense where multiple VMs are
> > > > drawing only elements of a final display which is composited by a master
> > > > program. For further reading see Alyssa's write-up:
> > > >
> > > >   https://alyssa.is/using-virtio-wl/
> > > >
> > > > I'm not sure how widely used the existing vhost-user backend is for
> > > > virtio-gpu, but it could present an opportunity for a more beefy rust-vmm
> > > > backend implementation?
> > >
> > > As I understand it, virtio-wayland is effectively deprecated in favour
> > > of sending Wayland messages over cross-domain virtio-gpu contexts. It's
> > > possible to do this now with an upstream kernel, whereas virtio-wayland
> > > always required a custom driver in the Chromium kernel.
> > >
> > > But crosvm is still the only implementation of a virtio-gpu device that
> > > supports Wayland over cross-domain contexts, so it would be great to see
> > > a more generic implementation. Especially because, while crosvm can
> > > share its virtio-gpu device over vhost-user, it does so in a way that's
> > > incompatible with the standardised vhost-user-gpu as implemented by
> > > QEMU. When I asked the crosvm developers in their Matrix channel what
> > > it would take to use the standard vhost-user-gpu variant, they said that
> > > the standard variant was lacking functionality they needed, like mapping
> > > and unmapping GPU buffers into the guest.
> >
> > That sounds somewhat similar to virtiofs and its DAX Window, which needs
> > vhost-user protocol extensions because of how memory is handled. David
> > Gilbert wrote the QEMU virtiofs DAX patches, which are under
> > development.
> >
> > I took a quick look at the virtio-gpu specs. If the crosvm behavior you
> > mentioned is covered in the VIRTIO spec then I guess it's the "host
> > visible memory region"?
> >
> > (If it's not in the VIRTIO spec then a spec change needs to be proposed
> > first and a vhost-user protocol spec change can then support that new
> > virtio-gpu feature.)
> >
> > The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB command maps the device's resource
> > into the host visible memory region so that the driver can see it.
> >
> > The virtiofs DAX window uses vhost-user slave channel messages to
> > provide file descriptors and offsets for QEMU to mmap. QEMU mmaps the
> > file pages into the shared memory region seen by the guest driver.
> >
> > Maybe an equivalent mechanism is needed for virtio-gpu so a device
> > resource file descriptor can be passed to QEMU and then mmapped so the
> > guest driver can see the pages?
> >
> > I think it's possible to unify the virtiofs and virtio-gpu extensions to
> > the vhost-user protocol. Two new slave channel messages are needed: "map
> > <fd, offset, len> to shared memory resource <n>" and "unmap <offset,
> > len> from shared memory resource <n>". Both devices could use these
> > messages to implement their respective DAX Window and Blob Resource
> > functionality.
>
> It might be possible; but there's a bunch of lifetime/alignment/etc
> questions to be answered.
>
> For virtiofs DAX we carve out a chunk of a BAR as a 'cache' (unfortunate
> name) that we can then do mappings into.
>
> The VHOST_USER_SLAVE_FS_MAP/UNMAP commands can do the mapping:
>   https://gitlab.com/virtio-fs/qemu/-/commit/7c29854da484afd7ca95acbd2e4acfc2c75ef491
>   https://gitlab.com/virtio-fs/qemu/-/commit/f32bc2524035931856aa218ce18efa029b9eed02
>
> Those might do what you want if you can figure out a way to generalise
> the BAR to map them into.
>
> There are some problems; KVM gets really, really upset if you try to
> access an area that doesn't have a mapping or is mapped to a truncated
> file; do you want the guest to be able to crash like that?

I think you are pointing out the existing problems with virtiofs
map/unmap and not new issues related to virtio-gpu or generalizing the
vhost-user messages?

There are a few possibilities for dealing with unmapped ranges in Shared
Memory Regions:

1. Reserve the unused Shared Memory Region ranges with mmap(PROT_NONE)
   so that accesses to unmapped pages result in faults.

2. Map zero pages that are either:
   a. read-only
   b. read-write but discard stores
   c. private/anonymous memory

virtiofs does #1 and has trouble with accesses to unmapped areas because
KVM's MMIO dispatch loop gets upset.
On top of that, virtiofs also needs a way to inject the fault into the
guest so that the truncated mmap case can be detected in the guest.

The situation is probably easier for virtio-gpu than for virtiofs. I
think the underlying host files won't be truncated and guest userspace
processes cannot access unmapped pages. So virtio-gpu is less
susceptible to unmapped accesses. But we still need to implement
unmapped access semantics.

I don't know enough about CPU memory to suggest a solution for injecting
unmapped access faults. Maybe you can find someone who can help. I
wonder if pmem or CXL devices have similar requirements?

Stefan