Subject: Re: [virtio-dev] Next VirtIO device for Project Stratos?


* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Tue, Sep 06, 2022 at 06:33:36PM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > On Sat, Sep 03, 2022 at 07:43:08AM +0000, Alyssa Ross wrote:
> > > > Hi Alex and everyone else, just catching up on some mail and wanted to
> > > > clarify some things:
> > > > 
> > > > Alex Bennée <alex.bennee@linaro.org> writes:
> > > > 
> > > > > This email is driven by a brain storming session at a recent sprint
> > > > > where we considered what VirtIO devices we should look at implementing
> > > > > next. I ended up going through all the assigned device IDs hunting for
> > > > > missing spec discussion and existing drivers so I'd welcome feedback
> > > > > from anybody actively using them - especially as my suppositions about
> > > > > device types I'm not familiar with may be way off!
> > > > >
> > > > > [...snip...]
> > > > >
> > > > > GPU device / 16
> > > > > ---------------
> > > > >
> > > > > This is now a fairly mature part of the spec and has implementations in
> > > > > the kernel, QEMU and a vhost-user backend. However, as is commensurate
> > > > > with the complexity of GPUs, there is ongoing development moving from the
> > > > > VirGL OpenGL encapsulation to a thing called GFXSTREAM which is meant to
> > > > > make some things easier.
> > > > >
> > > > > A potential area of interest here is working out what the differences
> > > > > in use cases are between virtio-gpu and virtio-wayland. virtio-wayland
> > > > > is currently a ChromeOS-only invention, so it hasn't seen any upstreaming
> > > > > or specification work, but it may make more sense where multiple VMs each
> > > > > draw only elements of a final display which is composited by a master
> > > > > program. For further reading see Alyssa's write-up:
> > > > >
> > > > >   https://alyssa.is/using-virtio-wl/
> > > > >
> > > > > I'm not sure how widely used the existing vhost-user backend is for
> > > > > virtio-gpu but it could present an opportunity for a more beefy rust-vmm
> > > > > backend implementation?
> > > > 
> > > > As I understand it, virtio-wayland is effectively deprecated in favour
> > > > of sending Wayland messages over cross-domain virtio-gpu contexts.  It's
> > > > possible to do this now with an upstream kernel, whereas virtio-wayland
> > > > always required a custom driver in the Chromium kernel.
> > > > 
> > > > But crosvm is still the only implementation of a virtio-gpu device that
> > > > supports Wayland over cross-domain contexts, so it would be great to see
> > > > a more generic implementation.  Especially because, while crosvm can
> > > > share its virtio-gpu device over vhost-user, it does so in a way that's
> > > > incompatible with the standardised vhost-user-gpu as implemented by
> > > > QEMU.  When I asked the crosvm developers in their Matrix channel what
> > > > it would take to use the standard vhost-user-gpu variant, they said that
> > > > the standard variant was lacking functionality they needed, like mapping
> > > > and unmapping GPU buffers into the guest.
> > > 
> > > That sounds somewhat similar to virtiofs and its DAX Window, which needs
> > > vhost-user protocol extensions because of how memory is handled. David
> > > Gilbert wrote the QEMU virtiofs DAX patches, which are under
> > > development.
> > > 
> > > I took a quick look at the virtio-gpu specs. If the crosvm behavior you
> > > mentioned is covered in the VIRTIO spec then I guess it's the "host
> > > visible memory region"?
> > > 
> > > (If it's not in the VIRTIO spec then a spec change needs to be proposed
> > > first and a vhost-user protocol spec change can then support that new
> > > virtio-gpu feature.)
> > > 
> > > The VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB command maps the device's resource
> > > into the host visible memory region so that the driver can see it.
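> > >
> > > For reference, the request/response pair for that command looks roughly
> > > like this (from memory of the uapi header; check the spec for the
> > > authoritative layout):
> > >
> > >   /* VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB */
> > >   struct virtio_gpu_resource_map_blob {
> > >           struct virtio_gpu_ctrl_hdr hdr;
> > >           __le32 resource_id;
> > >           __le32 padding;
> > >           __le64 offset;   /* where in the host visible region to map it */
> > >   };
> > >
> > >   /* VIRTIO_GPU_RESP_OK_MAP_INFO */
> > >   struct virtio_gpu_resp_map_info {
> > >           struct virtio_gpu_ctrl_hdr hdr;
> > >           __le32 map_info; /* caching attributes of the mapping */
> > >           __le32 padding;
> > >   };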
> > > 
> > > The virtiofs DAX window uses vhost-user slave channel messages to
> > > provide file descriptors and offsets for QEMU to mmap. QEMU mmaps the
> > > file pages into the shared memory region seen by the guest driver.
> > > 
> > > Maybe an equivalent mechanism is needed for virtio-gpu so a device
> > > resource file descriptor can be passed to QEMU and then mmapped so the
> > > guest driver can see the pages?
> > > 
> > > I think it's possible to unify the virtiofs and virtio-gpu extensions to
> > > the vhost-user protocol. Two new slave channel messages are needed: "map
> > > <fd, offset, len> to shared memory resource <n>" and "unmap <offset,
> > >  len> from shared memory resource <n>". Both devices could use these
> > > messages to implement their respective DAX Window and Blob Resource
> > > functionality.
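> > >
> > > Purely as a sketch (names and layout invented here; the fd itself would be
> > > passed as ancillary data on the slave channel), the payloads could look
> > > something like:
> > >
> > >   struct VhostUserShmMap {
> > >           uint8_t  shm_region;  /* which shared memory resource <n> */
> > >           uint64_t fd_offset;   /* offset into the fd being mapped */
> > >           uint64_t shm_offset;  /* offset within the shared memory region */
> > >           uint64_t len;
> > >           uint64_t flags;       /* e.g. read/write */
> > >   };
> > >
> > >   struct VhostUserShmUnmap {
> > >           uint8_t  shm_region;
> > >           uint64_t shm_offset;
> > >           uint64_t len;
> > >   };
> > >
> > > virtiofs and virtio-gpu backends would then speak the same messages and
> > > only differ in where the fd and offsets come from.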
> > 
> > It might be possible, but there are a bunch of lifetime/alignment/etc.
> > questions to be answered.
> > 
> > For virtiofs DAX we carve out a chunk of a BAR as a 'cache' (unfortunate
> > name) that we can then do mappings into.
> > 
> > The VHOST_USER_SLAVE_FS_MAP/UNMAP commands can do the mapping:
> > https://gitlab.com/virtio-fs/qemu/-/commit/7c29854da484afd7ca95acbd2e4acfc2c75ef491
> > https://gitlab.com/virtio-fs/qemu/-/commit/f32bc2524035931856aa218ce18efa029b9eed02
> > 
> > Those might do what you want if you can figure out a way to generalise
> > the BAR to map them into.
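> >
> > Roughly what that boils down to on the QEMU side (a simplified sketch from
> > memory; names are illustrative and error handling is omitted):
> >
> >   #include <sys/mman.h>
> >
> >   static void *cache;   /* base of the reserved 'cache' window in the BAR */
> >
> >   /* Done once at setup: reserve the window so nothing else lands in it. */
> >   void cache_reserve(size_t cache_size)
> >   {
> >       cache = mmap(NULL, cache_size, PROT_NONE,
> >                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >   }
> >
> >   /* VHOST_USER_SLAVE_FS_MAP: overlay the file pages at the given offset. */
> >   int cache_map(int fd, off_t fd_off, size_t c_off, size_t len)
> >   {
> >       void *p = mmap((char *)cache + c_off, len, PROT_READ | PROT_WRITE,
> >                      MAP_SHARED | MAP_FIXED, fd, fd_off);
> >       return p == MAP_FAILED ? -1 : 0;
> >   }
> >
> >   /* VHOST_USER_SLAVE_FS_UNMAP: drop back to an inaccessible mapping. */
> >   int cache_unmap(size_t c_off, size_t len)
> >   {
> >       void *p = mmap((char *)cache + c_off, len, PROT_NONE,
> >                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
> >       return p == MAP_FAILED ? -1 : 0;
> >   }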
> > 
> > There are some problems; KVM gets really, really upset if you try to
> > access an area that doesn't have a mapping or is mapped to a truncated
> > file; do you want the guest to be able to crash like that?
> 
> I think you are pointing out the existing problems with virtiofs
> map/unmap and not new issues related to virtio-gpu or generalizing the
> vhost-user messages?
> 

Right, although what I don't have a feel for here is the semantics of the
things that are being mapped in the GPU case, and what possibility the
driver mapping them has of picking some bad offset.

Dave

> There are a few possibilities for dealing with unmapped ranges in Shared
> Memory Regions:
> 
> 1. Reserve the unused Shared Memory Region ranges with mmap(PROT_NONE)
>    so that accesses to unmapped pages result in faults.
> 2. Map zero pages that are either:
>    a. read-only
>    b. read-write but discard stores
>    c. private/anonymous memory
> 
> virtiofs does #1 and has trouble with accesses to unmapped areas because
> KVM's MMIO dispatch loop gets upset. On top of that virtiofs also needs
> a way to inject the fault into the guest so that the truncated mmap case
> can be detected in the guest.
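>
> Concretely, the difference between #1 and #2c is just the protection and
> backing used when tearing a range down (illustrative only):
>
>   /* #1: leave a PROT_NONE hole -- any guest access faults, and KVM's MMIO
>    * dispatch has no clean way to turn that into an error for the guest. */
>   mmap((char *)region + off, len, PROT_NONE,
>        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
>
>   /* #2c: back the hole with private anonymous memory -- reads return
>    * zeroes, writes land in throwaway pages, and nothing faults. */
>   mmap((char *)region + off, len, PROT_READ | PROT_WRITE,
>        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);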
> 
> The situation is probably easier for virtio-gpu than for virtiofs. I
> think the underlying host files won't be truncated and guest userspace
> processes cannot access unmapped pages. So virtio-gpu is less
> susceptible to unmapped accesses.
> 
> But we still need to implement unmapped access semantics. I don't know
> enough about CPU memory to suggest a solution for injecting unmapped
> access faults. Maybe you can find someone who can help. I wonder if pmem
> or CXL devices have similar requirements?
> 
> Stefan


-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


