Subject: Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification

On Thu, Oct 17, 2019 at 8:00 AM Frank Yang <lfy@google.com> wrote:

On Wed, Oct 16, 2019 at 11:58 PM Tomasz Figa <tfiga@chromium.org> wrote:
On Mon, Oct 14, 2019 at 9:19 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> > > Well.  I think before even discussing the protocol details we need a
> > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > should be an optional optimization and not a requirement.  Also the
> > > motivation for that should be clear (Let the host decoder write directly
> > > to virtio-gpu resources, to display video without copying around the
> > > decoded framebuffers from one device to another).
> >
> > Just to make sure we're on the same page, what would the buffers come
> > from if we don't use this optimization?
> >
> > I can imagine a setup like this:
> >  1) host device allocates host memory appropriate for usage with host
> > video decoder,
> >  2) guest driver allocates arbitrary guest pages for storage
> > accessible to the guest software,
> >  3) guest userspace writes input for the decoder to guest pages,
> >  4) guest driver passes the list of pages for the input and output
> > buffers to the host device,
> >  5) host device copies data from input guest pages to host buffer,
> >  6) host device runs the decoding,
> >  7) host device copies decoded frame to output guest pages,
> >  8) guest userspace can access decoded frame from those pages; back to 3.
> >
> > Is that something you have in mind?
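
A minimal sketch of the host-side part of that copy-based flow (steps 5 to 7), under loud assumptions: `host_decoder`, `fake_decode` and `host_process` are hypothetical names, the guest pages are modelled as one contiguous array rather than a page list, and the "decoder" merely inverts bytes so the data path can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounce-buffer size; an arbitrary choice for the sketch. */
#define HOST_BUF_SIZE 4096

struct host_decoder {
    uint8_t in[HOST_BUF_SIZE];   /* step 1: host memory for decoder input  */
    uint8_t out[HOST_BUF_SIZE];  /* step 1: host memory for decoder output */
};

/* Stand-in for step 6. A real device would invoke a codec here;
 * this placeholder inverts every byte and reports the output length. */
static size_t fake_decode(const uint8_t *in, size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (uint8_t)~in[i];
    return len;
}

/* Steps 5-7: copy input from guest memory, decode, copy the result back.
 * Returns the number of bytes written to guest_out, or 0 if the input
 * does not fit the bounce buffer or the output does not fit guest_out. */
static size_t host_process(struct host_decoder *dec,
                           const uint8_t *guest_in, size_t in_len,
                           uint8_t *guest_out, size_t out_cap)
{
    assert(dec && guest_in && guest_out);
    if (in_len > HOST_BUF_SIZE)
        return 0;
    memcpy(dec->in, guest_in, in_len);                  /* step 5 */
    size_t n = fake_decode(dec->in, in_len, dec->out);  /* step 6 */
    if (n > out_cap)
        return 0;
    memcpy(guest_out, dec->out, n);                     /* step 7 */
    return n;
}
```

The two `memcpy` calls are the per-frame cost that the virtio-gpu buffer optimization discussed in this thread is meant to avoid.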
>
> I don't have any specific workflow in mind.
>
> If you want to display the decoded video frames you want to use dma-bufs
> shared by video decoder and gpu, right?  So the userspace application (video
> player probably) would create the buffers using one of the drivers,
> export them as dma-buf, then import them into the other driver.  Just
> like you would do on physical hardware.  So, when using virtio-gpu
> buffers:
>
>   (1) guest app creates buffers using virtio-gpu.
>   (2) guest app exports virtio-gpu buffers as dma-buf.
>   (3) guest app imports the dma-bufs into virtio-vdec.
>   (4) guest app asks the virtio-vdec driver to write the decoded
>       frames into the dma-bufs.
>   (5) guest app asks the virtio-gpu driver to display the decoded
>       frame.
>
> The guest video decoder driver passes the dma-buf pages to the host, and
> it is the host driver's job to fill the buffer.  How this is done
> exactly might depend on hardware capabilities (whether a host-allocated
> bounce buffer is needed or whether the hardware can decode directly to
> the dma-buf passed by the guest driver) and is an implementation detail.
>
> Now, with cross-device sharing added the virtio-gpu would attach some
> kind of identifier to the dma-buf, virtio-vdec could fetch the
> identifier and pass it to the host too, and the host virtio-vdec device
> can use the identifier to get a host dma-buf handle for the (virtio-gpu)
> buffer.  It then asks the host video decoder driver to import the host
> dma-buf.  If that all works, it can ask the host hardware to decode
> directly to the host virtio-gpu resource.
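
The identifier hand-off described above could look roughly like the following host-side registry. This is only a sketch under stated assumptions: every name (`buf_registry`, `registry_export`, `registry_lookup`) is hypothetical, the identifier is a 16-byte value minted from a counter rather than a real UUID, and an `int` stands in for a host dma-buf handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REGISTRY_SLOTS 16   /* arbitrary capacity for the sketch */

/* UUID-sized identifier attached to an exported buffer (hypothetical). */
struct buf_id { uint8_t bytes[16]; };

struct registry_entry {
    int used;
    struct buf_id id;
    int host_handle;        /* stands in for a host dma-buf handle */
};

struct buf_registry {
    struct registry_entry slots[REGISTRY_SLOTS];
    uint64_t next;          /* counter used to mint identifiers */
};

/* Exporter side (e.g. virtio-gpu): register a buffer and mint its id.
 * Returns 0 on success, -1 if the registry is full. */
static int registry_export(struct buf_registry *r, int host_handle,
                           struct buf_id *out)
{
    assert(r && out);
    for (size_t i = 0; i < REGISTRY_SLOTS; i++) {
        if (r->slots[i].used)
            continue;
        memset(out, 0, sizeof(*out));
        r->next++;
        /* Not a real UUID: just a unique counter value. */
        memcpy(out->bytes, &r->next, sizeof(r->next));
        r->slots[i].used = 1;
        r->slots[i].id = *out;
        r->slots[i].host_handle = host_handle;
        return 0;
    }
    return -1;
}

/* Importer side (e.g. the host virtio-vdec device): resolve an id passed
 * by the guest to the host handle, or -1 if the id is unknown. */
static int registry_lookup(const struct buf_registry *r,
                           const struct buf_id *id)
{
    assert(r && id);
    for (size_t i = 0; i < REGISTRY_SLOTS; i++) {
        if (r->slots[i].used &&
            memcmp(r->slots[i].id.bytes, id->bytes, sizeof(id->bytes)) == 0)
            return r->slots[i].host_handle;
    }
    return -1;
}
```

Because the identifier is not tied to any one device's resource-handle namespace, the same lookup works when several virtio-gpu devices are present in the guest.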
>

Agreed.

> > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > there are multiple virtio-gpu devices present in the guest?
> > >
> > > I think we need a framework for cross-device buffer sharing.  One
> > > possible option would be to have some kind of buffer registry, where
> > > buffers can be registered for cross-device sharing and get a unique
> > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > dma-buf export.
> >
> > This approach could possibly let us handle this transparently to
> > importers, which would work for guest kernel subsystems that rely on
> > the ability to handle buffers like native memory (e.g. having a
> > sgtable or DMA address) for them.
> >
> > How about allocating guest physical addresses for memory corresponding
> > to those buffers? On the virtio-gpu example, that could work like
> > this:
> >  - by default a virtio-gpu buffer has only a resource handle,
> >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > virtio-gpu device export the buffer to a host framework (inside the
> > VMM) that would allocate guest page addresses for it, which the
> > command would return in a response to the guest,
>
> Hmm, the cross-device buffer sharing framework I have in mind would
> basically be a buffer registry.  virtio-gpu would create buffers as
> usual, create an identifier somehow (details to be hashed out), attach
> the identifier to the dma-buf so it can be used as outlined above.
>
> Also note that the guest manages the address space, so the host can't
> simply allocate guest page addresses.

Is this really true? I'm not an expert in this area, but on a bare
metal system it's the hardware or firmware that sets up the various
physical address allocations on a hardware level and most of the time
most of the addresses are already pre-assigned in hardware (like the
DRAM base, various IOMEM spaces, etc.).

I think that means that we could have a reserved region that could be
used by the host for dynamic memory hot-plug-like operation. The
reference to memory hot-plug here is fully intentional, we could even
use this feature of Linux to get struct pages for such memory if we
really wanted.

Wouldn't normal hotplug result in Linux trying to claim what you just hot plugged?

> Mapping host virtio-gpu resources
> into guest address space is planned, it'll most likely use a pci memory
> bar to reserve some address space.  The host can map resources into that
> pci bar, on guest request.
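
One way to picture the split of responsibilities here: the guest decides where the BAR lives in guest-physical space, and the host only chooses offsets inside it. The sketch below assumes a hypothetical `bar_window` with a simple bump allocator (no unmapping), a 4 KiB mapping granularity, and a nonzero BAR base so that 0 can signal failure.

```c
#include <assert.h>
#include <stdint.h>

#define BAR_PAGE_SIZE 4096ULL   /* assumed mapping granularity */

/* Hypothetical view of a PCI memory BAR used as a mapping window. */
struct bar_window {
    uint64_t guest_base;  /* guest-physical base, assigned by the guest */
    uint64_t size;        /* BAR size in bytes */
    uint64_t next_off;    /* first unused offset (bump allocator) */
};

/* Reserve page-aligned room in the window for a resource of `len` bytes,
 * as the host would on a guest map request. Returns the guest-physical
 * address of the mapping, or 0 if the request is empty or does not fit. */
static uint64_t bar_map_resource(struct bar_window *w, uint64_t len)
{
    assert(w && w->next_off <= w->size);
    if (len == 0 || len > w->size)
        return 0;
    uint64_t pages = (len + BAR_PAGE_SIZE - 1) / BAR_PAGE_SIZE;
    uint64_t span = pages * BAR_PAGE_SIZE;
    if (span > w->size - w->next_off)
        return 0;
    uint64_t addr = w->guest_base + w->next_off;
    w->next_off += span;
    return addr;
}
```

The host never picks `guest_base` itself, which is consistent with the point that the guest manages its own address space.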

Sounds like a viable option too. Do you have a pointer to some
description on how this would work on both host and guest side?

I've been following the conversation and I'd like to have a more in-depth discussion sometime. On the Android Emulator, we've successfully been mapping host memory into the PCI bar for graphics applications where host visible/coherent memory is needed (Vulkan) or has benefits for performance/compatibility (gralloc with CPU usage, as a substitute for ashmem).

We use the address space driver + device, which through the PING ioctl allows notifications from guest to host and allows all the implementation to be in VMM or guest userspace. I had a previous spec (virtio-hostmem) on making a version of the address space device for virtio, but we got sidetracked in various ways. One technical point from there was deciding on whether the host needed to allocate everything up front. I wonder whether, now that it's acceptable to map host memory on guest request into that, we can revisit the virtio-hostmem idea.

In terms of video decode specifically, we're able to allocate host memory for the video codec, then map it to the guest via the PCI bar, and then the guest ends up using the decoded frame in graphics. But once it does so, and the guest wants to upload the decoded frame as a texture or something, on the VMM side we'd already be aware of the physical address (host virtual address) where the decoded frame lives, regardless of how many layers and other abstractions the decoded frame travels through (assuming it doesn't get copied, but if it does, then we properly emulate the fact).

The situation is probably a bit more complex for video decode that is done on hardware without copying to CPU memory. We haven't really addressed that yet. I imagine on the VMM side, the video codec decodes in hardware to some image or texture that is associated with a dmabuf, or a native_handle_t, or an OpenGL texture. But then the guest userspace shouldn't really care that they are related on the VMM side as long as the VMM has a consistent interface with whatever shared image kernel abstraction / library (layers above virtio) the guest uses?

>
> >  - virtio-gpu driver could then create a regular DMA-buf object for
> > such memory, because it's just backed by pages (even though they may
> > not be accessible to the guest; just like in the case of TrustZone
> > memory protection on bare metal systems),
>
> Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> able to look up buffers using device and dma address, without explicitly
> creating some identifier.  Not investigated yet in detail.

Not backed by pages as in "struct page", but those are still regular
pages of the physical address space. That said, currently the sg_table
interface is only able to describe physical memory using struct page
pointers. It's been a long-standing limitation affecting even bare
metal systems, so perhaps it's just the right time to make it
possible to use some other identifiers, like PFNs? There is at least
one more free bit in the page_link field, which could mean that the
field contains a PFN instead of a struct page pointer.
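
A userspace sketch of that tagging idea, with heavy caveats: `SG_CHAIN` and `SG_END` mirror the two low bits that the kernel's scatterlist already reserves in `page_link`, while `SG_PFN`, the shift by three, and all helper names are hypothetical and not existing kernel API. It also assumes the pointer being stored is at least 8-byte aligned, so the third low bit is free.

```c
#include <assert.h>
#include <stdint.h>

#define SG_CHAIN     ((uintptr_t)0x01)  /* existing use: link to next list */
#define SG_END       ((uintptr_t)0x02)  /* existing use: last entry        */
#define SG_PFN       ((uintptr_t)0x04)  /* HYPOTHETICAL: payload is a PFN  */
#define SG_FLAG_MASK (SG_CHAIN | SG_END | SG_PFN)
#define SG_PFN_SHIFT 3                  /* keeps the PFN clear of the flags */

/* Encode a page frame number into a page_link-style word. */
static uintptr_t sg_encode_pfn(uintptr_t pfn)
{
    /* The PFN must survive the shift without losing high bits. */
    assert(((pfn << SG_PFN_SHIFT) >> SG_PFN_SHIFT) == pfn);
    return (pfn << SG_PFN_SHIFT) | SG_PFN;
}

/* Encode a (struct page-like) pointer; it must leave the flag bits clear. */
static uintptr_t sg_encode_ptr(const void *page)
{
    assert(((uintptr_t)page & SG_FLAG_MASK) == 0);
    return (uintptr_t)page;
}

/* Nonzero if the word carries a PFN rather than a pointer. */
static int sg_is_pfn(uintptr_t page_link)
{
    return (page_link & SG_PFN) != 0;
}

/* Recover the PFN; only meaningful when sg_is_pfn() is true. */
static uintptr_t sg_pfn(uintptr_t page_link)
{
    return page_link >> SG_PFN_SHIFT;
}

/* Recover the pointer; only meaningful when sg_is_pfn() is false. */
static void *sg_ptr(uintptr_t page_link)
{
    return (void *)(page_link & ~SG_FLAG_MASK);
}
```

Every consumer that dereferences the field would need to check the new bit first, which is the bulk of the work such a change would imply.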

Best regards,
Tomasz
