virtio-dev message



Subject: virtio-gpu dedicated heap


Hi everyone,

With the current virtio setup, all of guest memory is shared with host devices. There has been interest in changing this, to improve isolation of guest memory and increase confidentiality.

The recently introduced restricted DMA mechanism makes excellent progress in this area:

Devices without an IOMMU (traditional virtio devices, for example) allocate from a specially designated memory region, and SWIOTLB bouncing is done for all DMA transfers. This is controlled by the VIRTIO_F_ACCESS_PLATFORM feature bit.
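For illustration, here is a minimal sketch of what that means on the driver side (the helper name is made up for this example, and we assume the DMA device is the virtio transport at vdev->dev.parent): once VIRTIO_F_ACCESS_PLATFORM is negotiated, ordinary DMA API mappings are bounced into the designated region.

#include <linux/virtio.h>
#include <linux/virtio_config.h>
#include <linux/dma-mapping.h>

/* Minimal sketch, not from the original post: with ACCESS_PLATFORM
 * negotiated and a restricted DMA pool attached to the transport device,
 * a plain dma_map_single() is bounced by SWIOTLB. */
static int demo_map_one(struct virtio_device *vdev, void *buf, size_t len)
{
	struct device *dma_dev = vdev->dev.parent;	/* transport device */
	dma_addr_t addr;

	/* Without ACCESS_PLATFORM the device sees guest physical
	 * addresses directly and nothing is bounced. */
	if (!virtio_has_feature(vdev, VIRTIO_F_ACCESS_PLATFORM))
		return -EINVAL;

	addr = dma_map_single(dma_dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dma_dev, addr))
		return -ENOMEM;

	/* ... hand addr to the device, wait for completion ... */

	dma_unmap_single(dma_dev, addr, len, DMA_TO_DEVICE);
	return 0;
}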


This mechanism works great for the devices it was designed for, such as virtio-net. However, when trying to adapt it for other devices, there are some limitations.

It would be great to have a dedicated heap for virtio-gpu rather than allocating from guest memory.

We would like to use dma_alloc_noncontiguous() on the restricted DMA pool, ideally with page-level granularity. Contiguous buffers are definitely going out of fashion.
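As a rough sketch of what we have in mind (the helper name and the DMA_BIDIRECTIONAL direction are illustrative, not from any existing driver), dma_alloc_noncontiguous() hands back an sg_table of pages plus an optional kernel mapping:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Illustrative only: allocate page-granular, non-contiguous backing for a
 * virtio-gpu object and optionally vmap it for CPU access. */
static void *demo_alloc_gpu_backing(struct device *dev, size_t size,
				    struct sg_table **out_sgt)
{
	struct sg_table *sgt;
	void *vaddr;

	/* Pages come back as an sg_table; they need not be contiguous. */
	sgt = dma_alloc_noncontiguous(dev, size, DMA_BIDIRECTIONAL,
				      GFP_KERNEL, 0);
	if (!sgt)
		return NULL;

	/* Optional contiguous kernel virtual mapping over the pages. */
	vaddr = dma_vmap_noncontiguous(dev, size, sgt);
	if (!vaddr) {
		dma_free_noncontiguous(dev, size, sgt, DMA_BIDIRECTIONAL);
		return NULL;
	}

	*out_sgt = sgt;
	return vaddr;
}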

There are two considerations when using it with the restricted DMA approach:

1) No bouncing (aka memcpy)

Bouncing is expensive with graphics buffers, since guest user space designates which graphics buffers are shareable with the host. We plan to use DMA_ATTR_SKIP_CPU_SYNC when doing any DMA transactions with GPU buffers; a short sketch follows at the end of this item.

Bounce buffering will still be used for virtio commands, as with the other virtio devices that use the restricted DMA mechanism.
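The sketch mentioned above (helper name is ours, error handling trimmed): only the mapping is set up, and the CPU syncs, i.e. the copies into and out of the bounce buffer, are skipped for the GPU buffer pages.

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int demo_map_gpu_buffer(struct device *dev, struct sg_table *sgt)
{
	int ret;

	/* Skip the CPU <-> bounce-buffer copies for the payload pages. */
	ret = dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL,
			      DMA_ATTR_SKIP_CPU_SYNC);
	if (ret)
		return ret;

	/* ... pass the sgt->sgl DMA addresses to the host ... */

	dma_unmap_sgtable(dev, sgt, DMA_BIDIRECTIONAL,
			  DMA_ATTR_SKIP_CPU_SYNC);
	return 0;
}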

2) IO_TLB_SEGSIZE is too small for graphics buffers

This issue has been hit before as well:


The suggestion there was to use a shared-dma-pool rather than restricted DMA. But we're not sure a single device can have both restricted DMA (for VIRTIO_F_ACCESS_PLATFORM) and a shared-dma-pool (for larger buffers) at the same time. Does anyone know?

If not, it sounds like "splitting the allocation into dma_max_mapping_size() chunks" is also possible with restricted DMA. What is the preferred method?
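For reference, a sketch of the chunking approach (the helper name and the omitted unwinding are ours, not from the earlier thread): each mapping is capped at dma_max_mapping_size(), which reflects the SWIOTLB segment limit (IO_TLB_SEGSIZE) when a restricted pool is in use.

#include <linux/dma-mapping.h>
#include <linux/minmax.h>

static int demo_map_in_chunks(struct device *dev, void *buf, size_t len)
{
	size_t max = dma_max_mapping_size(dev);
	size_t off = 0;

	while (off < len) {
		size_t chunk = min(len - off, max);
		dma_addr_t addr;

		addr = dma_map_single(dev, buf + off, chunk, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, addr))
			return -ENOMEM;	/* unwinding earlier chunks omitted */

		/* ... record (addr, chunk) for the device ... */
		off += chunk;
	}
	return 0;
}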

More generally, we would love feedback on the proposed design, or suggestions for alternatives!

