[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: VM memory protection and zero-copy transfers.
Hello everyone,

The traditional virtio model relies on the host being able to access the entire memory of the guest VM. Virtio is also used in system configurations where the devices are implemented not by the host (which may not exist as such in the case of a Type-1 hypervisor) but by another, unprivileged guest VM. In such a configuration, the guest VM memory-sharing requirement raises security concerns.

The following proposal removes that requirement by introducing an alternative model in which the interactions between the virtio driver and the virtio device are mediated by the hypervisor. The concept applies to both Type-1 and Type-2 hypervisors; in the write-up below, the "host" thus refers either to the host OS or to the guest VM that executes the virtio device.

The main objective is to keep the memory of the VM that runs the driver isolated from the memory of the host or VM that runs the device, while still allowing zero-copy transfers between the two domains. The operations that control the exchange of the virtio buffers are handled by hypervisor code that sits between the device and the driver.

Unlike the regular virtio model, the virtqueues allocated by the driver are not shared with the device directly. Instead, the hypervisor allocates a separate set of virtqueues of the same sizes as the original ones and shares this second set with the device. These hypervisor-allocated virtqueues are referred to as the "shadow virtqueues".

During device operation, the hypervisor copies descriptors between the driver's virtqueues and the shadow virtqueues as the buffers cycle between the driver and the device. Whenever the driver adds buffers to the available ring, the hypervisor validates the descriptors, dynamically grants the I/O buffers to the host or VM that runs the device, and then copies these descriptors to the shadow virtqueue's available ring.
At the other end, when the device returns buffers through the shadow virtqueue's used ring, the hypervisor unmaps these buffers from the host's address space and copies the descriptors to the driver's used ring.

Although the virtio buffers can be allocated anywhere in the guest memory and are not necessarily page-aligned, the memory-sharing granularity is constrained by the page size. So when a buffer is mapped into the host address space, the hypervisor may end up sharing more memory than what is strictly needed.

The cost of granting the memory dynamically as the virtio transfers go is significant, though: we measured up to 40% performance degradation when using this dynamic buffer-granting mechanism. We also compared this solution to other approaches that we have seen elsewhere, for instance using the swiotlb mechanism along with the VIRTIO_F_ACCESS_PLATFORM feature bit to force a copy of the I/O buffers into a statically shared memory region. In that case, the same set of benchmarks shows an even bigger performance degradation, up to 60%, compared to the original virtio performance.

Although the shadow virtqueue concept looks fairly simple, one point has not been covered yet: indirect descriptors. To support indirect descriptors, the following two options were considered initially:

1. Grant the indirect descriptor as-is to the host while it is on the available ring. This introduces a security issue: a compromised guest OS can modify the indirect descriptor after it has been pushed to the available ring, which would cause the device to fault while trying to access arbitrary memory that was not actually granted. Note that in the shadow virtqueue model, there is no need for the device to validate the descriptors in the available rings, because the hypervisor already performed such checks before granting the memory.

2. Follow the same logic that is used for the "normal" descriptors and introduce shadow indirect descriptors. This would require the hypervisor to provision a memory pool from which to allocate these shadow indirect descriptors, and determining the size of this pool may not be trivial. Additionally, indirect descriptors can be as large as the driver wants them to be, which could force the hypervisor to copy an arbitrarily large amount of data.

An alternative approach consists in introducing a new virtio feature bit. This feature bit, when set by the device, instructs the driver to allocate indirect descriptors using dedicated memory pages that hold no other data than the indirect descriptors. Since a correct virtio driver implementation does not modify an indirect descriptor once it has been pushed to the device, the pages where the indirect descriptors lie can later be remapped read-only in the guest address space. This allows the hypervisor to validate the content of the indirect descriptor, grant it to the host (along with all the buffers it references), and keep it mapped read-only in the guest address space for as long as it is granted to the host (i.e. until the indirect descriptor is returned through the used ring).

The present proposal has some obvious drawbacks, but we believe that memory protection will not come for free. We know that there are other folks out there who are trying to address this issue of memory sharing between VMs, so we would be pleased to hear what you think about this approach. Additionally, we would like to know whether a feature bit similar to the one discussed here could be considered for addition to the virtio standard.

Looking forward to hearing from you.

Baptiste