Subject: Re: [virtio-dev] Re: Constraining where a guest may allocate virtio accessible resources


On 18.06.20 16:52, Michael S. Tsirkin wrote:
> On Thu, Jun 18, 2020 at 03:59:54PM +0200, Jan Kiszka wrote:
>> On 18.06.20 15:29, Stefan Hajnoczi wrote:
>>> On Wed, Jun 17, 2020 at 08:01:14PM +0200, Jan Kiszka wrote:
>>>>> On 17.06.20 19:31, Alex Bennée wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> This follows on from the discussion in the last thread I raised:
>>>>>
>>>>>   Subject: Backend libraries for VirtIO device emulation
>>>>>   Date: Fri, 06 Mar 2020 18:33:57 +0000
>>>>>   Message-ID: <874kv15o4q.fsf@linaro.org>
>>>>>
>>>>> To support the concept of a VirtIO backend having limited visibility of
>>>>> a guest's memory space there needs to be some mechanism to limit
>>>>> where that guest may place things. A simple VirtIO device can be
>>>>> expressed purely in virtual resources, for example:
>>>>>
>>>>>    * status, feature and config fields
>>>>>    * notification/doorbell
>>>>>    * one or more virtqueues
>>>>>
>>>>> Using a PCI backend the location of everything but the virtqueues is
>>>>> controlled by the mapping of the PCI device, so it is something that is
>>>>> controllable by the host/hypervisor. However the guest is free to
>>>>> allocate the virtqueues anywhere in the virtual address space of system
>>>>> RAM.
>>>>>
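
With the modern PCI transport the driver simply writes whichever
guest-physical addresses it chose for the rings into the queue address
fields of the common configuration, which is why the host currently has
no say in their placement. Roughly (an excerpt, field names as in the
spec, other fields elided):

    #include <linux/types.h>

    struct virtio_pci_common_cfg {
            /* ... feature, status and queue_select fields elided ... */
            __le64 queue_desc;    /* descriptor area     */
            __le64 queue_driver;  /* driver (avail) area */
            __le64 queue_device;  /* device (used) area  */
    };
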
>>>>> In theory this shouldn't matter because sharing virtual pages is just a
>>>>> matter of putting the appropriate translations in place. However there
>>>>> are multiple ways the host and guest may interact:
>>>>>
>>>>> * QEMU TCG
>>>>>
>>>>> QEMU sees a block of system memory in its virtual address space that
>>>>> has a one-to-one mapping with the guest's physical address space. If QEMU
>>>>> wants to share a subset of that address space it can only realistically
>>>>> do it for a contiguous region of its address space, which implies the
>>>>> guest must use a contiguous region of its physical address space.
>>>>>
>>>>> * QEMU KVM
>>>>>
>>>>> The situation here is broadly the same - although both QEMU and the
>>>>> guest are seeing their own virtual views of a linear address space
>>>>> which may well actually be a fragmented set of physical pages on the
>>>>> host.
>>>>>
>>>>> KVM based guests have additional constraints if they ever want to access
>>>>> real hardware in the host, as you need to ensure any address accessed by
>>>>> the guest can eventually be translated into an address that can
>>>>> physically access the bus a device is on (for device
>>>>> pass-through). The area also has to be DMA coherent so updates from a
>>>>> bus are reliably visible to software accessing the same address space.
>>>>>
>>>>> * Xen (and other type-1's?)
>>>>>
>>>>> Here the situation is a little different because the guest explicitly
>>>>> makes its pages visible to other domains by way of grant tables. The
>>>>> guest is still free to use whatever parts of its address space it wishes
>>>>> to. Other domains then request access to those pages via the hypervisor.
>>>>>
>>>>> In theory the requester is free to map the granted pages anywhere in
>>>>> its own address space. However there are differences between the
>>>>> architectures on how well this is supported.
>>>>>
>>>>> So I think this makes a case for having a mechanism by which the guest
>>>>> can restrict its allocations to a specific area of the guest physical
>>>>> address space. The question is then what is the best way to inform the
>>>>> guest kernel of the limitation?
>>>>>
>>>>> Option 1 - Kernel Command Line
>>>>> ==============================
>>>>>
>>>>> This isn't without precedent - the kernel supports options like "memmap"
>>>>> which, with the appropriate amount of crafting, can be used to carve out
>>>>> sections of bad RAM from the physical address space. Other formulations
>>>>> can be used to mark specific areas of the address space as particular
>>>>> types of memory.
>>>>>
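
For reference, the documented syntax for carving out such a region is
memmap=nn[KMG]$ss[KMG], which reserves nn bytes starting at address ss
so the kernel will not use them, for example:

    memmap=64M$0x80000000

(the '$' usually has to be escaped when the line passes through a
bootloader such as GRUB).
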
>>>>> However there are cons to this approach, as it then becomes a job for
>>>>> whatever builds the VMM command lines to ensure that both the backend and
>>>>> the kernel know where things are. It is also very Linux-centric and
>>>>> doesn't solve the problem for other guest OSes. Considering the rest of
>>>>> VirtIO can be made discoverable this seems like it would be a backward
>>>>> step.
>>>>>
>>>>> Option 2 - Additional Platform Data
>>>>> ===================================
>>>>>
>>>>> This would mean extending something like device tree or ACPI tables,
>>>>> which could define regions of memory that would inform the low level
>>>>> memory allocation routines where they could allocate from. There is
>>>>> already the concept of "dma-ranges" in device tree, which can be a
>>>>> per-device property defining the region of address space that is DMA
>>>>> addressable for a device.
>>>>>
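
As a rough sketch of what that could look like (the node names and the
64 MiB window at 0x80000000 are made up purely for illustration), a
dma-ranges property on the bus node containing the virtio devices would
constrain where they can DMA to:

    /* hypothetical fragment: limit DMA for virtio-mmio devices to a
     * 64 MiB window at guest-physical 0x80000000 */
    virtio_bus {
            compatible = "simple-bus";
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;
            dma-ranges = <0x80000000 0x80000000 0x04000000>;

            virtio@a000000 {
                    compatible = "virtio,mmio";
                    reg = <0x0a000000 0x200>;
                    /* interrupts omitted for brevity */
            };
    };
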
>>>>> There is the question of how you tie regions declared here to the
>>>>> eventual instantiation of the VirtIO devices.
>>>>>
>>>>> For a fully distributed set of backends (one backend per device per
>>>>> worker VM) you would need several different regions. Would each region
>>>>> be tied to a particular device, or would there just be a set of areas the
>>>>> guest would allocate from in sequence?
>>>>>
>>>>> Option 3 - Abusing PCI Regions
>>>>> ==============================
>>>>>
>>>>> One of the reasons to use the VirtIO PCI backend is to help with
>>>>> automatic probing and setup. Could we define a new PCI region which on
>>>>> the backend just maps to RAM but from the front-end's point of view is a
>>>>> region in which it can allocate its virtqueues? Could we go one step
>>>>> further and just let the host define and allocate the virtqueue in the
>>>>> reserved PCI space and pass the base of it somehow?
>>>>>
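
To make the first variant concrete, the front-end would treat such a
region as a small arena it carves its virtqueues out of. A minimal
sketch, assuming the base/length of the window have already been read
from a (hypothetical) dedicated BAR:

    #include <stddef.h>
    #include <stdint.h>

    /* Shared window advertised by the device: base and length as
     * mapped by the guest, plus simple bump-allocator state. */
    struct vq_region {
            uint8_t *base;
            size_t   len;
            size_t   used;
    };

    /* Carve a 4 KiB aligned chunk for one virtqueue out of the window;
     * returns NULL once the window is exhausted. */
    static void *vq_region_alloc(struct vq_region *r, size_t vq_bytes)
    {
            size_t off = (r->used + 4095) & ~(size_t)4095;

            if (off > r->len || vq_bytes > r->len - off)
                    return NULL;
            r->used = off + vq_bytes;
            return r->base + off;
    }
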
>>>>> Option 4 - Extend VirtIO Config
>>>>> ===============================
>>>>>
>>>>> Another approach would be to extend the VirtIO configuration and
>>>>> start-up handshake to supply these limitations to the guest. This could
>>>>> be handled by the addition of a feature bit (VIRTIO_F_HOST_QUEUE?) and
>>>>> additional configuration information.
>>>>>
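
A minimal sketch of what that extra configuration could carry, assuming
a made-up feature bit number and layout (nothing here exists in the spec
today):

    #include <stdint.h>

    /* Hypothetical: the bit number and the layout are purely
     * illustrative, not part of the VirtIO specification. */
    #define VIRTIO_F_HOST_QUEUE  40

    struct virtio_host_queue_region {
            uint64_t addr;   /* guest-physical base virtqueues must lie in */
            uint64_t len;    /* length of that region */
    };
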
>>>>> One problem I can foresee is that device initialisation is usually done
>>>>> fairly late in the start-up of a kernel, by which time any memory zoning
>>>>> restrictions will likely need to have informed the kernel's low level
>>>>> memory management. Does that mean we would have to combine such a
>>>>> feature with another method anyway?
>>>>>
>>>>> Option 5 - Additional Device
>>>>> ============================
>>>>>
>>>>> The final approach would be to tie the allocation of virtqueues to
>>>>> memory regions as defined by additional devices. For example the
>>>>> proposed IVSHMEMv2 spec offers the ability for the hypervisor to present
>>>>> a fixed non-mappable region of the address space. Other proposals like
>>>>> virtio-mem allow for hot plugging of "physical" memory into the guest
>>>>> (conveniently treatable as separate shareable memory objects for QEMU
>>>>> ;-).
>>>>>
>>>>
>>>> I think you forgot one approach: virtual IOMMU. That is the advanced
>>>> form of the grant table approach. The backend still "sees" the full
>>>> address space of the frontend, but it will not be able to access all of
>>>> it and there might even be a translation going on. Well, like IOMMUs work.
>>>>
>>>> However, this implies dynamics that are under guest control, namely of
>>>> the frontend guest. And such dynamics can be counterproductive for
>>>> certain scenarios. That's where these static windows of shared memory
>>>> came up.
>>>
>>> Yes, I think IOMMU interfaces are worth investigating more too. IOMMUs
>>> are now widely implemented in Linux and virtualization software. That
>>> means guest modifications aren't necessary and unmodified guest
>>> applications will run.
>>>
>>> Applications that need the best performance can use a static mapping
>>> while applications that want the strongest isolation can map/unmap DMA
>>> buffers dynamically.
>>
>> I do not yet see how you can model a static, non-guest-controlled
>> window with an IOMMU.
> 
> Well basically the IOMMU will have, as part of the
> topology description, a range of addresses devices behind it
> are allowed to access. What's the problem with that?
> 

I didn't look at the details of the vIOMMU from that perspective, but our
requirement would be that it would just statically communicate to the
guest where DMA windows are, rather than allowing the guest to configure
that (which is the normal usage of an IOMMU).
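
Put differently, with a static window the guest side never issues
map/unmap requests at run time; its DMA path only has to check (or
allocate) against one fixed aperture that was communicated once at
boot. A conceptual sketch with made-up names:

    #include <stdint.h>

    /* One statically advertised DMA aperture: communicated once
     * (e.g. via firmware tables), never reprogrammed by the guest. */
    struct dma_window {
            uint64_t base;   /* guest-physical start of the window */
            uint64_t len;
    };

    /* Check that a buffer the guest wants to hand to the device lies
     * completely inside the advertised window. */
    static int in_dma_window(const struct dma_window *w,
                             uint64_t addr, uint64_t size)
    {
            return addr >= w->base &&
                   size <= w->len &&
                   addr - w->base <= w->len - size;
    }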

In addition, it would only address the memory transfer topic. We would
still be left with the current issue of virtio that the hypervisor's
device model needs to understand all supported device types.

Jan

> 
>> And IOMMU implies guest modifications as well (you need its driver). It
>> just happened to be there now in newer guests. A virtio shared memory
>> transport could be introduced similarly.
>>
>> But the biggest challenge would be that a static mode would allow for a
>> trivial hypervisor side model. Otherwise, we would only try to achieve a
>> simpler secure model by adding complexity elsewhere.
>>
>> I'm not arguing against vIOMMU per se. It's there, it is and will be
>> widely used. It's just not solving all issues.
>>
>> Jan
>>
>> -- 
>> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
>> Corporate Competence Center Embedded Linux
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 


-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

