Subject: Re: [PATCH v2 5/9] virtio-iommu: Add VT-d IO page table


On Fri, Oct 27, 2023 at 11:14:24AM +0800, Tina Zhang wrote:
> > > I think two more definitions are needed:
> > > #define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_NEST    (1ULL << 26)
> > > #define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_FSTS    (1ULL << 47)
> > > 
> > > ECAP_NEST tells whether the VT-d hardware supports nested
> > > translation. ECAP_FSTS tells whether it supports the PASID-granular
> > > first-stage translation type.
> > 
> > It doesn't hurt to add them. Would the guest behave differently if NEST or
> > FSTS is set or clear?  I didn't add them because I assumed NEST is a
> > prerequisite for supporting guest page tables, and it requires FSTS.
> Yes, they have some effects. According to the VT-d driver and spec,
> NEST=1 means the hardware supports nested translation; otherwise it
> doesn't. But NEST=1 doesn't automatically mean FSTS=1, which reports
> whether the hardware supports the PASID-granular first-stage
> translation type.

I thought NEST=1 implies FSTS=1 because of this in VT-d 4.1:

NEST	Hardware implementations reporting Scalable Mode Translation
	Support (SMTS) field as Clear or First-stage Translation Support
	(FSTS) field as Clear or Second-stage Translation Support (SSTS)
	field as Clear also report this field as Clear.

But it doesn't matter for this discussion, I'm just curious.
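
For reference, the guest-side check we're discussing could look
something like this (a minimal sketch; vtd_supports_pgtable() is a
hypothetical helper, not something from this series):

#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_NEST	(1ULL << 26)
#define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_FSTS	(1ULL << 47)

/* Both nested translation and first-stage support are required. */
static bool vtd_supports_pgtable(uint64_t ecap)
{
	return (ecap & VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_NEST) &&
	       (ecap & VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_FSTS);
}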

>
> So if FSTS=0 or NEST=0, the guest VT-d driver/lib
> will refuse to allocate a VT-d IO page table for virtio-iommu.
> 
> If we want the virtio-iommu backend to check these bits before
> reporting the supported page-table format, then the virtio-iommu
> frontend doesn't need to check them again. So, here are two questions:
> 
> 1) If we want the virtio-iommu backend to check vendor-specific IOMMU
> capabilities, does it mean we need to add all the details defined by
> the vendor spec into the virtio-iommu spec (like what we did in this
> version)? Does that make sense for virtio-iommu? My understanding is
> that the virtio spec should try to avoid vendor-specific things and
> leave them to vendor drivers/libs. (I saw your comments below; I just
> want to post my concern here.)
> 
> 2) If the virtio-iommu backend just delivers the vendor-specific
> cap/ecap to the virtio-iommu frontend, without any checking before
> exposing them to the guest, then the frontend can invoke the generic
> io-pgtable API (e.g., in Linux it's io_pgtable_init_fns.alloc()) to
> allocate an io_pgtable. That operation invokes the vendor's alloc()
> callback, and the vendor driver/lib can do the checking there: if the
> cap/ecap makes sense for the vendor's IO page-table support, allocate
> a set of page tables and return success; otherwise return failure. If
> this is the direction, then we probably don't need to add IOMMU
> vendors' specific definitions into the virtio-iommu spec. Right?

Yes, that makes sense, I'd like to avoid vendor details in the spec where
possible.
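
To illustrate direction (2) in Linux terms, the vendor's io-pgtable
alloc() callback would do the validation, roughly as below. This is
only a sketch reusing the ECAP defines from above; the cfg->vtd fields
and vtd_pgtable_create() are hypothetical, not the actual
io_pgtable_cfg layout:

static struct io_pgtable *vtd_alloc_pgtable(struct io_pgtable_cfg *cfg,
					    void *cookie)
{
	u64 ecap = cfg->vtd.ecap;	/* hypothetical field */

	/* Vendor-side check: refuse if nesting isn't supported. */
	if (!(ecap & VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_NEST) ||
	    !(ecap & VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_FSTS))
		return NULL;

	/* Allocate and initialize the first-stage tables. */
	return vtd_pgtable_create(cfg, cookie);	/* hypothetical */
}

That way cap/ecap stay opaque to both the virtio-iommu device and
driver, and only the vendor code interprets them.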

I was wondering whether the virtio-iommu device needs to sanitize the
hardware registers for security (hide ecap bits that aren't relevant to
the guest). I don't think so: the guest shouldn't be able to use the
extraneous information in the cap registers for anything nefarious. If
it could, that would imply a vulnerability is already present, and
masking feature bits is not the correct protection against that.

It's still not completely obvious to me that presenting raw ecap
registers is safe, because they contain more than just 1-bit features.
For example, I see a 10-bit "IRO: IOTLB Register Offset" field in
there, but I'm guessing it doesn't contain secrets, and an attacker
couldn't do anything with that information.
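
If I read the VT-d spec right, IRO is just a register-layout field,
along these lines (made-up macro name; ecap bits 17:8, in units of
16 bytes):

#define VTD_ECAP_IRO(ecap)	((((ecap) >> 8) & 0x3ffULL) * 16)

so it only locates the IOTLB registers within the hardware's own MMIO
space, which is meaningless for a paravirtual device anyway.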

> > More generally there are several ways we could describe these registers in
> > the specification:
> > 
> > (1) Don't describe any fields.
> > (2) Describe only fields that directly influence the configuration of
> >      ATTACH_TABLE and INVALIDATE.
> > (3) Describe fields related in any way to page table support.
> > (4) Describe all fields.
> > 
> > (1) is the easiest from a spec point of view. I tried doing (2) to ease
> I like (1), too. But I don't object to other options, as long as it makes
> sense to virtio-iommu.

Ok, I'll probably go with (1) in the next version.
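
With (1), the probe property would carry the registers opaquely,
roughly like this (struct and field names are made up here, not taken
from the series):

struct virtio_iommu_probe_vtd {
	struct virtio_iommu_probe_property	head;
	__le64					cap;	/* raw CAP_REG */
	__le64					ecap;	/* raw ECAP_REG */
};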

> > writing driver and device code, but without explaining in detail what the
> > fields mean because I think that's best left to the hardware
> Agree. We should leave details to vendors Spec.
> 
> > specification. (3) would include NEST, FSTS and possibly others. (4)
> > wouldn't really be useful, we may as well do (1).
> > 
> > Maybe (3) is better. This way the device can use this description to
> > sanitize the registers before passing them to the guest, and the values
> > read by the guest still make sense. Because if we only describe (2), and
> > then the device decides to only provide those bits and clear everything
> > else, then it may be confusing for the driver to find NEST and FSTS clear
> > in the cap registers. Should it then even attach tables?  Maybe in the
> > future there will be a use-case for presenting NEST and FSTS clear to the
> > guest.
> I assume that you are talking about virtio-iommu working with a VT-d
> IO page table on a platform that has no nested translation support, or
> where nested translation support is disabled. Right?

Yes, I was thinking of a virtio-iommu implementation that doesn't support
page tables but still wants to communicate some hardware IOMMU features to
the driver. I don't have an example for Intel, but for Arm we'll want to
communicate support for multiple page granules (4KB, 16KB, 64KB), even for
the map/unmap interface. At the moment we present a single granule to the
guest, which is fine on x86 where 4KB is the only granule, but it doesn't
work for Arm setups where the host uses a different granule than the
guest. We need a spec update to support this, and I'm still wondering if
we can reuse the probe property that this series introduces for Arm SMMU,
or if we should introduce separate ones.
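
On the driver side, multi-granule support could then be as simple as
picking the smallest advertised granule that the CPU can also map,
something like this (a sketch only, and the selection policy is just
an assumption):

static u64 viommu_select_granule(u64 page_size_mask)
{
	/* Ignore granules smaller than the CPU page size. */
	u64 usable = page_size_mask & ~((u64)PAGE_SIZE - 1);

	/* Pick the smallest remaining granule, if any. */
	return usable ? 1ULL << __ffs64(usable) : 0;
}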

Anyway, even if Intel never needs to reuse this probe property for
non-page-table support, it's probably best to have cap/ecap be as close to
their hardware values as possible. It keeps the virtio-iommu device
implementation simple and is easier to extend in the future.

> So, there are three usages:
> 1) On a platform with VT-d hardware nested translation support, where
> a guest is using that functionality, the guest VT-d driver/lib allows
> the virtio-iommu frontend to allocate VT-d IO page tables.
>
> 2) On a platform without VT-d hardware nested translation support, or
> where the support exists but isn't used by a guest, the guest VT-d
> driver/lib doesn't allow the virtio-iommu frontend to allocate VT-d IO
> page tables. So the legacy virtio-iommu map/unmap() interface is used
> (no IO page tables in the guest).
>
> 3) On a platform without VT-d hardware nested translation support, or
> where the support exists but isn't used by a guest, the virtio-iommu
> frontend may want to use some virtual IO page table. My understanding
> is that this virtual IO page table is generic across vendor IOMMUs, so
> it should work without any help from the vendor IOMMU driver/lib. Is
> my understanding right?

That's my thinking as well, although virtual page tables (3) aren't
useful for assigned hardware devices. For those, either VT-d page
tables when supported (1) or map/unmap (2) can be used. They could also
use special tables like coIOMMU, for pinning rather than mapping, which
resembles (3). For virtual devices (e.g. virtio-net), page tables would
provide better performance than map/unmap (2), and in this case I think
generic virtual page tables (3) are better suited than vendor tables.

Thanks,
Jean


