OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Re: [PATCH v2 5/9] virtio-iommu: Add VT-d IO page table

Hi Jean,

On 10/18/23 22:55, Jean-Philippe Brucker wrote:
Hi Tina,

Thank you for the review

On Wed, Oct 11, 2023 at 07:32:01PM +0800, Tina Zhang wrote:
Resend the reply and CC virtio-comment list.


On 10/7/23 00:09, Jean-Philippe Brucker wrote:
From: Tina Zhang <tina.zhang@intel.com>

Allow a driver of a device behind virtio-iommu to detect support for
VT-d IO page table.

The driver detects support for VT-d IO page table format in a PROBE
request and sends an ATTACH_TABLE request with a pointer to the first
stage page translation table.

The definition of first stage page translation table ATTACH_TABLE
request references the fields defined in VT-d scalable-mode PASID table
* attributes defined in ATTACH_TABLE request:
    FSPTPTR got from pgtbl_addr field.
    SRE/EAFE/WPE defined in pgtbl_flags field.
    FSPM calculated through addr_width field.
    PASID got from pasid field.

* attributes not defined in ATTACH_TABLE request, as their
    configurations need to be decided by host:
    Memory type features, PAT/EMTE/CD/PGSNP/PWSNP.
    Snoop Behavior, PGSNP/PWSNP.

Support for any new attributes could easily be added by adding flags to
the PROBE property and fields in the ATTACH_TABLE config.

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
[JPB: moved to pgtable-intel.tex, rework formatting]
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
   device-types/iommu/description.tex   |  3 +
   device-types/iommu/pgtable-intel.tex | 98 ++++++++++++++++++++++++++++
   introduction.tex                     |  3 +
   3 files changed, 104 insertions(+)
   create mode 100644 device-types/iommu/pgtable-intel.tex

diff --git a/device-types/iommu/description.tex b/device-types/iommu/description.tex
index c057cdf..a4eb492 100644
--- a/device-types/iommu/description.tex
+++ b/device-types/iommu/description.tex
@@ -407,6 +407,7 @@ \subsubsection{ATTACH_TABLE request}\label{ref:Device Types / IOMMU Device / Dev
   Attach an endpoint to a domain, in the same way as an ATTACH
@@ -1009,6 +1010,7 @@ \subsubsection{PROBE properties}\label{sec:Device Types / IOMMU Device / Device
   #define VIRTIO_IOMMU_PROBE_T_RESV_MEM       1
   \paragraph{Property RESV_MEM}\label{sec:Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM}
@@ -1160,3 +1162,4 @@ \subsubsection{Fault reporting}\label{sec:Device Types / IOMMU Device / Device o
   \subsection{Acceleration}\label{sec:Device Types / IOMMU Device / Acceleration}
diff --git a/device-types/iommu/pgtable-intel.tex b/device-types/iommu/pgtable-intel.tex
new file mode 100644
index 0000000..90a3410
--- /dev/null
+++ b/device-types/iommu/pgtable-intel.tex
@@ -0,0 +1,98 @@
+\subsubsection{Intel VT-d tables}\label{sec:Device Types / IOMMU Device / Acceleration / Intel}
+Attach first-level translation tables in the format described by
+the \hyperref[intro:VT-Directed-IO]{Intel Virtualization
+Technology for Directed I/O specification}.
+\paragraph{PROBE property for VT-d page tables}\label{sec:Device Types / IOMMU Device / Acceleration / Intel / PROBE}
+information about an Intel VT-d IOMMU.
+struct virtio_iommu_probe_hw_intel_vtd {
+  struct virtio_iommu_probe_property head;
+  u8   reserved[4];
+  le64 cap_reg;
+  le64 ecap_reg;
+#define VIRTIO_IOMMU_HW_INTEL_VTD_CAP_RWBF        (1 << 4)
The size of MGAW is 6 bits. So, it should be:

Good catch, thanks

+#define VIRTIO_IOMMU_HW_INTEL_VTD_CAP_ZLR         (1 << 22)
+#define VIRTIO_IOMMU_HW_INTEL_VTD_CAP_FL1GP       (1 << 56)
+#define VIRTIO_IOMMU_HW_INTEL_VTD_CAP_5LP         (1 << 60)
+#define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_SC         (1 << 7)
+#define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_MTS        (1 << 25)
+#define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_EAFS       (1 << 34)
+#define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_SMTS       (1 << 43)
+#define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_VCS        (1 << 44)
I don't think we need VCS defined here any more. In the old design, VCS was
used as a channel for guest asking host to allocate a PASID for it. In the
current design, we drop this and let host mediate guest PASIDs when
necessary (which is agnostic to guest viommu driver). So basically, there
are two cases:
1) For device assignment of I/O device, the PASIDs of the device are managed
by guest. Host doesn't need to do any mediation.
2) For I/O Device sharing case, one device is used by multiple guests. So we
cannot let one guest directly manage the PASIDs of that device. In this
case, with the help of virtualization, host can trap the guest's first DMA
access requiring a PASID and assign a host-side PASID for it. Guest viommu
driver doesn't need to know the host-side PASID value.

Makes sense, I'll drop VCS

+#define VIRTIO_IOMMU_HW_INTEL_VTD_ECAP_SMPWC      (1 << 48)
I think two more definitions are needed:

The ECAP_NEST tells if VT-d hardware supports nested translations.
The ECAP_FSTS tells if VT-d hardware supports PASID granular translation
type of first-stage.

It doesn't hurt to add them. Would the guest behave differently if NEST or
FSTS is set or clear?  I didn't add them because I assumed NEST is a
prerequisite for supporting guest page tables, and it requires FSTS.
Yes, they have some effects. According to VT-d driver and Spec, only NEST=1 means nested translation is supported by the hardware, otherwise nested translation is not supported. But the NEST=1 doesn't automatically mean FSTS=1 which reports if the hardware supports PASID granular translation type of first-stage table. So if FSTS=0 or NEST=0, guest VT-d driver/lib will refuse to allocate a VT-d IO page table for virtio-iommu.

If we want the virtio-iommu backend to check these bits before reporting the supported page-table fmt, then we don't need the virtio-iommu frontend to check them again. So, here are two questions:

1) If we want the virtio-iommu backend to check vendor IOMMU specific capabilities, does it mean we need to add all those details defined by vendor Spec into virtio-iommu Spec (like what we did in this version)? Does it make sense to virtio-iommu. My understanding is virtio Spec may try to avoid vendor specific things and leave them to venders drivers/libs. (I saw your comments below. I just want to post my concern here).

2) If the virtio-iommu backend just delivers vendor specific cap/ecap to virtio-iommu frontend and doesn't do any checking before exposing cap/ecap to guest, then virtio-iommu frontend can invoke some generic io-pgtable APIs (e.g., in Linux it's io_pgtable_init_fns.alloc()) to allocate a io_pgtable , which operation invokes vendor's alloc() callback and vendor driver/lib can do the checking there. Return success if the cap/ecap makes sense to vendor IO page-table support and allocate a set of page tables, otherwise return unsuccess. If this is the direction, then we probably don't need to add IOMMU vendor's specific definitions into virtio-iommu spec. Right?

More generally there are several ways we could describe these registers in
the specification:

(1) Don't describe any fields.
(2) Describe only fields that directly influence the configuration of
(3) Describe fields related in any way to page table support.
(4) Describe all fields.

(1) is the easiest from a spec point of view. I tried doing (2) to ease
I like (1), too. But I don't object to other options, as long as it makes sense to virtio-iommu.

writing driver and device code, but without explaining in detail what the
fields mean because I think that's best left to the hardware
Agree. We should leave details to vendors Spec.

specification. (3) would include NEST, FSTS and possibly others. (4)
wouldn't really be useful, we may as well do (1).

Maybe (3) is better. This way the device can use this description to
sanitize the registers before passing them to the guest, and the values
read by the guest still make sense. Because if we only describe (2), and
then the device decides to only provide those bits and clear everything
else, then it may be confusing for the driver to find NEST and FSTS clear
in the cap registers. Should it then even attach tables?  Maybe in the
future there will be a use-case for presenting NEST and FSTS clear to the
I assume that you are talking about virtio-iommu working with VT-d IO page table on platform that has no nested translation support or the nested translation support is disable. Right?

So, there are three usages:
1) On the platform having VT-d hardware nested translation support and the functionality is being used by a guest, guest VT-d driver/lib allows virtio-iommu frontend to allocate VT-d IO page tables.

2) On the platform having no VT-d hardware nested translation support or having the support but not being used by a guest, guest VT-d driver/lib doesn't allow virtio-iommu frontend to allocate VT-d IO page tables. So virtio-iommu legacy map/unmap() is being used (no IO page tables in guest).

3) On the platform having no VT-d hardware nested translation support or having the support but not being used by a guest, virtio-iommu front-end may want to use some virtual IO page table. My understanding is this virtual IO page table is generic to vendor IOMMUs. So it should work w/o any help of vendor IOMMU driver/lib. Is my understanding right?


+\todo{check that these are necessary and sufficient to support
+page table assignement in scalable mode.}
The cap_reg & ecap_reg expose VT-d hardware capability to guest. And it's
expected to have virtio-iommu working with VT-d IO page table when the
platform hardware at least has the capability of nested-translation (i.e.,
hardware must have scalable mode support and first-stage translation support
and second-stage translation support.). So, we only pick definitions of the
bits from VT-d spec which are related to the first-stage page table and the
basic capability of nested-translation.

+\todo{Need to support non-PASID assignment? non-scalable?}
I didn't see any difference between the non-PASID assignment case and the
PASID assignment case, in terms of virtio-iommu support VT-d. Any ideas?

No, I don't remember what I meant by this... I think non-PASID (for
requests-without-PASID) is already supported in this draft.

The non-scalable doesn't seem so interesting right now. Any thoughts about

Agree, if needed it could be added later.

+  \item[\field{cap_reg}] Capability Register.
+  \item[\field{ecap_reg}] Extended Capability Register.
+\paragraph{ATTACH_TABLE request for VT-d page table}\label{sec:Device Types / IOMMU Device / Acceleration / Intel / ATTACH_TABLE}
+Attach a single set of page tables to an endpoint.
+/* Vt-d I/O Page Table Descriptor */
+struct virtio_iommu_req_attach_table_intel {
+  struct virtio_iommu_req_head          head;
+  le32                                  domain;
+  le32                                  endpoint;
+  u8                                    format;
+  u8                                    reserved[7];
+  le32                                  pasid;
+  le64                                  pgtbl_addr;
+  le64                                  pgtbl_flags;
+  le32                                  addr_width;
+  u8                                    reserved[80];
+  struct virtio_iommu_req_tail          tail;
+/* Intel VT-d stage-1 page table entry attributes */
+#define	VIRTIO_IOMMU_HW_INTEL_VTD_SRE             0x01
+#define	VIRTIO_IOMMU_HW_INTEL_VTD_EAFE            0x02
+#define	VIRTIO_IOMMU_HW_INTEL_VTD_WPE             0x40
+#define	VIRTIO_IOMMU_HW_INTEL_VTD_LAST            0x80
+  \item[\field{format}] VIRTIO_IOMMU_ATTACH_TABLE_INTEL_PT.
+  \item[\field{pasid}] Process Address Space Identifier.
+    \todo{don't we need to allocate these with alloc/free
+    commands to the host?}
No, we don't need guest viommu driver to explicitly ask host's help for
guest PASID alloc/free now. In the current design, guest viommu driver
doesn't need to know the host-side PASID.

Ok, that makes the interface simpler



+  \item[\field{pgtbl_addr}] Stage-1 page table base address.
+  \item[\field{pgtbl_flags}] Stage-1 page table entry attributes:
+    \begin{description}
+      \item{SRE} Supervisor request,
+      \item{EAFE} Extended access enable,
+      \item{WPE} Write protect enable.
+    \end{description}
+  \item[\field{pat}] Page attribute table data to compute
+    effective memory type.
+  \item[\field{addr_width}] The address width of the untranslated
+    addresses that are subjected to the stage-1 page table.
+\paragraph{INVALIDATE request for VT-d page table}\label{sec:Device Types / IOMMU Device / Acceleration / Intel / INVALIDATE}
+Supported values for field \field{scope} are
+The only supported value for field \field{caches} is
+VIRTIO_IOMMU_INVAL_C_TLB. Field \field{id} is not used.
diff --git a/introduction.tex b/introduction.tex
index 636fb1b..d0d5086 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -104,6 +104,9 @@ \section{Normative References}\label{sec:Normative References}
   	\phantomsection\label{intro:SMMUv3}\textbf{[SMMUv3]} &
   	Arm System Memory Management Unit version 3
   	\newline\url{https://developer.arm.com/documentation/ihi0070/latest} \\
+	\phantomsection\label{intro:VT-Directed-IO}\textbf{[VT-Directed-IO]} &
+	Intel Virtualization Technology for Directed I/O
+	\newline\url{https://cdrdv2.intel.com/v1/dl/getContent/671081} \\
   	\phantomsection\label{intro:rfc2784}\textbf{[RFC2784]} &
       Generic Routing Encapsulation. This protocol is only specified for IPv4 and used as either the payload or delivery protocol.

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]