Subject: Re: [virtio-dev] [PATCH v3] Add virtio-iommu device specification
On 30/04/2019 14:56, Jean-Philippe Brucker wrote: > The IOMMU device allows a guest to manage DMA mappings for physical, > emulated and paravirtualized endpoints. Add device description for the > virtio-iommu device and driver. Introduce PROBE, ATTACH, DETACH, MAP and > UNMAP requests, as well as translation error reporting. > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/37 > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> I'd like to request a vote for this version (the github issue is now up to date and the patch applies cleanly) Thanks, Jean > --- > Since v2 I rebased onto virtio v1.1-wd02, fixing a conflict in > conformance.tex and using the new \conformance command. > > A PDF version is available at > https://jpbrucker.net/virtio-iommu/spec/virtio-v1.1-wd02-iommu-0.11-draft.pdf > --- > conformance.tex | 40 ++- > content.tex | 1 + > virtio-iommu.tex | 850 +++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 889 insertions(+), 2 deletions(-) > create mode 100644 virtio-iommu.tex > > diff --git a/conformance.tex b/conformance.tex > index 42f702a..79a3e7d 100644 > --- a/conformance.tex > +++ b/conformance.tex > @@ -15,14 +15,14 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} > \begin{itemize} > \item Clause \ref{sec:Conformance / Driver Conformance}. > \item One of clauses \ref{sec:Conformance / Driver Conformance / PCI Driver Conformance}, \ref{sec:Conformance / Driver Conformance / MMIO Driver Conformance} or \ref{sec:Conformance / Driver Conformance / Channel I/O Driver Conformance}. 
> - \item One of clauses \ref{sec:Conformance / Driver Conformance / Network Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Block Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Console Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Entropy Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Traditional Memory Balloon Driver Conformance}, \ref{sec:Conformance / Driver Conformance / SCSI Host Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Input Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Crypto Driver Conformance} or \ref{sec:Conformance / Driver Conformance / Socket Driver Conformance}. > + \item One of clauses \ref{sec:Conformance / Driver Conformance / Network Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Block Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Console Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Entropy Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Traditional Memory Balloon Driver Conformance}, \ref{sec:Conformance / Driver Conformance / SCSI Host Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Input Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Crypto Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Socket Driver Conformance} or \ref{sec:Conformance / Driver Conformance / IOMMU Driver Conformance}. > \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}. > \end{itemize} > \item[Device] A device MUST conform to four conformance clauses: > \begin{itemize} > \item Clause \ref{sec:Conformance / Device Conformance}. > \item One of clauses \ref{sec:Conformance / Device Conformance / PCI Device Conformance}, \ref{sec:Conformance / Device Conformance / MMIO Device Conformance} or \ref{sec:Conformance / Device Conformance / Channel I/O Device Conformance}. 
> - \item One of clauses \ref{sec:Conformance / Device Conformance / Network Device Conformance}, \ref{sec:Conformance / Device Conformance / Block Device Conformance}, \ref{sec:Conformance / Device Conformance / Console Device Conformance}, \ref{sec:Conformance / Device Conformance / Entropy Device Conformance}, \ref{sec:Conformance / Device Conformance / Traditional Memory Balloon Device Conformance}, \ref{sec:Conformance / Device Conformance / SCSI Host Device Conformance}, \ref{sec:Conformance / Device Conformance / Input Device Conformance}, \ref{sec:Conformance / Device Conformance / Crypto Device Conformance} or \ref{sec:Conformance / Device Conformance / Socket Device Conformance}. > + \item One of clauses \ref{sec:Conformance / Device Conformance / Network Device Conformance}, \ref{sec:Conformance / Device Conformance / Block Device Conformance}, \ref{sec:Conformance / Device Conformance / Console Device Conformance}, \ref{sec:Conformance / Device Conformance / Entropy Device Conformance}, \ref{sec:Conformance / Device Conformance / Traditional Memory Balloon Device Conformance}, \ref{sec:Conformance / Device Conformance / SCSI Host Device Conformance}, \ref{sec:Conformance / Device Conformance / Input Device Conformance}, \ref{sec:Conformance / Device Conformance / Crypto Device Conformance}, \ref{sec:Conformance / Device Conformance / Socket Device Conformance} or \ref{sec:Conformance / Device Conformance / IOMMU Device Conformance}. > \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}. 
> \end{itemize} > \end{description} > @@ -183,6 +183,24 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} > \item \ref{drivernormative:Device Types / Socket Device / Device Operation / Device Events} > \end{itemize} > > +\conformance{\subsection}{IOMMU Driver Conformance}\label{sec:Conformance / Driver Conformance / IOMMU Driver Conformance} > + > +An IOMMU driver MUST conform to the following normative statements: > + > +\begin{itemize} > +\item \ref{drivernormative:Device Types / IOMMU Device / Feature bits} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device configuration layout} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device Initialization} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device operations} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device operations / ATTACH request} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device operations / DETACH request} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device operations / MAP request} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device operations / UNMAP request} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device operations / PROBE request} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM} > +\item \ref{drivernormative:Device Types / IOMMU Device / Device operations / Fault reporting} > +\end{itemize} > + > \conformance{\section}{Device Conformance}\label{sec:Conformance / Device Conformance} > > A device MUST conform to the following normative statements: > @@ -336,6 +354,24 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} > \item \ref{devicenormative:Device Types / Socket Device / Device Operation / Receive and Transmit} > \end{itemize} > > +\conformance{\subsection}{IOMMU Device Conformance}\label{sec:Conformance / Device Conformance / IOMMU Device Conformance} > + > 
+An IOMMU device MUST conform to the following normative statements: > + > +\begin{itemize} > +\item \ref{devicenormative:Device Types / IOMMU Device / Feature bits} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device configuration layout} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device Initialization} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device operations} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device operations / ATTACH request} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device operations / DETACH request} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device operations / MAP request} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device operations / UNMAP request} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device operations / PROBE request} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM} > +\item \ref{devicenormative:Device Types / IOMMU Device / Device operations / Fault reporting} > +\end{itemize} > + > \conformance{\section}{Legacy Interface: Transitional Device and Transitional Driver Conformance}\label{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance} > A conformant implementation MUST be either transitional or > non-transitional, see \ref{intro:Legacy > diff --git a/content.tex b/content.tex > index 193b6e1..5449a46 100644 > --- a/content.tex > +++ b/content.tex > @@ -5594,6 +5594,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device > \input{virtio-input.tex} > \input{virtio-crypto.tex} > \input{virtio-vsock.tex} > +\input{virtio-iommu.tex} > > \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > > diff --git a/virtio-iommu.tex b/virtio-iommu.tex > new file mode 100644 > index 0000000..be3dcd3 > --- /dev/null > +++ b/virtio-iommu.tex > @@ -0,0 +1,850 @@ > +\section{IOMMU 
device}\label{sec:Device Types / IOMMU Device} > + > +The virtio-iommu device manages Direct Memory Access (DMA) from one or > +more endpoints. It may act both as a proxy for physical IOMMUs managing > +devices assigned to the guest, and as virtual IOMMU managing emulated and > +paravirtualized devices. > + > +The driver first discovers endpoints managed by the virtio-iommu device > +using standard firmware mechanisms. It then sends requests to create > +virtual address spaces and virtual-to-physical mappings for these > +endpoints. In its simplest form, the virtio-iommu supports four request > +types: > + > +\begin{enumerate} > +\item Create a domain and attach an endpoint to it. \\ > + \texttt{attach(endpoint = 0x8, domain = 1)} > +\item Create a mapping between a range of guest-virtual and guest-physical > + address. \\ > + \texttt{map(domain = 1, virt_start = 0x1000, virt_end = 0x1fff, > + phys = 0xa000, flags = READ)} > + > + Endpoint 0x8, for example a hardware PCI endpoint with BDF 00:01.0, can > + now read at addresses 0x1000-0x1fff. These accesses are translated > + into system-physical addresses by the IOMMU. > + > +\item Remove the mapping.\\ > + \texttt{unmap(domain = 1, virt_start = 0x1000, virt_end = 0x1fff)} > + > + Any access to addresses 0x1000-0x1fff by endpoint 0x8 would now be > + rejected. 
> +\item Detach the device and remove the domain.\\ > + \texttt{detach(endpoint = 0x8, domain = 1)} > +\end{enumerate} > + > +\subsection{Device ID}\label{sec:Device Types / IOMMU Device / Device ID} > + > +23 > + > +\subsection{Virtqueues}\label{sec:Device Types / IOMMU Device / Virtqueues} > + > +\begin{description} > +\item[0] requestq > +\item[1] eventq > +\end{description} > + > +\subsection{Feature bits}\label{sec:Device Types / IOMMU Device / Feature bits} > + > +\begin{description} > +\item[VIRTIO_IOMMU_F_INPUT_RANGE (0)] > + Available range of virtual addresses is described in \field{input_range} > + > +\item[VIRTIO_IOMMU_F_DOMAIN_BITS (1)] > + The number of domains supported is described in \field{domain_bits} > + > +\item[VIRTIO_IOMMU_F_MAP_UNMAP (2)] > + Map and unmap requests are available.\footnote{Future extensions may add > + different modes of operations. At the moment, only > + VIRTIO_IOMMU_F_MAP_UNMAP is supported.} > + > +\item[VIRTIO_IOMMU_F_BYPASS (3)] > + When not attached to a domain, endpoints downstream of the IOMMU > + can access the guest-physical address space. > + > +\item[VIRTIO_IOMMU_F_PROBE (4)] > + The PROBE request is available. > +\end{description} > + > +\drivernormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits} > + > +The driver SHOULD accept any of the VIRTIO_IOMMU_F_INPUT_RANGE, > +VIRTIO_IOMMU_F_DOMAIN_BITS, VIRTIO_IOMMU_F_MAP_UNMAP and > +VIRTIO_IOMMU_F_PROBE feature bits if offered by the device. > + > +\devicenormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits} > + > +If the device offers any of VIRTIO_IOMMU_F_INPUT_RANGE, > +VIRTIO_IOMMU_F_DOMAIN_BITS, VIRTIO_IOMMU_F_PROBE or > +VIRTIO_IOMMU_F_MAP_UNMAP feature bits, and if the driver did not accept > +this feature bit, then the device MAY signal failure by failing to set > +FEATURES_OK \field{device status} bit when the driver writes it. 
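The feature negotiation rules quoted above can be sketched in C as follows. The bit numbers are taken from the feature list in the patch; the helper name viommu_select_features, the VIOMMU_BIT macro and the want_bypass policy flag are assumptions made for this illustration only and do not appear in the patch.

```c
#include <stdint.h>

/* Feature bit numbers as listed in the quoted patch. */
#define VIRTIO_IOMMU_F_INPUT_RANGE  0
#define VIRTIO_IOMMU_F_DOMAIN_BITS  1
#define VIRTIO_IOMMU_F_MAP_UNMAP    2
#define VIRTIO_IOMMU_F_BYPASS       3
#define VIRTIO_IOMMU_F_PROBE        4

/* Hypothetical convenience macro: turn a bit number into a mask. */
#define VIOMMU_BIT(n) (UINT64_C(1) << (n))

/*
 * Hypothetical helper: given the feature bits offered by the device,
 * return the subset a driver following the SHOULD rule would accept.
 * BYPASS is not covered by that rule, so it is left to caller policy.
 */
static uint64_t viommu_select_features(uint64_t offered, int want_bypass)
{
    uint64_t wanted = VIOMMU_BIT(VIRTIO_IOMMU_F_INPUT_RANGE) |
                      VIOMMU_BIT(VIRTIO_IOMMU_F_DOMAIN_BITS) |
                      VIOMMU_BIT(VIRTIO_IOMMU_F_MAP_UNMAP) |
                      VIOMMU_BIT(VIRTIO_IOMMU_F_PROBE);

    if (want_bypass)
        wanted |= VIOMMU_BIT(VIRTIO_IOMMU_F_BYPASS);

    /* Never accept a bit the device did not offer. */
    return offered & wanted;
}
```

A driver would write the returned mask during feature negotiation and then check that the device kept FEATURES_OK set, since the device is allowed to refuse a driver that drops one of these bits.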
> + > +\subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device / Device configuration layout} > + > +The \field{page_size_mask} field is always present. Availability of the > +others depends on various feature bits as indicated above. > + > +\begin{lstlisting} > +struct virtio_iommu_config { > + u64 page_size_mask; > + struct virtio_iommu_range { > + u64 start; > + u64 end; > + } input_range; > + u8 domain_bits; > + u8 padding[3]; > + u32 probe_size; > +}; > +\end{lstlisting} > + > +\drivernormative{\subsubsection}{Device configuration layout}{Device Types / IOMMU Device / Device configuration layout} > + > +The driver MUST NOT write to device configuration fields. > + > +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / IOMMU Device / Device configuration layout} > + > +The device SHOULD set \field{padding} to zero. > + > +The device MUST set at least one bit in \field{page_size_mask}, describing > +the page granularity. The device MAY set more than one bit in > +\field{page_size_mask}. > + > +\subsection{Device initialization}\label{sec:Device Types / IOMMU Device / Device initialization} > + > +When the device is reset, endpoints are not attached to any domain. > +If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, all endpoints can > +access guest-physical addresses ("bypass mode"). If the feature is not > +negotiated, then any memory access from endpoints will fault. Upon > +attaching an endpoint in bypass mode to a new domain, any memory access > +from the endpoint will fault, since the domain does not contain any > +mapping. > + > +The driver chooses the operating mode depending on its capabilities. In this > +version of the virtio-iommu device, the only supported mode is > +VIRTIO_IOMMU_F_MAP_UNMAP.
> + > +\drivernormative{\subsubsection}{Device Initialization}{Device Types / IOMMU Device / Device Initialization} > + > +The driver MUST NOT negotiate VIRTIO_IOMMU_F_MAP_UNMAP if it is incapable > +of sending VIRTIO_IOMMU_T_MAP and VIRTIO_IOMMU_T_UNMAP requests. > + > +If the VIRTIO_IOMMU_F_PROBE feature is negotiated, the driver SHOULD send a > +VIRTIO_IOMMU_T_PROBE request for each endpoint before attaching the > +endpoint to a domain. > + > +\devicenormative{\subsubsection}{Device Initialization}{Device Types / IOMMU Device / Device Initialization} > + > +If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the > +device SHOULD NOT let endpoints access the guest-physical address space. > + > +\subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device operations} > + > +The driver sends requests on the request virtqueue, notifies the device and > +waits for the device to return the request with a status in the used ring. > +All requests are split in two parts: one device-readable, one device- > +writable. > + > +\begin{lstlisting} > +struct virtio_iommu_req_head { > + u8 type; > + u8 reserved[3]; > +}; > + > +struct virtio_iommu_req_tail { > + u8 status; > + u8 reserved[3]; > +}; > +\end{lstlisting} > + > +Type may be one of: > + > +\begin{lstlisting} > +#define VIRTIO_IOMMU_T_ATTACH 1 > +#define VIRTIO_IOMMU_T_DETACH 2 > +#define VIRTIO_IOMMU_T_MAP 3 > +#define VIRTIO_IOMMU_T_UNMAP 4 > +#define VIRTIO_IOMMU_T_PROBE 5 > +\end{lstlisting} > + > +A few general-purpose status codes are defined here. Unless explicitly > +described in a \textbf{Requirements} section, these values are hints to > +make troubleshooting easier. > + > +When the device fails to parse a request, for instance if a request seems > +too small for its type and the device cannot find the tail, then it will > +be unable to set \field{status}. In that case, it should return the > +buffers without writing in them. > + > +\begin{lstlisting} > +/* All good! Carry on. 
*/ > +#define VIRTIO_IOMMU_S_OK 0 > +/* Virtio communication error */ > +#define VIRTIO_IOMMU_S_IOERR 1 > +/* Unsupported request */ > +#define VIRTIO_IOMMU_S_UNSUPP 2 > +/* Internal device error */ > +#define VIRTIO_IOMMU_S_DEVERR 3 > +/* Invalid parameters */ > +#define VIRTIO_IOMMU_S_INVAL 4 > +/* Out-of-range parameters */ > +#define VIRTIO_IOMMU_S_RANGE 5 > +/* Entry not found */ > +#define VIRTIO_IOMMU_S_NOENT 6 > +/* Bad address */ > +#define VIRTIO_IOMMU_S_FAULT 7 > +\end{lstlisting} > + > +Range limits of some request fields are described in the device > +configuration: > + > +\begin{itemize} > +\item \field{page_size_mask} contains the bitmask of all page sizes that > + can be mapped. The least significant bit set defines the page > + granularity of IOMMU mappings. Other bits in the mask are hints > + describing page sizes that the IOMMU can merge into a single mapping > + (page blocks). > + > + The smallest page granularity supported by the IOMMU is one byte. It is > + legal for the driver to map one byte at a time if bit 0 of > + \field{page_size_mask} is set. > + > +\item If the VIRTIO_IOMMU_F_DOMAIN_BITS feature is offered, > + \field{domain_bits} contains the number of bits supported in a domain > + ID, the identifier used in most requests. A value of 0 is valid, it > + means that a single domain is supported and endpoints can only be > + attached to domain 0. > + > + If the feature is not offered, domain identifiers can use up to 32 bits. > + > +\item If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered, > + \field{input_range} contains the virtual address range that the IOMMU is > + able to translate. Any mapping request to virtual addresses outside of > + this range will fail. 
> + > + If the feature is not offered, virtual mappings span over the whole > + 64-bit address space (\texttt{start = 0, end = 0xffffffff ffffffff}) > +\end{itemize} > + > +\drivernormative{\subsubsection}{Device operations}{Device Types / IOMMU Device / Device operations} > + > +The driver SHOULD set field \field{reserved} of > +\verb+struct virtio_iommu_req_head+ to zero. > + > +When a device returns a complete request in the used queue without having > +written to it, the driver SHOULD interpret it as a failure from the device > +to parse the request. > + > +If the VIRTIO_IOMMU_F_INPUT_RANGE feature is negotiated, the driver SHOULD > +NOT send requests with \field{virt_start} less than > +\field{input_range.start} or \field{virt_end} greater than > +\field{input_range.end}. > + > +If the VIRTIO_IOMMU_F_DOMAIN_BITS feature is negotiated, the driver SHOULD > +NOT send requests with \field{domain} greater than the size described by > +\field{domain_bits}. > + > +The driver SHOULD NOT use multiple descriptor chains for a single request. > + > +\devicenormative{\subsubsection}{Device operations}{Device Types / IOMMU Device / Device operations} > + > +The device SHOULD NOT set \field{status} to VIRTIO_IOMMU_S_OK if a request > +didn't succeed. > + > +If a request \field{type} is not recognized, the device SHOULD return the > +buffers on the used ring and set the \field{len} field of the used element > +to zero. > + > +The device SHOULD ignore field \field{reserved} of > +\verb+struct virtio_iommu_req_head+ and SHOULD set field \field{reserved} > +of \verb+struct virtio_iommu_req_tail+ to zero. > + > +If the VIRTIO_IOMMU_F_INPUT_RANGE feature is negotiated and the range > +described by fields \field{virt_start} and \field{virt_end} doesn't fit in > +the range described by \field{input_range}, the device MAY set > +\field{status} to VIRTIO_IOMMU_S_RANGE and ignore the request. 
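The range limits quoted above (page granularity from page_size_mask, the input_range window and the domain_bits limit) could be checked on the driver side roughly as below. This is a sketch under assumptions: struct viommu_limits is a simplified host-endian stand-in for the configuration space, and the function names are hypothetical. It reads the domain_bits rule the way the device normative text does, as "no bits at or above domain_bits may be set".

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, host-endian view of the configuration fields (assumed layout). */
struct viommu_limits {
    uint64_t page_size_mask;
    uint64_t input_start;     /* input_range.start */
    uint64_t input_end;       /* input_range.end, inclusive */
    uint8_t  domain_bits;
    bool     has_input_range; /* VIRTIO_IOMMU_F_INPUT_RANGE negotiated */
    bool     has_domain_bits; /* VIRTIO_IOMMU_F_DOMAIN_BITS negotiated */
};

/* Page granularity is the least significant bit set in page_size_mask. */
static uint64_t viommu_granule(uint64_t page_size_mask)
{
    return page_size_mask & (~page_size_mask + 1);
}

/* True if [virt_start, virt_end] fits in the translatable range. */
static bool viommu_range_ok(const struct viommu_limits *l,
                            uint64_t virt_start, uint64_t virt_end)
{
    if (!l->has_input_range)
        return true; /* whole 64-bit address space is available */
    return virt_start >= l->input_start && virt_end <= l->input_end;
}

/* True if the domain ID uses no bits above domain_bits. */
static bool viommu_domain_ok(const struct viommu_limits *l, uint32_t domain)
{
    if (!l->has_domain_bits || l->domain_bits >= 32)
        return true; /* domain IDs may use all 32 bits */
    return (domain >> l->domain_bits) == 0;
}
```

With domain_bits equal to 0 the last helper accepts only domain 0, which matches the single-domain case described in the quoted text.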
> + > +If the VIRTIO_IOMMU_F_DOMAIN_BITS is negotiated and bits above > +\field{domain_bits} are set in field \field{domain}, the device MAY set > +\field{status} to VIRTIO_IOMMU_S_RANGE and ignore the request. > + > +\subsubsection{ATTACH request}\label{sec:Device Types / IOMMU Device / Device operations / ATTACH request} > + > +\begin{lstlisting} > +struct virtio_iommu_req_attach { > + struct virtio_iommu_req_head head; > + le32 domain; > + le32 endpoint; > + u8 reserved[8]; > + struct virtio_iommu_req_tail tail; > +}; > +\end{lstlisting} > + > +Attach an endpoint to a domain. \field{domain} is an identifier unique to > +the virtio-iommu device. The \field{domain} number doesn't have a meaning > +outside of virtio-iommu. If the domain doesn't exist in the device, it is > +created. \field{endpoint} is an identifier unique to the virtio-iommu > +device. The host communicates these unique endpoint IDs to the guest using > +methods outside the scope of this specification, but the following rules > +apply: > + > +\begin{itemize} > +\item The endpoint ID is unique from the virtio-iommu point of view. > + Multiple endpoints whose DMA transactions are not translated by the same > + virtio-iommu may have the same endpoint ID. Endpoints whose DMA > + transactions may be translated by the same virtio-iommu must have > + different endpoint IDs. > + > +\item Sometimes the host cannot completely isolate two endpoints from each > + others. For example on a legacy PCI bus, endpoints can snoop DMA > + transactions from their neighbours. In this case, the host must > + communicate to the guest that it cannot isolate these endpoints from > + each others, or that the physical IOMMU cannot distinguish transactions > + coming from these endpoints. The method used to communicate this is > + outside the scope of this specification. > +\end{itemize} > + > +Multiple endpoints may be attached to the same domain. An endpoint cannot > +be attached to multiple domains at the same time. 
> + > +\drivernormative{\paragraph}{ATTACH request}{Device Types / IOMMU Device / Device operations / ATTACH request} > + > +The driver SHOULD set \field{reserved} to zero. > + > +The driver SHOULD ensure that endpoints that cannot be isolated by the > +host are attached to the same domain. > + > +\devicenormative{\paragraph}{ATTACH request}{Device Types / IOMMU Device / Device operations / ATTACH request} > + > +If the \field{reserved} field of an ATTACH request is not zero, the device > +SHOULD set the request \field{status} to VIRTIO_IOMMU_S_INVAL and SHOULD > +NOT attach the endpoint to the domain. \footnote{The device should > +validate input of ATTACH requests in case the driver attempts to attach in > +a mode that is unimplemented by the device, and would be incompatible with > +the modes implemented by the device.} > + > +If the endpoint identified by \field{endpoint} doesn't exist, then the > +device SHOULD set the request \field{status} to VIRTIO_IOMMU_S_NOENT. > + > +If another endpoint is already attached to the domain identified by > +\field{domain}, then the device MAY attach the endpoint identified by > +\field{endpoint} to the domain. If it cannot do so, the device > +MUST set the request \field{status} to VIRTIO_IOMMU_S_UNSUPP. > + > +If the endpoint identified by \field{endpoint} is already attached to > +another domain, then the device SHOULD first detach it from that domain > +and attach it to the one identified by \field{domain}. In that case the > +device behaves as if the driver issued a DETACH request with this > +\field{endpoint}, followed by the ATTACH request. If the device cannot do > +so, it MUST set the request \field{status} to VIRTIO_IOMMU_S_UNSUPP. > + > +If properties of the endpoint (obtained with a PROBE request) are > +incompatible with properties of other endpoints already attached to the > +requested domain, the device MAY attach the endpoint. 
If it cannot do so, the > +device SHOULD set the request \field{status} to VIRTIO_IOMMU_S_UNSUPP. > +\footnote{In general it is simpler and safer to reject attach when two devices > +have differing values in a property, for example two reserved regions of > +different types that would overlap. Depending on the property, device > +implementation can try to merge them and accept the attach.} > + > +\subsubsection{DETACH request} > + > +\begin{lstlisting} > +struct virtio_iommu_req_detach { > + struct virtio_iommu_req_head head; > + le32 domain; > + le32 endpoint; > + u8 reserved[8]; > + struct virtio_iommu_req_tail tail; > +}; > +\end{lstlisting} > + > +Detach an endpoint from a domain. When this request completes, the > +endpoint cannot access any mapping from that domain anymore. If feature > +VIRTIO_IOMMU_F_BYPASS has been negotiated, then the endpoint accesses the > +guest-physical address space once this request completes. > + > +After all endpoints have been successfully detached from a domain, it > +ceases to exist and its ID can be reused by the driver for another domain. > + > +\drivernormative{\paragraph}{DETACH request}{Device Types / IOMMU Device / Device operations / DETACH request} > + > +The driver SHOULD set \field{reserved} to zero. > + > +\devicenormative{\paragraph}{DETACH request}{Device Types / IOMMU Device / Device operations / DETACH request} > + > +If the \field{reserved} field of a DETACH request is not zero, the device > +MAY set the request \field{status} to VIRTIO_IOMMU_S_INVAL, in which case > +the device MAY still perform the DETACH operation. > + > +If the endpoint identified by \field{endpoint} doesn't exist, then the > +device SHOULD set the request \field{status} to VIRTIO_IOMMU_S_NOENT. > + > +If the domain identified by \field{domain} doesn't exist, or if the > +endpoint identified by \field{endpoint} isn't attached to this domain, > +then the device MAY set the request \field{status} to > +VIRTIO_IOMMU_S_INVAL. 
> + > +The device MUST ensure that after being detached from a domain, the > +endpoint cannot access any mapping from that domain. > + > +\subsubsection{MAP request}\label{sec:Device Types / IOMMU Device / Device operations / MAP request} > + > +\begin{lstlisting} > +struct virtio_iommu_req_map { > + struct virtio_iommu_req_head head; > + le32 domain; > + le64 virt_start; > + le64 virt_end; > + le64 phys_start; > + le32 flags; > + struct virtio_iommu_req_tail tail; > +}; > + > +/* Flags are: */ > +#define VIRTIO_IOMMU_MAP_F_READ (1 << 0) > +#define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1) > +#define VIRTIO_IOMMU_MAP_F_EXEC (1 << 2) > +#define VIRTIO_IOMMU_MAP_F_MMIO (1 << 3) > +\end{lstlisting} > + > +Map a range of virtually-contiguous addresses to a range of > +physically-contiguous addresses of the same size. After the request > +succeeds, all endpoints attached to this domain can access memory in the > +range $[virt\_start; virt\_end]$ (inclusive). For example, if an endpoint > +accesses address $VA \in [virt\_start; virt\_end]$, the device (or the > +physical IOMMU) translates the address: $PA = VA - virt\_start + > +phys\_start$. If the access parameters are compatible with \field{flags} > +(for instance, the access is write and \field{flags} are > +VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE) then the IOMMU allows > +the access to reach $PA$. > + > +The range defined by \field{virt_start} and \field{virt_end} should be > +within the limits specified by \field{input_range}. Given $phys\_end = > +phys\_start + virt\_end - virt\_start$, the range defined by > +\field{phys_start} and phys_end should be within the guest-physical > +address space. This includes upper and lower limits, as well as any > +carving of guest-physical addresses for use by the host. Guest physical > +boundaries are set by the host using a firmware mechanism outside the > +scope of this specification. 
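The translation rule PA = VA - virt_start + phys_start and the flags check can be illustrated with a small C model. The struct and function names here are hypothetical, and the strict "every requested access bit must be present in flags" check is an assumption of this sketch; an implementation where WRITE implies READ would relax it.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_IOMMU_MAP_F_READ   (1 << 0)
#define VIRTIO_IOMMU_MAP_F_WRITE  (1 << 1)

/* Hypothetical in-memory model of one mapping created by a MAP request. */
struct viommu_mapping {
    uint64_t virt_start;
    uint64_t virt_end;   /* inclusive */
    uint64_t phys_start;
    uint32_t flags;
};

/*
 * Translate va through the mapping. Returns false when va is outside
 * [virt_start, virt_end] or when the access type is not permitted.
 */
static bool viommu_translate(const struct viommu_mapping *m, uint64_t va,
                             uint32_t access, uint64_t *pa)
{
    if (va < m->virt_start || va > m->virt_end)
        return false;
    if ((access & m->flags) != access)
        return false;
    *pa = va - m->virt_start + m->phys_start;
    return true;
}
```

Using the mapping from the introduction of the patch (virt 0x1000-0x1fff, phys 0xa000, READ only), a read at 0x1234 translates to 0xa234, while a write to the same address is refused.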
> + > +Availability and allowed combinations of \field{flags} depend on the > +underlying IOMMU architectures. VIRTIO_IOMMU_MAP_F_READ and > +VIRTIO_IOMMU_MAP_F_WRITE are usually implemented, although READ is > +sometimes implied by WRITE. VIRTIO_IOMMU_MAP_F_EXEC might not be > +available. In addition combinations such as "WRITE and not READ" or "WRITE > +and EXEC" might not be supported. > + > +The VIRTIO_IOMMU_MAP_F_MMIO flag is a memory type rather than a protection > +flag. It may be used, for example, to map Message Signaled Interrupt > +doorbells when a VIRTIO_IOMMU_RESV_MEM_T_MSI region isn't available. To > +trigger interrupts the endpoint performs a direct memory write to another > +peripheral, the IRQ chip. Since it is a signal, the write must not be > +buffered, elided, or combined with other writes by the memory > +interconnect. The precise meaning of the MMIO flag depends on the > +underlying memory architecture (for example on Armv8-A it corresponds to > +the "Device-nGnRE" memory type). Unless needed by mapped MSIs, the device > +isn't required to support the MMIO flag. > + > +This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been > +negotiated. > + > +\drivernormative{\paragraph}{MAP request}{Device Types / IOMMU Device / Device operations / MAP request} > + > +The driver SHOULD set undefined \field{flags} bits to zero. > + > +\field{virt_end} MUST be strictly greater than \field{virt_start}. > + > +The driver SHOULD set the VIRTIO_IOMMU_MAP_F_MMIO flag when the physical > +range corresponds to memory-mapped device registers. The physical range > +SHOULD have a single memory type: either normal memory or memory-mapped > +I/O. 
> + > +\devicenormative{\paragraph}{MAP request}{Device Types / IOMMU Device / Device operations / MAP request} > + > +If \field{virt_start}, \field{phys_start} or (\field{virt_end} + 1) is > +not aligned on the page granularity, the device SHOULD set the request > +\field{status} to VIRTIO_IOMMU_S_RANGE and SHOULD NOT create the mapping. > + > +If a mapping already exists in the requested range, the device SHOULD set > +the request \field{status} to VIRTIO_IOMMU_S_INVAL and SHOULD NOT change > +any mapping. > + > +If the device doesn't recognize a \field{flags} bit, it SHOULD set the > +request \field{status} to VIRTIO_IOMMU_S_INVAL. In this case the device > +SHOULD NOT create the mapping. \footnote{Validating the input is important > +here, because the driver might be attempting to map with special flags > +that the device doesn't recognize. Creating the mapping with incompatible > +flags may result in loss of coherency and security hazards.} > + > +If a flag or combination of flag isn't supported, the device MAY set the > +request \field{status} to VIRTIO_IOMMU_S_UNSUPP. > + > +The device MUST NOT allow writes to a range mapped without the > +VIRTIO_IOMMU_MAP_F_WRITE flag. However, if the underlying architecture > +does not support write-only mappings, the device MAY allow reads to a > +range mapped with VIRTIO_IOMMU_MAP_F_WRITE but not > +VIRTIO_IOMMU_MAP_F_READ. > + > +If \field{domain} does not exist, the device SHOULD set the request > +\field{status} to VIRTIO_IOMMU_S_NOENT. > + > +\subsubsection{UNMAP request}\label{sec:Device Types / IOMMU Device / Device operations / UNMAP request} > + > +\begin{lstlisting} > +struct virtio_iommu_req_unmap { > + struct virtio_iommu_req_head head; > + le32 domain; > + le64 virt_start; > + le64 virt_end; > + u8 reserved[4]; > + struct virtio_iommu_req_tail tail; > +}; > +\end{lstlisting} > + > +Unmap a range of addresses mapped with VIRTIO_IOMMU_T_MAP. 
We define here > +a mapping as a virtual region created with a single MAP request. All > +mappings covered by the range $[virt\_start; virt\_end]$ (inclusive) are > +removed. > + > +The semantics of unmapping are specified in \ref{drivernormative:Device > +Types / IOMMU Device / Device operations / UNMAP request} and > +\ref{devicenormative:Device Types / IOMMU Device / Device operations / > +UNMAP request}, and illustrated with the following requests, assuming each > +example sequence starts with a blank address space. We define two > +pseudocode functions \texttt{map(virt_start, virt_end) -> mapping} and > +\texttt{unmap(virt_start, virt_end)}. > + > +\begin{lstlisting} > +(1) unmap(virt_start=0, > + virt_end=4) -> succeeds, doesn't unmap anything > + > +(2) a = map(virt_start=0, > + virt_end=9); > + unmap(0, 9) -> succeeds, unmaps a > + > +(3) a = map(0, 4); > + b = map(5, 9); > + unmap(0, 9) -> succeeds, unmaps a and b > + > +(4) a = map(0, 9); > + unmap(0, 4) -> faults, doesn't unmap anything > + > +(5) a = map(0, 4); > + b = map(5, 9); > + unmap(0, 4) -> succeeds, unmaps a > + > +(6) a = map(0, 4); > + unmap(0, 9) -> succeeds, unmaps a > + > +(7) a = map(0, 4); > + b = map(10, 14); > + unmap(0, 14) -> succeeds, unmaps a and b > +\end{lstlisting} > + > +This request is only available when VIRTIO_IOMMU_F_MAP_UNMAP has been > +negotiated. > + > +\drivernormative{\paragraph}{UNMAP request}{Device Types / IOMMU Device / Device operations / UNMAP request} > + > +The driver SHOULD set the \field{reserved} field to zero. > + > +The range, defined by \field{virt_start} and \field{virt_end}, SHOULD > +cover one or more contiguous mappings created with MAP requests. The range > +MAY spill over unmapped virtual addresses. > + > +The first address of a range SHOULD either be the first address of a > +mapping or be outside any mapping. The last address of a range SHOULD > +either be the last address of a mapping or be outside any mapping. 
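The seven unmap examples quoted above can be reproduced with a toy model of a domain. This is only a sketch: the fixed-size table, the names domain_map and domain_unmap, and the return convention (number of mappings removed, or -1 when the request would split a mapping) are assumptions for illustration, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 8

/* Hypothetical model: a mapping is one [start, end] range, inclusive. */
struct mapping {
    uint64_t start;
    uint64_t end;
    int used;
};

struct domain {
    struct mapping maps[MAX_MAPPINGS];
};

/* Create a mapping; fails (-1) on overlap or when the table is full. */
static int domain_map(struct domain *d, uint64_t start, uint64_t end)
{
    size_t i;

    for (i = 0; i < MAX_MAPPINGS; i++) {
        const struct mapping *m = &d->maps[i];
        if (m->used && start <= m->end && end >= m->start)
            return -1;
    }
    for (i = 0; i < MAX_MAPPINGS; i++) {
        if (!d->maps[i].used) {
            d->maps[i] = (struct mapping){ start, end, 1 };
            return 0;
        }
    }
    return -1;
}

/*
 * Remove every mapping fully covered by [start, end]. If any mapping
 * is only partially covered, remove nothing and return -1.
 */
static int domain_unmap(struct domain *d, uint64_t start, uint64_t end)
{
    size_t i;
    int removed = 0;

    /* First pass: refuse the request if it would split a mapping. */
    for (i = 0; i < MAX_MAPPINGS; i++) {
        const struct mapping *m = &d->maps[i];
        if (!m->used || m->end < start || m->start > end)
            continue; /* unused or not touched by the range */
        if (m->start < start || m->end > end)
            return -1;
    }
    /* Second pass: every touched mapping is fully covered, drop them. */
    for (i = 0; i < MAX_MAPPINGS; i++) {
        struct mapping *m = &d->maps[i];
        if (m->used && m->start >= start && m->end <= end) {
            m->used = 0;
            removed++;
        }
    }
    return removed;
}
```

The two-pass structure is what keeps example (4) atomic: the check for partial coverage completes before any mapping is removed.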
> + > +\devicenormative{\paragraph}{UNMAP request}{Device Types / IOMMU Device / Device operations / UNMAP request} > + > +If the \field{reserved} field of an UNMAP request is not zero, the device > +MAY set the request \field{status} to VIRTIO_IOMMU_S_INVAL, in which case > +the device MAY perform the UNMAP operation. > + > +If \field{domain} does not exist, the device SHOULD set the request > +\field{status} to VIRTIO_IOMMU_S_NOENT. > + > +If a mapping affected by the range is not covered in its entirety by the > +range (the UNMAP request would split the mapping), then the device SHOULD > +set the request \field{status} to VIRTIO_IOMMU_S_RANGE, and SHOULD NOT > +remove any mapping. > + > +If part of the range or the full range is not covered by an existing > +mapping, then the device SHOULD remove all mappings affected by the range > +and set the request \field{status} to VIRTIO_IOMMU_S_OK. > + > +\subsubsection{PROBE request}\label{sec:Device Types / IOMMU Device / Device operations / PROBE request} > + > +If the VIRTIO_IOMMU_F_PROBE feature bit is present, the driver sends a > +VIRTIO_IOMMU_T_PROBE request for each endpoint that the virtio-iommu > +device manages. This probe is performed before attaching the endpoint to > +a domain. > + > +\begin{lstlisting} > +struct virtio_iommu_req_probe { > + struct virtio_iommu_req_head head; > + /* Device-readable */ > + le32 endpoint; > + u8 reserved[64]; > + > + /* Device-writable */ > + u8 properties[probe_size]; > + struct virtio_iommu_req_tail tail; > +}; > +\end{lstlisting} > + > +\begin{description} > +\item[\field{endpoint}] has the same meaning as in ATTACH and DETACH > + requests. > + > +\item[\field{reserved}] is used as padding, so that future extensions can > + add fields to the device-readable part. > + > +\item[\field{properties}] contains a list of properties of the > + \field{endpoint}, filled by the device. The length of the > + \field{properties} field is \field{probe_size} bytes. 
Each property is > + described with a \verb+struct virtio_iommu_probe_property+ header, which > + may be followed by a value of size \field{length}. > + > +\begin{lstlisting} > +#define VIRTIO_IOMMU_PROBE_T_MASK 0xfff > + > +struct virtio_iommu_probe_property { > + le16 type; > + le16 length; > +}; > +\end{lstlisting} > + > +\end{description} > + > +The driver allocates a buffer of adequate size for the probe request, > +writes \field{endpoint} and adds the buffer to the request queue. The > +device fills the \field{properties} field with a list of properties for > +this endpoint. > + > +The driver parses the first property by reading \field{type}, then > +\field{length}. If the driver recognizes \field{type}, it reads and > +handles the rest of the property. The driver then reads the next property, > +which is located $(\field{length} + 4)$ bytes after the beginning of the > +first one, and so on. The driver parses all properties until it reaches a > +NONE property or the end of \field{properties}. > + > +The upper nibble of \field{type} is reserved for future extensions. > +Therefore only 4096 types are available. The actual type of a property is > +extracted like this: > + > +\begin{lstlisting} > +u16 type = le16_to_cpu(property.type) & VIRTIO_IOMMU_PROBE_T_MASK; > +\end{lstlisting} > + > +Available property types are described in section > +\ref{sec:Device Types / IOMMU Device / Device operations / PROBE properties}. > + > +\drivernormative{\paragraph}{PROBE request}{Device Types / IOMMU Device / Device operations / PROBE request} > + > +The size of \field{properties} MUST be \field{probe_size} bytes. > + > +The driver SHOULD set \field{reserved} to zero. > + > +If the driver doesn't recognize the \field{type} of a property, it SHOULD > +ignore the property and continue parsing the list. > + > +The driver SHOULD NOT deduce the property length from \field{type}. > + > +The driver SHOULD ignore bits [15:12] of \field{type}.
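The parsing loop described above could look roughly like the following on the driver side. This is a hedged C sketch, not code from the patch: probe_parse, probe_prop_cb, count_resv_mem and read_le16 are hypothetical names, and the bounds check that returns -1 is an extra safety measure assumed here rather than something the text requires. The step to the next property always uses the length field plus the 4-byte header, never a length inferred from the type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_PROBE_T_MASK     0xfff
#define VIRTIO_IOMMU_PROBE_T_NONE     0
#define VIRTIO_IOMMU_PROBE_T_RESV_MEM 1

/* struct virtio_iommu_probe_property: le16 type + le16 length. */
#define PROBE_HDR_SIZE 4

/* Read a little-endian 16-bit value, independent of host endianness. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical per-property callback; value points just past the header. */
typedef void (*probe_prop_cb)(uint16_t type, const uint8_t *value,
                              uint16_t length, void *opaque);

/*
 * Walk the properties buffer until a NONE property or the end of the buffer.
 * Returns the number of properties visited, or -1 if a property claims more
 * bytes than the buffer holds (sketch-only error convention).
 */
int probe_parse(const uint8_t *properties, size_t probe_size,
                probe_prop_cb cb, void *opaque)
{
    size_t off = 0;
    int count = 0;

    assert(properties != NULL || probe_size == 0);
    while (probe_size - off >= PROBE_HDR_SIZE) {
        /* Bits [15:12] of type are ignored by masking them out. */
        uint16_t type = read_le16(properties + off) & VIRTIO_IOMMU_PROBE_T_MASK;
        uint16_t length = read_le16(properties + off + 2);

        if (type == VIRTIO_IOMMU_PROBE_T_NONE)
            break;
        if (length > probe_size - off - PROBE_HDR_SIZE)
            return -1;
        if (cb)
            cb(type, properties + off + PROBE_HDR_SIZE, length, opaque);
        count++;
        /* The next property starts (length + 4) bytes after this one. */
        off += (size_t)length + PROBE_HDR_SIZE;
    }
    return count;
}

/* Example callback: counts RESV_MEM properties and ignores unknown types. */
void count_resv_mem(uint16_t type, const uint8_t *value,
                    uint16_t length, void *opaque)
{
    (void)value;
    (void)length;
    if (type == VIRTIO_IOMMU_PROBE_T_RESV_MEM)
        (*(int *)opaque)++;
}
```

A caller would pass the device-writable properties area and probe_size, for instance probe_parse(buf, probe_size, count_resv_mem, &n). Unknown types still advance the offset correctly because only the length field drives the walk.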
> + > +\devicenormative{\paragraph}{PROBE request}{Device Types / IOMMU Device / Device operations / PROBE request} > + > +If the \field{reserved} field of a PROBE request is not zero, the device > +MAY set the request \field{status} to VIRTIO_IOMMU_S_INVAL. > + > +If the endpoint identified by \field{endpoint} doesn't exist, then the > +device SHOULD set the request \field{status} to VIRTIO_IOMMU_S_NOENT. > + > +If the device does not offer the VIRTIO_IOMMU_F_PROBE feature, and if the > +driver sends a VIRTIO_IOMMU_T_PROBE request, then the device SHOULD return > +the buffers on the used ring and set the \field{len} field of the used > +element to zero. > + > +The device SHOULD set bits [15:12] of property \field{type} to zero. > + > +The device MUST write the size of the property without the > +\verb+struct virtio_iommu_probe_property+ header, in bytes, into > +\field{length}. > + > +When two properties follow each other, the device MUST put the second > +property exactly $(\field{length} + 4)$ bytes after the beginning of the > +first one. > + > +If the \field{properties} list is smaller than \field{probe_size}, then > +the device SHOULD NOT write any property and SHOULD set the request > +\field{status} to VIRTIO_IOMMU_S_INVAL. > + > +If the device doesn't fill all \field{probe_size} bytes with properties, > +it SHOULD fill the remaining bytes of \field{properties} with zeroes. > + > +\subsubsection{PROBE properties}\label{sec:Device Types / IOMMU Device / Device operations / PROBE properties} > + > +\begin{lstlisting} > +#define VIRTIO_IOMMU_PROBE_T_NONE 0 > +#define VIRTIO_IOMMU_PROBE_T_RESV_MEM 1 > +\end{lstlisting} > + > +\paragraph{Property NONE}\label{sec:Device Types / IOMMU Device / Device operations / PROBE properties / NONE} > + > +Marks the end of the property list. This property doesn't have any value, > +and should have \field{length} 0.
> + > +\paragraph{Property RESV_MEM}\label{sec:Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM} > + > +The RESV_MEM property describes a chunk of reserved virtual memory. It may > +be used by the device to describe virtual address ranges that shouldn't be > +allocated by the driver, or that are special. > + > +\begin{lstlisting} > +struct virtio_iommu_probe_resv_mem { > + struct virtio_iommu_probe_property head; > + u8 subtype; > + u8 reserved[3]; > + le64 start; > + le64 end; > +}; > +\end{lstlisting} > + > +Fields \field{start} and \field{end} describe the range of reserved virtual > +addresses. \field{subtype} may be one of: > + > +\begin{description} > + \item[VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0)] > + Accesses to virtual addresses in this region have undefined behavior. > + They may be aborted by the device, bypass it, or never even reach it. > + The region may also be used for host mappings, for example Message > + Signaled Interrupts. > + > + The guest should neither use these virtual addresses in a MAP request > + nor instruct endpoints to perform DMA on them. > + > + \item[VIRTIO_IOMMU_RESV_MEM_T_MSI (1)] > + This region is a doorbell for Message Signaled Interrupts (MSIs). It > + is similar to VIRTIO_IOMMU_RESV_MEM_T_RESERVED, in that the driver > + should not map virtual addresses described by the property. > + > + In addition it tells the guest how to handle MSI doorbells. If the > + endpoint doesn't have a VIRTIO_IOMMU_RESV_MEM_T_MSI property > + corresponding to the doorbell of a virtual MSI controller, then the > + guest should create a mapping for it. > +\end{description} > + > +\drivernormative{\subparagraph}{Property RESV_MEM}{Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM} > + > +The driver SHOULD NOT map any virtual address described by a > +VIRTIO_IOMMU_RESV_MEM_T_RESERVED or VIRTIO_IOMMU_RESV_MEM_T_MSI property. > + > +The driver SHOULD ignore \field{reserved}. 
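As an illustration of the rule that reserved regions should not be mapped, a driver might check a candidate range before sending a MAP request. The C sketch below is an assumption-laden example, not part of the patch: struct resv_region and range_is_reserved are hypothetical names, the regions are assumed to have been copied into host-endian form while parsing the PROBE output, and end is assumed to be inclusive like virt_end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-endian copy of one RESV_MEM property. */
struct resv_region {
    uint8_t  subtype;
    uint64_t start;
    uint64_t end;    /* assumed inclusive */
};

/*
 * Returns 1 if [virt_start, virt_end] touches any reserved region, else 0.
 * The subtype is deliberately not consulted: RESERVED and MSI regions are
 * both off limits for MAP requests.
 */
int range_is_reserved(const struct resv_region *regions, size_t nr,
                      uint64_t virt_start, uint64_t virt_end)
{
    assert(virt_start <= virt_end);
    for (size_t i = 0; i < nr; i++) {
        if (regions[i].start <= virt_end && virt_start <= regions[i].end)
            return 1;
    }
    return 0;
}
```

A virtual address allocator could call this on each candidate range and skip any range for which it returns 1.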
> + > +The driver SHOULD treat any \field{subtype} it doesn't recognize as if it > +was VIRTIO_IOMMU_RESV_MEM_T_RESERVED. > + > +\devicenormative{\subparagraph}{Property RESV_MEM}{Device Types / IOMMU Device / Device operations / PROBE properties / RESVMEM} > + > +The device SHOULD set \field{reserved} to zero. > + > +The device SHOULD NOT present more than one VIRTIO_IOMMU_RESV_MEM_T_MSI > +property per endpoint. > + > +The device SHOULD NOT present RESV_MEM properties that overlap each other > +for the same endpoint. > + > +\subsubsection{Fault reporting}\label{sev:Device Types / IOMMU Device / Device operations / Fault reporting} > + > +The device can report translation faults and other significant asynchronous > +events on the event virtqueue. The driver initially populates the queue with > +empty report buffers. When the device needs to report an event, it fills a > +buffer and notifies the driver with an interrupt. The driver consumes the > +report and moves the buffer back onto the queue. > + > +If no buffer is available, the device may either wait for one to be consumed, > +or drop the event. > + > +\begin{lstlisting} > +struct virtio_iommu_fault { > + u8 reason; > + u8 reserved[3]; > + le32 flags; > + le32 endpoint; > + le32 reserved1; > + le64 address; > +}; > + > +#define VIRTIO_IOMMU_FAULT_F_READ (1 << 0) > +#define VIRTIO_IOMMU_FAULT_F_WRITE (1 << 1) > +#define VIRTIO_IOMMU_FAULT_F_EXEC (1 << 2) > +#define VIRTIO_IOMMU_FAULT_F_ADDRESS (1 << 8) > +\end{lstlisting} > + > +\begin{description} > + \item[\field{reason}] The reason for this report. It may have the > + following values: > + \begin{description} > + \item[VIRTIO_IOMMU_FAULT_R_UNKNOWN (0)] An internal error happened, or > + an error that cannot be described with the following reasons. > + \item[VIRTIO_IOMMU_FAULT_R_DOMAIN (1)] The endpoint attempted to > + access \field{address} without being attached to a domain.
> + \item[VIRTIO_IOMMU_FAULT_R_MAPPING (2)] The endpoint attempted to > + access \field{address}, which wasn't mapped in the domain or > + didn't have the correct protection flags. > + \end{description} > + \item[\field{flags}] Information about the fault context. > + \item[\field{endpoint}] The endpoint causing the fault. > + \item[\field{reserved} and \field{reserved1}] Should be zero. > + \item[\field{address}] If VIRTIO_IOMMU_FAULT_F_ADDRESS is set, the > + address causing the fault. > +\end{description} > + > +These faults are not recoverable\footnote{This means that the PRI > +extension to PCI, for example, which allows recoverable faults, isn't > +supported for the moment.}. The guest has to do its best to > +prevent any future fault from happening, by stopping or resetting the > +endpoint. > + > +When the fault is reported by a physical IOMMU, the fault reason may not > +exactly match the reason of the original fault report. The device should > +try its best to find the closest match. > + > +If the device encounters a fault that wasn't caused by a specific > +endpoint, it is unlikely that the driver would be able to do anything other > +than print the fault and stop using the device, so reporting the fault on > +the event queue isn't useful. In that case, we recommend using the > +DEVICE_NEEDS_RESET status bit. > + > +\drivernormative{\paragraph}{Fault reporting}{Device Types / IOMMU Device / Device operations / Fault reporting} > + > +If the \field{reserved} field is not zero, the driver SHOULD ignore the > +fault report.\footnote{A future format may implement events that are not > +faults, which would be differentiated by a type field in place of > +\field{reserved}.} > + > +The driver SHOULD ignore undefined \field{flags}. > + > +If the driver doesn't recognize \field{reason}, it SHOULD treat the fault > +as if it was VIRTIO_IOMMU_FAULT_R_UNKNOWN.
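The driver-side rules above can be put together in a small decoder. This is a minimal C sketch under stated assumptions, not code from the patch: fault_decode, struct decoded_fault and read_le are hypothetical names, the report is read from a raw byte buffer at the offsets implied by struct virtio_iommu_fault, and the return convention (1 for a usable report, 0 for one to ignore) is chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_FAULT_R_UNKNOWN 0
#define VIRTIO_IOMMU_FAULT_R_DOMAIN  1
#define VIRTIO_IOMMU_FAULT_R_MAPPING 2

#define VIRTIO_IOMMU_FAULT_F_READ    (1 << 0)
#define VIRTIO_IOMMU_FAULT_F_WRITE   (1 << 1)
#define VIRTIO_IOMMU_FAULT_F_EXEC    (1 << 2)
#define VIRTIO_IOMMU_FAULT_F_ADDRESS (1 << 8)

/* Size of struct virtio_iommu_fault: 1 + 3 + 4 + 4 + 4 + 8 bytes. */
#define FAULT_REPORT_SIZE 24
#define FAULT_KNOWN_FLAGS ((uint32_t)(VIRTIO_IOMMU_FAULT_F_READ |  \
                                      VIRTIO_IOMMU_FAULT_F_WRITE | \
                                      VIRTIO_IOMMU_FAULT_F_EXEC |  \
                                      VIRTIO_IOMMU_FAULT_F_ADDRESS))

/* Hypothetical host-endian view of a fault report. */
struct decoded_fault {
    uint8_t  reason;
    uint32_t flags;
    uint32_t endpoint;
    int      has_address;
    uint64_t address;
};

/* Read an nbytes-wide little-endian integer (nbytes <= 8). */
static uint64_t read_le(const uint8_t *p, size_t nbytes)
{
    uint64_t v = 0;

    for (size_t i = 0; i < nbytes; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 1 if the report was decoded, 0 if the driver should ignore it. */
int fault_decode(const uint8_t *buf, size_t len, struct decoded_fault *out)
{
    assert(buf != NULL && out != NULL);
    if (len < FAULT_REPORT_SIZE)
        return 0;
    if (buf[1] || buf[2] || buf[3])
        return 0;   /* non-zero reserved field: ignore the report */

    out->reason = buf[0];
    if (out->reason > VIRTIO_IOMMU_FAULT_R_MAPPING)
        out->reason = VIRTIO_IOMMU_FAULT_R_UNKNOWN;   /* unrecognized reason */

    /* Undefined flag bits are dropped. */
    out->flags = (uint32_t)read_le(buf + 4, 4) & FAULT_KNOWN_FLAGS;
    out->endpoint = (uint32_t)read_le(buf + 8, 4);

    /* address is only meaningful when F_ADDRESS is set. */
    out->has_address = (out->flags & VIRTIO_IOMMU_FAULT_F_ADDRESS) != 0;
    out->address = out->has_address ? read_le(buf + 16, 8) : 0;
    return 1;
}
```

After decoding, a driver would typically log the fault and stop or reset the endpoint, since these faults are not recoverable.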
> + > +\devicenormative{\paragraph}{Fault reporting}{Device Types / IOMMU Device / Device operations / Fault reporting} > + > +The device SHOULD set \field{reserved} and \field{reserved1} to zero. > + > +The device SHOULD set undefined \field{flags} to zero. > + > +The device SHOULD write a valid endpoint ID in \field{endpoint}. > + > +The device MAY omit setting VIRTIO_IOMMU_FAULT_F_ADDRESS and writing > +\field{address} in any fault report, regardless of the \field{reason}. > + > +If a buffer is too small to contain the fault report\footnotemark, the > +device SHOULD NOT use multiple buffers to describe it. The device MAY fall > +back to using an older fault report format that fits in the buffer. > + > +\footnotetext{This would happen for example if the device implements a > +more recent version of this specification, whose fault report contains > +additional fields.} >