[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-dev] Re: [PATCH V2 2/2] virtio-pci: add PASID configuration extended capability
On Thu, Jan 13, 2022 at 09:24:16AM +0800, Jason Wang wrote: > On Wed, Jan 12, 2022 at 6:42 PM Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > On Wed, Jan 12, 2022 at 01:57:55PM +0800, Jason Wang wrote: > > > This patch tries to add PASID configuration structure. It is used for > > > assigning PASID to virtqueue then the device use PASID TLP prefix for > > > the PCI transactions like DMA. The goal is to isolate e.g DMA at > > > subdevice level which could be used for things like: > > > > > > - direct queue assignment to userspace > > > - virtqueue virtualization > > > - presenting multiple vDPA devices within a single PCI device > > > > > > The virtqueue group is introduced as an intermediate layer for having > > > better compatibility and flexibility. The virtqueue group is the > > > minimal set of the virtqueues that can be assigned with a single > > > PASID. The PASID is then assigned at the level of virtqueue group > > > instead of the virtqueue itself. > > > > > > For a full PASID capable hardware, it can simply advertise a model > > > of 1:1 mapping of the virtqueue and virtqueue group. For the device > > > with mediated virtqueues, it can choose to place the mediated > > > virtqueues into dedicated virtqueue group(s). > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com> > > > --- > > > content.tex | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++- > > > 1 file changed, 119 insertions(+), 2 deletions(-) > > > > > > diff --git a/content.tex b/content.tex > > > index 00a75f2..552c0a8 100644 > > > --- a/content.tex > > > +++ b/content.tex > > > @@ -1516,8 +1516,14 @@ \subsection{Virtio Structure PCI Extended Capabilities}\label{sec:Virtio Transpo > > > PCI Express extended configuration space. > > > > > > \item[\field{cfg_type}] > > > - Identifies the structure. All values are reserved for future > > > - use. > > > + Identifies the structure. according to the following table: > > > + > > > +\begin{lstlisting} > > > +/* PASID configuration */ > > > +#define VIRTIO_PCI_ECAP_PASID_CFG 1 > > > +\end{lstlisting} > > > + > > > + Any other value is reserved for future use. > > > > > > The device MAY offer more than one structure of any type - this makes it > > > possible for the device to expose multiple interfaces to drivers. The order of > > > @@ -1599,6 +1605,117 @@ \subsection{Virtio Structure PCI Extended Capabilities}\label{sec:Virtio Transpo > > > If the device presents multiple structures of the same type, it SHOULD order > > > them from optimal (first) to least-optimal (last). > > > > > > +\subsubsection{PASID configuration structure layout} > > > + > > > +The PASID configuration structure is found at the \field{bar} and > > > +\field{offset} within the VIRTIO_PCI_ECAP_PASID_CFG extended > > > > It looks like PCI Extended Capabilities have a different namespace from > > classic PCI Capabilities (VIRTIO_PCI_**ECAP**_PASID_CFG). If a device > > runs out of PCI Configuration Space it will be unable to define > > additional capabilities as PCI Extended Capabilities. At least I'm not > > aware of VIRTIO_PCI_ECAP_* definitions for existing capabilities. If the > > point is to help with PCI Configuration Space exhaustion, should all > > capabilities be defined in the same namespace so they can be trivially > > defined as either PCI Capabilities or PCI Extended Capabilities? > > I'm not sure, but only allowing new types to be placed in extended > capabilities is much easier. > > > > > (Maybe I misunderstood the purpose of adding PCI Extended Capabilities?) > > More or less, the proposal is only for the structure that depends on > PCI express. > > > > > > +capability; its layout is below. > > > + > > > +\begin{lstlisting} > > > +struct virtio_pci_common_cfg { > > > > The struct name looks like it was copy-pasted. Should this be struct > > virtio_pci_group_cfg? > > Right, but I think virtio_pci_pasid_cfg is better. Sounds good. > > > > > > + /* About the whole device. */ > > > + le16 groups; /* read-only for driver */ > > > + /* About a specific virtqueue */ > > > + le16 queue_select; /* read-write */ > > > + le16 group_id; /* read-only for driver */ > > > + /* About a specific virtqueue group */ > > > + le16 group_select; /* read-write */ > > > + le32 group_pasid; /* read-write */ > > > +} > > > +\end{lstlisting} > > > + > > > +\begin{description} > > > +\item[\field{groups}] > > > + The device specifies the number of virtqueue groups here. A > > > + virtqueue group is the minimal set of the virtqueues that can > > > + be assigned with a single PASID. > > > + > > > +\item[\field{queue_select}] > > > + Queue Select. The driver selects which virtqueue the > > > + \field{group_id} refers to. > > > + > > > +\item[\field{group_id}] > > > + Virtqueue Group Identifier. The device specify the virtqueue > > > + group identifier that the virtqueue belongs to. > > > + > > > +\item[\field{group_select}] > > > + Virtqueue Group Select. The driver selects which virtqueue > > > + group the field{group_pasid} refers to. > > > + > > > +\item[\field{group_pasid}] > > > + Virtqueue Group PASID. The driver use this field to assign a > > > + PASID to a virtqueue group. > > > +\end{description} > > > > The queue_select/group_select interface makes me wonder if an admin > > virtqueue interface would be better. The driver is forced to scan > > virtqueues and groups, and each access requires a vmexit. An admin > > virtqueue interface would allow a more efficient DMA interface where all > > virtqueues/groups can be queried or set in a single operation. > > > > Luckily this interface is only used at initialization time, but I > > sometimes still wonder if it's scalable enough. > > Yes, so it should be no worse than the current layout in the > virtio_pci_common_cfg where we had a bunch of fields that are > multiplexed via the queue_select. > > Actually the admin virtqueue and capability are not mutually > exclusive. E.g we can have admin virtqueue in L0 and capability in > L1/L0. The capability make it much more easier to be virtualized for > future vPASID/vSVA stuffs. > > > > > > + > > > +\paragraph{PASID Configuration} > > > + > > > +When PASID extended capability is present and enabled in the device > > > +(through the standard PCI Extended configuration space). To assign a > > > +PASID to a specific virtqueue group. The driver need to perform the > > > +following steps: > > > > This paragraph is a single sentence, the full stops ('.') are used more > > like commas (','). I suggest shortening it like this: > > > > When the PCI PASID Extended Capability is present and enabled, the > > driver performs the following steps to assign a PASID to a virtqueue > > group: > > > > (I also used explicit "PCI PASID Extended Capability" naming so it's > > clear we're talking about a formal term defined in the PCI > > specification.) > > That's fine. > > > > > > + > > > +\begin{enumerate} > > > +\item Identify the number of groups by reading \field{groups} > > > + > > > +\item Identify the virtqueue group for a specific virtqueue by first > > > + writing the index of the virtqueue to \field{queue_select} then > > > + reading its virtqueue group identifier from \field{group_id}. > > > + > > > +\item Determine number of PASIDs and the PASID that will be associated > > > + for each virtqueue group. > > > + > > > +\item Assign PASID to virtqueue groups by first writing the > > > + virtqueue group ID to \field{group_select} then writing the valid > > > + PASID number that assigned to the virtqueue group to > > > + \field{group_pasid} > > > > \item Assign PASIDs to virtqueue groups by first writing the > > virtqueue group ID to \field{group_select} and then writing a valid > > PASID number to assign to that group to \field{group_pasid}. > > Ok. > > > > > > +\end{enumerate} > > > + > > > +To disable PASID for a virtqueue group, the driver write a special > > > +NO_PASID value: > > > > The driver writes the special VIRTIO_PASID_NO_ID value to disable > > PASID for a virtqueue group: > > Ok. > > > > > > + > > > +\begin{lstlisting} > > > +/* PASID value used to disable PASID for virtqueue group */ > > > +#define VIRTIO_PASID_NO_ID 0xffffffff > > > +\end{lstlisting} > > > + > > > +\drivernormative{\subsubsection}{PASID configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PASID configuration structure layout} > > > + > > > +Except for VIRTIO_PASID_NO_ID, a driver MUST NOT attempt to assign a > > > +PASID number outside the range that is defined in the \field{Max PASID Width} > > > +in the PASID Extended Capability. > > > + > > > +After assigning a PASID to a virtqueue group, the driver MUST verify > > > +the success by reading the Virtqueue Group PASID: on success, the > > > +previously written value is returned, and on failure > > > +VIRTIO_PASID_NO_ID is returned. If a mapping failure is detected, the > > > +driver MAY retry assigning with fewer PASIDs, disable PASID or report > > > +device failure. > > > + > > > +\devicenormative{\subsubsection}{PASID configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PASID configuration structure layout} > > > + > > > +The device MUST support at least one virtqueue group if it offers > > > +PASID configuration structure. > > > > the PASID configuration structure. > > > > ("a" would be okay too but leaves open the possibility of multiple PASID > > configuration structures, which I think we don't want.) > > Yes, but multiple structures of the same type are allowed for PCI > capability, so I try to allow it for PCIe as well. I see. The line can be changed to "a PASID configuration structure". > > > +wrote to the drvier in \field{group_pasid}. > > > > This is ambiguous. It leaves open the possibility that group_pasid > > updates at a later point in time instead of immediately after a write by > > the driver. My interpretation is that the device must respond > > immediately by presenting the new PASID or VIRTIO_PASID_NO_ID when the > > driver reads back group_pasid after a write. If you agree, please > > rephrase this clause to make it clear that reading group_pasid after > > write acts like a barrier (previous PCIe transactions must complete > > before the device responds to the read) and the device is not allowed to > > present the old group_pasid value until all transactions are completed > > and then present the new group_pasid value. > > So my understanding is that instead of delaying the response to read > it's better to simply present the previous PASID value if there're any > pending transactions of previous PASID value. The driver can then > choose to poll or using timers etc. Please state that explicitly. It wasn't obvious to me that the driver needs to reread group_pasid until it changes either to the new value or the VIRTIO_PASID_NO_ID. I thought the field updates immediately. > > > > > > + > > > +The device MUST use PASID TLP prefix for all memory transactions > > > +belongs to a virtqueue group if a valid PASID is assigned and PASID is > > > > s/belongs/belonging/ > > > > > +enabled in the PASID extended capability. > > > > This statement is vague. Which memory transactions "belong to a > > virtqueue group"? > > How about: > > The device MUST use PASID TLP prefix for all memory transactions > initiated by the virtqueue that belong to a virtqueue group ... Okay. > > vring reads/writes and data buffer reads/writes come > > to mind. Virtqueue MSI-X messages are probably not included? Anything > > else? > > According to the PCIe spec and the semantic, the PASID TLP prefix > should be used for > > Memory Requests (including AtomicOp Requests) with Untranslated Addresses. > Address Translation Requests, ATS Invalidation Messages, Page Request > Messages, and PRG Response Messages > > So we probably don't need to differentiate MSI-X messages with DMA > here. That's why I think a general "memory transaction" should be > sufficient here. If you don't agree, I can clarify it as what PCIe > spec did. Okay, thanks for explaining! Stefan
Attachment:
signature.asc
Description: PGP signature
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]