
virtio-dev message


Subject: Re: [virtio-dev] [PATCH v7 08/11] packed virtqueues: more efficient virtqueue layout

On Tue, Jan 23, 2018 at 02:01:07AM +0200, Michael S. Tsirkin wrote:
+With current transports, virtqueues are located in guest memory
+allocated by the driver.
+Each packed virtqueue consists of three parts:
+\begin{itemize}
+\item Descriptor Ring - occupies the Descriptor Area
+\item Driver Event Suppression - occupies the Driver Area
+\item Device Event Suppression - occupies the Device Area
+\end{itemize}
+The Descriptor Ring in turn consists of descriptors,
+each of which can contain the following parts:
+\begin{itemize}
+\item Buffer ID
+\item Buffer Address
+\item Buffer Length
+\item Flags
+\end{itemize}
+A buffer consists of zero or more device-readable physically-contiguous
+elements followed by zero or more physically-contiguous
+device-writable elements (each buffer has at least one element).
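As a reading aid, the buffer rule above can be checked with a small helper. The rule is: device-readable elements come first, then device-writable ones, with at least one element in total. This is a minimal C sketch. `buffer_layout_ok` is a hypothetical name, and each element is reduced to a single "device-writable" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: returns true if writable[0..n) describes a valid
 * buffer, i.e. at least one element and no device-readable element
 * following a device-writable one. */
static bool buffer_layout_ok(const bool *writable, size_t n)
{
    bool seen_writable = false;

    if (n == 0)
        return false;                 /* each buffer has at least one element */
    for (size_t i = 0; i < n; i++) {
        if (writable[i])
            seen_writable = true;
        else if (seen_writable)
            return false;             /* readable element after a writable one */
    }
    return true;
}
```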
+When the driver wants to send such a buffer to the device, it
+writes at least one available descriptor describing elements of
+the buffer into the Descriptor Ring.  The descriptor(s) are
+associated with a buffer by means of a Buffer ID stored within
+the descriptor.
+The driver then notifies the device. When the device has finished
+processing the buffer, it writes a used device descriptor
+including the Buffer ID into the Descriptor Ring (overwriting a
+driver descriptor previously made available), and sends an
+interrupt.
+The Descriptor Ring is used in a circular manner: the driver writes
+descriptors into the ring in order. After reaching the end of the ring,
+the next descriptor is placed at the head of the ring. Once the ring is
+full of driver descriptors, the driver stops sending new requests and
+waits for the device to start processing descriptors and to write out
+some used descriptors before making new driver descriptors
+available.
+Similarly, the device reads descriptors from the ring in order and
+detects that a driver descriptor has been made available. As
+processing of descriptors is completed, used descriptors are
+written by the device back into the ring.
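The circular use of the ring from the driver's side can be sketched as follows. `drv_ring`, `drv_claim_slot` and `drv_reclaim` are hypothetical names. The sketch tracks only slot positions and free space. It deliberately leaves out descriptor contents, wrap counters and memory barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-side bookkeeping for a ring of `size` slots. */
struct drv_ring {
    uint16_t size;      /* number of descriptor slots in the ring */
    uint16_t next;      /* slot the driver writes next, in circular order */
    uint16_t num_free;  /* slots not currently holding driver descriptors */
};

/* Claim the next slot; returns false when the ring is full, in which case
 * the driver waits for the device to write out used descriptors. */
static bool drv_claim_slot(struct drv_ring *r, uint16_t *slot)
{
    if (r->num_free == 0)
        return false;
    *slot = r->next;
    r->next = (uint16_t)(r->next + 1 == r->size ? 0 : r->next + 1);  /* wrap to head */
    r->num_free--;
    return true;
}

/* Called once the device has used `n` descriptors, freeing their slots. */
static void drv_reclaim(struct drv_ring *r, uint16_t n)
{
    assert(r->num_free + n <= r->size);
    r->num_free = (uint16_t)(r->num_free + n);
}
```

With a 4-slot ring, four claims return slots 0 to 3, a fifth claim fails, and after reclaiming one slot the next claim returns slot 0 again.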
+Note: after reading driver descriptors and starting their
+processing in order, the device might complete their processing out
+of order. Used device descriptors are written in the order
+in which their processing is completed.
+The Device Event Suppression data structure is write-only by the
+device. It includes information for reducing the number of
+device events, i.e. driver notifications to the device.
+The Driver Event Suppression data structure is read-only by the
+device. It includes information for reducing the number of
+driver events, i.e. device interrupts to the driver.
+\subsection{Available and Used Ring Wrap Counters}
+\label{sec:Packed Virtqueues / Available and Used Ring Wrap Counters}
+The driver and the device are each expected to maintain,
+internally, a single-bit ring wrap counter initialized to 1.
+The counter maintained by the driver is called the Available
+Ring Wrap Counter. The driver changes the value of this counter
+each time it makes available the
+last descriptor in the ring (after making the last descriptor
+available).
+The counter maintained by the device is called the Used Ring Wrap
+Counter. The device changes the value of this counter
+each time it uses the last descriptor in
+the ring (after marking the last descriptor used).
+It is easy to see that the Available Ring Wrap Counter in the driver matches
+the Used Ring Wrap Counter in the device when both are processing the same
+descriptor, or when all available descriptors have been used.
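A minimal C sketch of one side's position and single-bit wrap counter follows. The driver would use one instance for the Available Ring Wrap Counter and the device another for the Used Ring Wrap Counter. `ring_pos` and its functions are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-side state: current slot plus single-bit wrap counter. */
struct ring_pos {
    uint16_t size;  /* number of descriptor slots */
    uint16_t idx;   /* next slot to make available (driver) or use (device) */
    bool wrap;      /* ring wrap counter */
};

static void ring_pos_init(struct ring_pos *p, uint16_t size)
{
    assert(size > 0);
    p->size = size;
    p->idx = 0;
    p->wrap = true;  /* initialized to 1 */
}

/* Advance past one descriptor; the counter flips after the last slot. */
static void ring_pos_advance(struct ring_pos *p)
{
    if (++p->idx == p->size) {
        p->idx = 0;
        p->wrap = !p->wrap;
    }
}
```

If the driver makes both slots of a 2-slot ring available and the device then uses both, the two counters are equal again, as the text says.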
+To mark a descriptor as available or used, both the driver and the
+device use the following two flags:
+#define VIRTQ_DESC_F_AVAIL     (1 << 7)
+#define VIRTQ_DESC_F_USED      (1 << 15)
+To mark a descriptor as available, the driver sets the
+VIRTQ_DESC_F_AVAIL bit in Flags to match the internal Available
+Ring Wrap Counter. It also sets the VIRTQ_DESC_F_USED bit to match the
+\emph{inverse} value.
+To mark a descriptor as used, the device sets the
+VIRTQ_DESC_F_USED bit in Flags to match the internal Used
+Ring Wrap Counter. It also sets the VIRTQ_DESC_F_AVAIL bit to match the
+\emph{same} value.
+Thus the VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different
+for an available descriptor and equal for a used descriptor.
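The marking rules above can be written as a few helpers. This is a minimal sketch, assuming VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED are used as masks for bits 7 and 15 of Flags. The `expected_wrap` parameter of the predicates is an assumption of this sketch. It shows one way a reader could tell a freshly written descriptor from a stale one left over from the previous pass around the ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTQ_DESC_F_AVAIL (1u << 7)   /* assumed: mask for bit 7 */
#define VIRTQ_DESC_F_USED  (1u << 15)  /* assumed: mask for bit 15 */

/* Driver: AVAIL = Available Ring Wrap Counter, USED = its inverse. */
static uint16_t mark_avail(uint16_t flags, bool avail_wrap)
{
    flags &= (uint16_t)~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
    return (uint16_t)(flags | (avail_wrap ? VIRTQ_DESC_F_AVAIL : VIRTQ_DESC_F_USED));
}

/* Device: both AVAIL and USED set to the Used Ring Wrap Counter. */
static uint16_t mark_used(uint16_t flags, bool used_wrap)
{
    flags &= (uint16_t)~(VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED);
    return (uint16_t)(flags | (used_wrap ? (VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED) : 0u));
}

static bool flag_set(uint16_t flags, unsigned mask) { return (flags & mask) != 0; }

/* Available: the two bits differ and AVAIL matches the expected wrap value. */
static bool desc_is_avail(uint16_t flags, bool expected_wrap)
{
    bool a = flag_set(flags, VIRTQ_DESC_F_AVAIL);
    bool u = flag_set(flags, VIRTQ_DESC_F_USED);
    return a != u && a == expected_wrap;
}

/* Used: the two bits are equal and match the expected wrap value. */
static bool desc_is_used(uint16_t flags, bool expected_wrap)
{
    bool a = flag_set(flags, VIRTQ_DESC_F_AVAIL);
    bool u = flag_set(flags, VIRTQ_DESC_F_USED);
    return a == u && u == expected_wrap;
}
```

Both marking helpers clear only the AVAIL and USED bits. Any other flags, such as VIRTQ_DESC_F_NEXT, are preserved.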
+\subsection{Polling of available and used descriptors}
+\label{sec:Packed Virtqueues / Polling of available and used descriptors}
+Writes of device and driver descriptors can generally be
+reordered, but each side (driver and device) is only required to
+poll (or test) a single location in memory: the next descriptor after
+the one it processed previously, in circular order.
+Sometimes the device needs to write out only a single used descriptor
+after processing a batch of multiple available descriptors. As
+described in more detail below, this can happen when using
+descriptor chaining or with in-order
+use of descriptors. In this case, the device writes out a used
+descriptor with the Buffer ID of the last descriptor in the group.
+After processing the used descriptor, both the device and the driver
+skip forward in the ring by the number of remaining descriptors
+in the group until processing (reading for the driver and writing
+for the device) the next used descriptor.
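The skip-forward step can be sketched from the driver's side. After it consumes a used descriptor standing for a group of `group_len` descriptors, the driver moves its expected position past the whole group, flipping its copy of the wrap counter if it passes the end of the ring. `used_reader` and `used_reader_skip` are hypothetical names. Obtaining `group_len` (for example by looking it up per Buffer ID) is assumed to happen elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver state for reading used descriptors. */
struct used_reader {
    uint16_t size;  /* number of descriptor slots */
    uint16_t idx;   /* slot where the next used descriptor is expected */
    bool wrap;      /* driver's copy of the wrap counter for used slots */
};

/* Skip past a whole group of `group_len` descriptors (the used descriptor
 * itself plus the remaining descriptors of the group). */
static void used_reader_skip(struct used_reader *r, uint16_t group_len)
{
    assert(group_len >= 1 && group_len <= r->size);
    uint32_t next = (uint32_t)r->idx + group_len;
    if (next >= r->size) {      /* passed the end of the ring */
        next -= r->size;
        r->wrap = !r->wrap;
    }
    r->idx = (uint16_t)next;
}
```

For example, in an 8-slot ring a used descriptor at slot 6 covering a group of 3 moves the next expected position to slot 1 and flips the wrap counter.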
+\subsection{Write Flag}
+\label{sec:Packed Virtqueues / Write Flag}
+In an available descriptor, the VIRTQ_DESC_F_WRITE bit within Flags
+is used to mark a descriptor as corresponding to a write-only or
+read-only element of a buffer.
+/* This marks a buffer as device write-only (otherwise device read-only). */
+#define VIRTQ_DESC_F_WRITE     2
+In a used descriptor, this bit is used to specify whether any
+data has been written by the device into any parts of the buffer.
+\subsection{Buffer Address and Length}
+\label{sec:Packed Virtqueues / Buffer Address and Length}
+In an available descriptor, Buffer Address corresponds to the
+physical address of the buffer. The length of the buffer, assumed
+to be physically contiguous, is stored in Buffer Length.
+In a used descriptor, Buffer Address is unused. Buffer Length
+specifies the length of the buffer that has been initialized
+(written to) by the device.
+Buffer Length is reserved for used descriptors without the
+VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
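On the driver side, interpreting Buffer Length in a used descriptor reduces to checking VIRTQ_DESC_F_WRITE. This minimal sketch makes several assumptions that the patch text does not state. The field widths and order of `struct pvirtq_desc` (a 16-byte descriptor) follow the published virtio 1.1 layout. The host is little-endian. `used_bytes_written` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTQ_DESC_F_WRITE 2

/* Assumed descriptor layout (little-endian host). */
struct pvirtq_desc {
    uint64_t addr;   /* Buffer Address: unused in a used descriptor */
    uint32_t len;    /* Buffer Length */
    uint16_t id;     /* Buffer ID */
    uint16_t flags;  /* Flags */
};

static_assert(sizeof(struct pvirtq_desc) == 16, "assumed 16-byte descriptor");

/* Bytes the device wrote into the buffer: Buffer Length is only meaningful
 * when VIRTQ_DESC_F_WRITE is set; otherwise it is reserved and ignored. */
static uint32_t used_bytes_written(const struct pvirtq_desc *d)
{
    return (d->flags & VIRTQ_DESC_F_WRITE) ? d->len : 0;
}
```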
+\subsection{Scatter-Gather Support}
+\label{sec:Packed Virtqueues / Scatter-Gather Support}
+Some drivers need the ability to supply a list of multiple buffer
+elements (also known as a scatter/gather list) with a request.
+Two optional features support this: descriptor
+chaining and indirect descriptors.
+If neither feature has been negotiated, each buffer is
+physically contiguous, either read-only or write-only, and is
+described completely by a single descriptor.
+While unusual (most implementations either create all lists
+solely using non-indirect descriptors, or always use a single
+indirect element), if both features have been negotiated, mixing
+direct and indirect descriptors in a ring is valid, as long as each
+list only contains descriptors of a given type.
+Scatter/gather lists only apply to available descriptors. A
+single used descriptor corresponds to the whole list.
+The device limits the number of descriptors in a list through a
+transport-specific and/or device-specific value. If not limited,
+the maximum number of descriptors in a list is the virtqueue size.
+\subsection{Next Flag: Descriptor Chaining}
+\label{sec:Packed Virtqueues / Next Flag: Descriptor Chaining}
+The VIRTIO_F_LIST_DESC feature allows the driver to supply
+a scatter/gather list to the device
+by using multiple descriptors and setting the VIRTQ_DESC_F_NEXT bit in
+Flags for all but the last available descriptor.
+/* This marks a buffer as continuing. */
+#define VIRTQ_DESC_F_NEXT   1
+Buffer ID is included in the last descriptor in the list.
+The driver always makes the first descriptor in the list
+available after the rest of the list has been written out into
+the ring. This guarantees that the device will never observe a
+partial scatter/gather list in the ring.
+The device only writes out a single used descriptor for the whole
+list. It then skips forward according to the number of
+descriptors in the list. The driver needs to keep track of the size
+of the list corresponding to each Buffer ID, to be able to skip
+to where the next used descriptor is written by the device.
+For example, if descriptors are used in the same order in which
+they are made available, this will result in the used descriptor
+overwriting the first available descriptor in the list, the used
+descriptor for the next list overwriting the first available
+descriptor in the next list, etc.
+VIRTQ_DESC_F_NEXT is reserved in used descriptors, and
+should be ignored by drivers.
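A driver adding a chained list could look roughly like the following. The tail descriptors are written and made available first. The first descriptor is made available last, and the list size is recorded per Buffer ID for the later skip. This is a minimal single-threaded sketch under several assumptions. The descriptor layout follows the published virtio 1.1 layout. VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED are masks for bits 7 and 15. `struct drv`, `struct elem` and `add_chain` are hypothetical. The 64-entry ID table is arbitrary. Free-space checks and the write barrier a real driver needs before publishing the head are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTQ_DESC_F_NEXT  1
#define VIRTQ_DESC_F_WRITE 2
#define VIRTQ_DESC_F_AVAIL (1u << 7)   /* assumed: mask for bit 7 */
#define VIRTQ_DESC_F_USED  (1u << 15)  /* assumed: mask for bit 15 */

struct pvirtq_desc { uint64_t addr; uint32_t len; uint16_t id; uint16_t flags; };
struct elem { uint64_t addr; uint32_t len; bool device_writable; };

struct drv {
    struct pvirtq_desc *ring;
    uint16_t size, next;      /* ring size and next slot to write */
    bool wrap;                /* Available Ring Wrap Counter */
    uint16_t chain_len[64];   /* list size per Buffer ID (arbitrary limit) */
};

/* AVAIL/USED bits marking a descriptor available for a given wrap value. */
static uint16_t avail_bits(bool wrap)
{
    return (uint16_t)(wrap ? VIRTQ_DESC_F_AVAIL : VIRTQ_DESC_F_USED);
}

static void add_chain(struct drv *d, const struct elem *e, uint16_t n, uint16_t id)
{
    assert(n >= 1 && n <= d->size && id < 64);
    uint16_t head = d->next, head_flags = 0;
    bool head_wrap = d->wrap;

    for (uint16_t i = 0; i < n; i++) {
        struct pvirtq_desc *desc = &d->ring[d->next];
        desc->addr = e[i].addr;
        desc->len = e[i].len;
        desc->id = (i == n - 1) ? id : 0;       /* Buffer ID in the last one */
        uint16_t f = (i < n - 1) ? VIRTQ_DESC_F_NEXT : 0;
        if (e[i].device_writable)
            f |= VIRTQ_DESC_F_WRITE;
        if (i == 0)
            head_flags = f;                     /* published last, below */
        else
            desc->flags = (uint16_t)(f | avail_bits(d->wrap));
        if (++d->next == d->size) {             /* wrap around the ring */
            d->next = 0;
            d->wrap = !d->wrap;
        }
    }
    /* A real driver needs a write memory barrier here. */
    d->ring[head].flags = (uint16_t)(head_flags | avail_bits(head_wrap));
    d->chain_len[id] = n;                       /* needed to skip later */
}
```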
+\subsection{Indirect Flag: Scatter-Gather Support}
+\label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}
+Some devices benefit from concurrently dispatching a large number
+of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase
+ring capacity, the driver can store a (read-only by the device) table of indirect
+descriptors anywhere in memory, and insert a descriptor in the main
+virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to
+a memory buffer
+containing this indirect descriptor table; \field{addr} and \field{len}
+refer to the indirect table address and length in bytes,
+respectively.
+/* This means the buffer contains a table of buffer descriptors. */
+The indirect table layout structure looks like this
+(\field{len} is the Buffer Length of the descriptor that refers to this table,
+which is a variable, so this code won't compile):

It is pseudo-code, so I'm not sure if this remark is necessary.
+struct indirect_descriptor_table {
+        /* The actual descriptor structures (struct Desc each) */
+        struct Desc desc[len / sizeof(struct Desc)];
+};
+The first descriptor is located at the start of the indirect
+descriptor table; additional indirect descriptors come
+immediately afterwards. The \field{Flags} bit VIRTQ_DESC_F_WRITE is the
+only valid flag for descriptors in the indirect table. Others
+are reserved and are ignored by the device.
+Buffer ID is also reserved and is ignored by the device.
+In descriptors with VIRTQ_DESC_F_INDIRECT set, VIRTQ_DESC_F_WRITE
+is reserved and is ignored by the device.
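Building an indirect table and the main-ring descriptor that points to it might look like this. Each table entry gets only VIRTQ_DESC_F_WRITE (or no flag) and a zero Buffer ID. The main descriptor carries the table address, the table length in bytes, the Buffer ID and VIRTQ_DESC_F_INDIRECT. This is a minimal sketch under several assumptions. The value 4 for VIRTQ_DESC_F_INDIRECT and the descriptor layout come from the published virtio 1.1 specification, not from this patch. A host pointer stands in for a guest-physical address. Placing the Buffer ID in the main descriptor is an inference from the ID being reserved inside the table. `build_indirect` and `indirect_count` are hypothetical names. Setting the AVAIL/USED bits on the main descriptor is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTQ_DESC_F_WRITE    2
#define VIRTQ_DESC_F_INDIRECT 4   /* assumed value, from the virtio 1.1 spec */

struct pvirtq_desc { uint64_t addr; uint32_t len; uint16_t id; uint16_t flags; };
struct elem { uint64_t addr; uint32_t len; bool device_writable; };

/* Fill `table` from `e[0..n)` and make `head` refer to it. */
static void build_indirect(struct pvirtq_desc *head, struct pvirtq_desc *table,
                           const struct elem *e, uint16_t n, uint16_t buffer_id)
{
    assert(n >= 1);
    for (uint16_t i = 0; i < n; i++) {
        table[i].addr = e[i].addr;
        table[i].len = e[i].len;
        table[i].id = 0;  /* Buffer ID is reserved inside the table */
        table[i].flags = e[i].device_writable ? VIRTQ_DESC_F_WRITE : 0;
    }
    head->addr = (uint64_t)(uintptr_t)table;  /* stand-in for a physical address */
    head->len = (uint32_t)(n * sizeof(struct pvirtq_desc));  /* bytes */
    head->id = buffer_id;
    head->flags = VIRTQ_DESC_F_INDIRECT;      /* AVAIL/USED bits set separately */
}

/* Device side: number of entries is Buffer Length / descriptor size. */
static uint32_t indirect_count(const struct pvirtq_desc *head)
{
    return head->len / (uint32_t)sizeof(struct pvirtq_desc);
}
```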
+\subsection{Multi-buffer requests}
+\label{sec:Packed Virtqueues / Multi-descriptor batches}
+Some devices combine multiple buffers as part of processing of a
+single request.  These devices always make the first
+descriptor in the request available after the rest of the request

maybe I don't understand it correctly, but how about "mark the first
descriptor as used after the rest.."

+has been written out request the ring.

I can't parse this sentence. Should probably be "written out to the ring".

