OASIS Mailing List Archives

virtio-dev message

Subject: [PATCH v3] vsock: add vsock device


The virtio vsock device is a zero-configuration socket communications
device.  It is designed as a guest<->host management channel suitable
for communicating with guest agents.

vsock is designed with the sockets API in mind and the driver is
typically implemented as an address family (at the same level as
AF_INET).  Applications written for the sockets API can be ported with
minimal changes (similar amount of effort as adding IPv6 support to an
IPv4 application).

Unlike the existing console device, which is also used for guest<->host
communication, vsock allows multiple clients to connect to a server at
the same time.  The console device lacks this capability, so its users
must arbitrate access through a single client.  With vsock, clients
connect directly and do not have to synchronize with each other.

Unlike network devices, the vsock device requires no configuration
because it comes with its address in the configuration space.

The vsock device was prototyped by Gerd Hoffmann and Asias He.  I picked
the code and design up from them.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Asias He <asias.hejun@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v3:
 * "VSock device" -> "Virtio socket device" in free text [Michael]
 * Extract normative statements and add references from conformance
   chapter [Michael]
v2:
 * Document guest_cid field
 * Use MAY/MUST/CAN according to RFC 2119
 * Remove datagram socket type for the time being.  This can be added in
   the future but there are currently no applications.
 * Drop 3-way handshake for stream sockets.  It is not needed since
   virtio-vsock provides reliable, in-order delivery and spoofing source
   addresses is impossible.
 * Drop max_virtqueue_pairs configuration space field.  This field was
   never defined and Linux code does not support multiqueue.  It can be
   added back later, if necessary.
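
As a reading aid, the credit calculation described in the Buffer Space
Management section of the patch can be sketched in C as below.  This is
only a sketch under stated assumptions: struct vsock_credit and the
three functions are hypothetical names that do not appear in the patch,
and clamping the result to zero when the peer reports less space than is
already in flight is an added safety choice, not something the text
requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-socket sender state; these names are not from the patch. */
struct vsock_credit {
    uint32_t tx_cnt;         /* free-running count of payload bytes sent */
    uint32_t peer_buf_alloc; /* most recent buf_alloc received from the peer */
    uint32_t peer_fwd_cnt;   /* most recent fwd_cnt received from the peer */
};

/* Record the credit fields carried in a packet header from the peer. */
void vsock_credit_update(struct vsock_credit *c, uint32_t buf_alloc,
                         uint32_t fwd_cnt)
{
    assert(c != NULL);
    c->peer_buf_alloc = buf_alloc;
    c->peer_fwd_cnt = fwd_cnt;
}

/* peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt), as in the patch. */
uint32_t vsock_peer_free(const struct vsock_credit *c)
{
    uint32_t in_flight;

    assert(c != NULL);
    /* Unsigned subtraction is modulo 2^32, so wrapped counters still work. */
    in_flight = c->tx_cnt - c->peer_fwd_cnt;
    /* Assumption: report zero rather than underflow if buf_alloc shrank. */
    if (in_flight >= c->peer_buf_alloc)
        return 0;
    return c->peer_buf_alloc - in_flight;
}

/* Account for len payload bytes; returns false if the peer lacks space. */
bool vsock_try_send(struct vsock_credit *c, uint32_t len)
{
    if (len > vsock_peer_free(c))
        return false;
    c->tx_cnt += len;
    return true;
}
```

A sender along these lines would call vsock_credit_update() for every
packet it receives on the flow and vsock_try_send() before queueing a
VIRTIO_VSOCK_OP_RW packet, falling back to VIRTIO_VSOCK_OP_CREDIT_REQUEST
when the call fails.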
---
 trunk/conformance.tex |  18 +++++-
 trunk/content.tex     | 176 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 192 insertions(+), 2 deletions(-)
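
The packet header added in content.tex uses little-endian fields, so an
implementation on any host byte order has to encode and decode it
explicitly.  The following C sketch shows one way to do that for the
36-byte layout (five 32-bit fields, two 16-bit fields, three 32-bit
fields).  struct vsock_hdr, VSOCK_HDR_SIZE and the helper functions are
hypothetical names chosen for illustration; only the field order and
widths come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VSOCK_HDR_SIZE 36 /* 5*4 + 2*2 + 3*4 bytes */

/* Host-endian mirror of struct virtio_vsock_hdr; illustrative only. */
struct vsock_hdr {
    uint32_t src_cid, src_port, dst_cid, dst_port, len;
    uint16_t type, op;
    uint32_t flags, buf_alloc, fwd_cnt;
};

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)(v & 0xffff));
    put_le16(p + 2, (uint16_t)(v >> 16));
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)get_le16(p) | ((uint32_t)get_le16(p + 2) << 16);
}

/* Serialize h into out, which must hold VSOCK_HDR_SIZE bytes. */
void vsock_hdr_encode(const struct vsock_hdr *h, uint8_t *out)
{
    assert(h != NULL && out != NULL);
    put_le32(out + 0, h->src_cid);
    put_le32(out + 4, h->src_port);
    put_le32(out + 8, h->dst_cid);
    put_le32(out + 12, h->dst_port);
    put_le32(out + 16, h->len);
    put_le16(out + 20, h->type);
    put_le16(out + 22, h->op);
    put_le32(out + 24, h->flags);
    put_le32(out + 28, h->buf_alloc);
    put_le32(out + 32, h->fwd_cnt);
}

/* Parse VSOCK_HDR_SIZE bytes from in into h. */
void vsock_hdr_decode(const uint8_t *in, struct vsock_hdr *h)
{
    assert(in != NULL && h != NULL);
    h->src_cid = get_le32(in + 0);
    h->src_port = get_le32(in + 4);
    h->dst_cid = get_le32(in + 8);
    h->dst_port = get_le32(in + 12);
    h->len = get_le32(in + 16);
    h->type = get_le16(in + 20);
    h->op = get_le16(in + 22);
    h->flags = get_le32(in + 24);
    h->buf_alloc = get_le32(in + 28);
    h->fwd_cnt = get_le32(in + 32);
}
```

Byte-wise encoding avoids relying on struct padding or on the host's
byte order, at the cost of a copy per header.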

diff --git a/trunk/conformance.tex b/trunk/conformance.tex
index 7b7df32..678fe0b 100644
--- a/trunk/conformance.tex
+++ b/trunk/conformance.tex
@@ -15,13 +15,13 @@ Conformance targets:
   \begin{itemize}
     \item Clause \ref{sec:Conformance / Driver Conformance},
     \item One of clauses \ref{sec:Conformance / Driver Conformance / PCI Driver Conformance}, \ref{sec:Conformance / Driver Conformance / MMIO Driver Conformance} or \ref{sec:Conformance / Driver Conformance / Channel I/O Driver Conformance}.
-    \item One of clauses \ref{sec:Conformance / Driver Conformance / Network Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Block Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Console Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Entropy Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Traditional Memory Balloon Driver Conformance} or \ref{sec:Conformance / Driver Conformance / SCSI Host Driver Conformance}.
+    \item One of clauses \ref{sec:Conformance / Driver Conformance / Network Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Block Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Console Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Entropy Driver Conformance}, \ref{sec:Conformance / Driver Conformance / Traditional Memory Balloon Driver Conformance}, \ref{sec:Conformance / Driver Conformance / SCSI Host Driver Conformance} or \ref{sec:Conformance / Driver Conformance / Socket Driver Conformance}.
   \end{itemize}
 \item[Device] A device MUST conform to three conformance clauses:
   \begin{itemize}
     \item Clause \ref{sec:Conformance / Device Conformance},
     \item One of clauses \ref{sec:Conformance / Device Conformance / PCI Device Conformance}, \ref{sec:Conformance / Device Conformance / MMIO Device Conformance} or \ref{sec:Conformance / Device Conformance / Channel I/O Device Conformance}.
-    \item One of clauses \ref{sec:Conformance / Device Conformance / Network Device Conformance}, \ref{sec:Conformance / Device Conformance / Block Device Conformance}, \ref{sec:Conformance / Device Conformance / Console Device Conformance}, \ref{sec:Conformance / Device Conformance / Entropy Device Conformance}, \ref{sec:Conformance / Device Conformance / Traditional Memory Balloon Device Conformance} or \ref{sec:Conformance / Device Conformance / SCSI Host Device Conformance}.
+    \item One of clauses \ref{sec:Conformance / Device Conformance / Network Device Conformance}, \ref{sec:Conformance / Device Conformance / Block Device Conformance}, \ref{sec:Conformance / Device Conformance / Console Device Conformance}, \ref{sec:Conformance / Device Conformance / Entropy Device Conformance}, \ref{sec:Conformance / Device Conformance / Traditional Memory Balloon Device Conformance}, \ref{sec:Conformance / Device Conformance / SCSI Host Device Conformance} or \ref{sec:Conformance / Device Conformance / Socket Device Conformance}.
   \end{itemize}
 \end{description}
 
@@ -145,6 +145,13 @@ An SCSI host driver MUST conform to the following normative statements:
 \item \ref{drivernormative:Device Types / SCSI Host Device / Device Operation / Device Operation: eventq}
 \end{itemize}
 
+A socket driver MUST conform to the following normative statements:
+
+\begin{itemize}
+\item \ref{drivernormative:Device Types / Socket Device / Device Operation / Buffer Space Management}
+\item \ref{drivernormative:Device Types / Socket Device / Device Operation / Receive and Transmit}
+\end{itemize}
+
 \section{Device Conformance}\label{sec:Conformance / Device Conformance}
 
 A device MUST conform to the following normative statements:
@@ -265,6 +272,13 @@ An SCSI host device MUST conform to the following normative statements:
 \item \ref{devicenormative:Device Types / SCSI Host Device / Device Operation / Device Operation: eventq}
 \end{itemize}
 
+A socket device MUST conform to the following normative statements:
+
+\begin{itemize}
+\item \ref{devicenormative:Device Types / Socket Device / Device Operation / Buffer Space Management}
+\item \ref{devicenormative:Device Types / Socket Device / Device Operation / Receive and Transmit}
+\end{itemize}
+
 \section{Legacy Interface: Transitional Device and
 Transitional Driver Conformance}\label{sec:Conformance / Legacy
 Interface: Transitional Device and 
diff --git a/trunk/content.tex b/trunk/content.tex
index d989d98..f500578 100644
--- a/trunk/content.tex
+++ b/trunk/content.tex
@@ -5641,6 +5641,182 @@ descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\section{Socket Device}\label{sec:Device Types / Socket Device}
+
+The virtio socket device is a zero-configuration socket communications device.
+It facilitates data transfer between the guest and device without using the
+Ethernet or IP protocols.
+
+\subsection{Device ID}\label{sec:Device Types / Socket Device / Device ID}
+  13
+
+\subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues}
+\begin{description}
+\item[0] ctrl
+\item[1] rx
+\item[2] tx
+\end{description}
+
+The ctrl virtqueue is reserved for future use and is currently unused.
+
+\subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
+
+
+There are currently no feature bits defined for this device.
+
+
+\subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout}
+
+\begin{lstlisting}
+struct virtio_vsock_config {
+	__le32 guest_cid;
+};
+\end{lstlisting}
+
+The \field{guest_cid} field contains the guest's context ID, which uniquely
+identifies the device for its lifetime.
+
+\subsection{Device Initialization}\label{sec:Device Types / Socket Device / Device Initialization}
+
+\begin{enumerate}
+\item The guest's cid is read from \field{guest_cid}.
+
+\item Buffers are added to the rx virtqueue to start receiving packets.
+\end{enumerate}
+
+\subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Operation}
+
+Packets transmitted or received contain a header before the payload:
+
+\begin{lstlisting}
+struct virtio_vsock_hdr {
+	__le32 src_cid;
+	__le32 src_port;
+	__le32 dst_cid;
+	__le32 dst_port;
+	__le32 len;
+	__le16 type;
+	__le16 op;
+	__le32 flags;
+	__le32 buf_alloc;
+	__le32 fwd_cnt;
+};
+\end{lstlisting}
+
+Most packets simply transfer data but control packets are also used for
+connection and buffer space management.  \field{op} is one of the following
+operation constants:
+
+\begin{lstlisting}
+enum {
+	VIRTIO_VSOCK_OP_INVALID = 0,
+
+	/* Connect operations */
+	VIRTIO_VSOCK_OP_REQUEST = 1,
+	VIRTIO_VSOCK_OP_RESPONSE = 2,
+	VIRTIO_VSOCK_OP_RST = 3,
+	VIRTIO_VSOCK_OP_SHUTDOWN = 4,
+
+	/* To send payload */
+	VIRTIO_VSOCK_OP_RW = 5,
+
+	/* Tell the peer our credit info */
+	VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6,
+	/* Request the peer to send the credit info to us */
+	VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7,
+};
+\end{lstlisting}
+
+\subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Operation / Addressing}
+
+Flows are identified by a (source, destination) address tuple.  An address
+consists of a (cid, port number) tuple. The header fields used for this are
+\field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
+
+Currently only stream sockets are supported. \field{type} is 1 for stream
+socket types.
+
+Stream sockets provide in-order, guaranteed, connection-oriented delivery
+without message boundaries.
+
+\subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
+\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
+stream sockets. The guest and the device publish how much buffer space is
+available per socket. This facilitates flow control so packets are never
+dropped.
+
+\field{buf_alloc} is the total receive buffer space, in bytes, for this socket.
+This includes both free and in-use buffers. \field{fwd_cnt} is a free-running
+counter of bytes received. The sender calculates the amount of free receive
+buffer space as follows:
+
+\begin{lstlisting}
+/* tx_cnt is the sender's free-running bytes transmitted counter */
+u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
+\end{lstlisting}
+
+If there is insufficient buffer space, the sender waits until virtqueue buffers
+are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending
+the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is
+available. The reply to this query is a VIRTIO_VSOCK_OP_CREDIT_UPDATE packet.
+
+\drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
+VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
+sufficient free buffer space for the payload.
+
+All packets associated with a stream flow MUST contain valid information in
+\field{buf_alloc} and \field{fwd_cnt} fields.
+
+\devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
+VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
+sufficient free buffer space for the payload.
+
+All packets associated with a stream flow MUST contain valid information in
+\field{buf_alloc} and \field{fwd_cnt} fields.
+
+\subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / Device Operation / Receive and Transmit}
+The driver queues outgoing packets on the tx virtqueue and incoming packet
+receive buffers on the rx virtqueue. Packets are of the following form:
+
+\begin{lstlisting}
+struct virtio_vsock_packet {
+    struct virtio_vsock_hdr hdr;
+    u8 data[];
+};
+\end{lstlisting}
+
+Virtqueue buffers for outgoing packets are read-only. Virtqueue buffers for
+incoming packets are write-only.
+
+\drivernormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
+
+The \field{guest_cid} configuration field MUST be used as the source CID when
+sending outgoing packets.
+
+A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
+unknown \field{type} value.
+
+\devicenormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
+A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
+unknown \field{type} value.
+
+\subsubsection{Stream Sockets}\label{sec:Device Types / Socket Device / Device Operation / Stream Sockets}
+
+Connections are established by sending a VIRTIO_VSOCK_OP_REQUEST packet. If a
+listening socket exists on the destination, a VIRTIO_VSOCK_OP_RESPONSE reply is
+sent and the connection is established.  A VIRTIO_VSOCK_OP_RST reply is sent if
+a listening socket does not exist on the destination or the destination has
+insufficient resources to establish the connection.
+
+When a connected socket receives VIRTIO_VSOCK_OP_SHUTDOWN, bit 0 of the header
+\field{flags} field indicates that the peer will not receive any more data and
+bit 1 indicates that the peer will not send any more data. If these bits are
+set and there are no more virtqueue buffers pending, the socket is
+disconnected.
+
+The VIRTIO_VSOCK_OP_RST packet aborts the connection process or forcibly
+disconnects a connected socket.
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently there are three device-independent feature bits defined:
-- 
2.5.0


