OASIS Mailing List Archives

virtio-dev message



Subject: [PATCH RESEND v2] vsock: add vsock device


The virtio vsock device is a zero-configuration socket communications
device.  It is designed as a guest<->host management channel suitable
for communicating with guest agents.

vsock is designed with the sockets API in mind and the driver is
typically implemented as an address family (at the same level as
AF_INET).  Applications written for the sockets API can be ported with
minimal changes (similar amount of effort as adding IPv6 support to an
IPv4 application).

Unlike the existing console device, which is also used for guest<->host
communication, vsock allows multiple clients to connect to a server at the
same time.  The console supports only one connection, so console-based users
must arbitrate access through a single client.  With vsock, clients can
connect directly and do not have to synchronize with each other.

Unlike a network device, vsock needs no configuration because the device
provides its address in the configuration space.

The vsock device was prototyped by Gerd Hoffmann and Asias He.  I picked
the code and design up from them.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Asias He <asias.hejun@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v2:
 * Document guest_cid field
 * Use MAY/MUST/CAN according to RFC 2119
 * Remove datagram socket type for the time being.  This can be added in
   the future but there are currently no applications.
 * Drop 3-way handshake for stream sockets.  It is not needed since
   virtio-vsock provides reliable, in-order delivery and spoofing source
   addresses is impossible.
 * Drop max_virtqueue_pairs configuration space field.  This field was
   never defined and Linux code does not support multiqueue.  It can be
   added back later, if necessary.
---
 trunk/content.tex | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)

diff --git a/trunk/content.tex b/trunk/content.tex
index d989d98..8b5b520 100644
--- a/trunk/content.tex
+++ b/trunk/content.tex
@@ -5641,6 +5641,158 @@ descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\section{VSock Device}\label{sec:Device Types / VSock Device}
+
+The virtio vsock device is a zero-configuration socket communications device.
+It enables data transfer between the guest and the device without using the
+Ethernet or IP protocols.
+
+\subsection{Device ID}\label{sec:Device Types / VSock Device / Device ID}
+  13
+
+\subsection{Virtqueues}\label{sec:Device Types / VSock Device / Virtqueues}
+\begin{description}
+\item[0] ctrl
+\item[1] rx
+\item[2] tx
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / VSock Device / Feature bits}
+
+There are currently no feature bits defined for this device.
+
+\subsection{Device configuration layout}\label{sec:Device Types / VSock Device / Device configuration layout}
+
+\begin{lstlisting}
+struct virtio_vsock_config {
+	__le32 guest_cid;
+};
+\end{lstlisting}
+
+The \field{guest_cid} field contains the guest's context ID (CID), which
+uniquely identifies the guest for the lifetime of the device.  The driver MUST
+use this value as \field{src_cid} when sending outgoing packets.
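Because guest_cid is stored little-endian, a driver running on a big-endian CPU cannot simply cast the configuration bytes. The following is a minimal C sketch, assuming the four bytes of struct virtio_vsock_config have already been copied into a local buffer; le32_decode and vsock_read_guest_cid are hypothetical names, not part of any existing driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t le32_decode(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: cfg points at a copy of struct virtio_vsock_config. */
static uint32_t vsock_read_guest_cid(const uint8_t *cfg, size_t cfg_len)
{
    assert(cfg != NULL && cfg_len >= 4);
    return le32_decode(cfg); /* guest_cid is at offset 0 */
}
```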
+
+\subsection{Device Initialization}\label{sec:Device Types / VSock Device / Device Initialization}
+
+\begin{enumerate}
+\item The guest's CID is read from \field{guest_cid}.
+
+\item Buffers are added to the rx virtqueue to start receiving packets.
+\end{enumerate}
+
+\subsection{Device Operation}\label{sec:Device Types / VSock Device / Device Operation}
+
+Packets transmitted or received contain a header before the payload:
+
+\begin{lstlisting}
+struct virtio_vsock_hdr {
+	__le32 src_cid;
+	__le32 src_port;
+	__le32 dst_cid;
+	__le32 dst_port;
+	__le32 len;
+	__le16 type;
+	__le16 op;
+	__le32 flags;
+	__le32 buf_alloc;
+	__le32 fwd_cnt;
+};
+\end{lstlisting}
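All header fields are little-endian and naturally aligned, so the header occupies 36 bytes with no padding (five le32 fields, two le16 fields, then three more le32 fields). A hedged sketch of serializing it byte by byte follows; struct vsock_hdr, vsock_hdr_pack and the put_le helpers are illustrative names, not taken from an existing implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order view of struct virtio_vsock_hdr (field names from the spec). */
struct vsock_hdr {
    uint32_t src_cid, src_port, dst_cid, dst_port, len;
    uint16_t type, op;
    uint32_t flags, buf_alloc, fwd_cnt;
};

/* 5 x le32 + 2 x le16 + 3 x le32 = 36 bytes, no padding needed. */
#define VSOCK_HDR_WIRE_SIZE 36

static uint8_t *put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    return p + 2;
}

static uint8_t *put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
    return p + 4;
}

/* Serialize h into out[] in the field order of struct virtio_vsock_hdr. */
static void vsock_hdr_pack(const struct vsock_hdr *h,
                           uint8_t out[VSOCK_HDR_WIRE_SIZE])
{
    uint8_t *p = out;

    p = put_le32(p, h->src_cid);
    p = put_le32(p, h->src_port);
    p = put_le32(p, h->dst_cid);
    p = put_le32(p, h->dst_port);
    p = put_le32(p, h->len);
    p = put_le16(p, h->type);
    p = put_le16(p, h->op);
    p = put_le32(p, h->flags);
    p = put_le32(p, h->buf_alloc);
    p = put_le32(p, h->fwd_cnt);
    assert(p - out == VSOCK_HDR_WIRE_SIZE);
}
```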
+
+Most packets simply transfer data, but control packets are also used for
+connection and buffer space management.  \field{op} is one of the following
+operation constants:
+
+\begin{lstlisting}
+enum {
+	VIRTIO_VSOCK_OP_INVALID = 0,
+
+	/* Connect operations */
+	VIRTIO_VSOCK_OP_REQUEST = 1,
+	VIRTIO_VSOCK_OP_RESPONSE = 2,
+	VIRTIO_VSOCK_OP_RST = 3,
+	VIRTIO_VSOCK_OP_SHUTDOWN = 4,
+
+	/* To send payload */
+	VIRTIO_VSOCK_OP_RW = 5,
+
+	/* Tell the peer our credit info */
+	VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6,
+	/* Request the peer to send the credit info to us */
+	VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7,
+};
+\end{lstlisting}
+
+\subsubsection{Addressing}\label{sec:Device Types / VSock Device / Device Operation / Addressing}
+
+VSock flows are identified by a (source, destination) address tuple. Address
+information consists of a (cid, port number) tuple. The header fields used for
+this are \field{src_cid}, \field{src_port}, \field{dst_cid}, and
+\field{dst_port}.
+
+Currently only stream sockets are supported.  \field{type} is 1 for stream
+sockets.  A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received
+with an unknown \field{type} value.
+
+Stream sockets provide in-order, guaranteed, connection-oriented delivery
+without message boundaries.
+
+\subsubsection{Buffer Space Management}\label{sec:Device Types / VSock Device / Device Operation / Buffer Space Management}
+\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
+stream sockets.  The guest and the device MUST publish how much buffer space is
+available per socket.  This enables flow control so that packets are never
+dropped.
+
+\field{buf_alloc} is the total receive buffer space, in bytes, for this socket.
+This includes both free and in-use buffers. \field{fwd_cnt} is the free-running
+bytes received counter. The sender calculates the amount of free receive buffer
+space as follows:
+
+\begin{lstlisting}
+/* tx_cnt is the sender's free-running bytes transmitted counter */
+u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
+\end{lstlisting}
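The formula relies on unsigned 32-bit wraparound: tx_cnt and fwd_cnt are free-running and may overflow, but their difference still gives the bytes in flight as long as fewer than 2^32 bytes are outstanding. The sketch below wraps it in a function and clamps the result to the bytes the sender wants to transmit. The function names are hypothetical, and returning 0 when the peer shrinks buf_alloc below the in-flight count is an added defensive assumption, not something the text specifies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free receive space at the peer: peer_buf_alloc - (tx_cnt - peer_fwd_cnt).
 * uint32_t arithmetic is modulo 2^32, so the subtraction stays correct after
 * the free-running counters wrap.
 */
static uint32_t vsock_peer_free(uint32_t peer_buf_alloc, uint32_t peer_fwd_cnt,
                                uint32_t tx_cnt)
{
    uint32_t in_flight = tx_cnt - peer_fwd_cnt; /* sent, not yet consumed */

    /* Defensive assumption: treat a shrunken peer buffer as "no space". */
    if (in_flight > peer_buf_alloc)
        return 0;
    return peer_buf_alloc - in_flight;
}

/* Hypothetical helper: how many of `want` bytes may be sent right now. */
static uint32_t vsock_sendable(uint32_t want, uint32_t peer_buf_alloc,
                               uint32_t peer_fwd_cnt, uint32_t tx_cnt)
{
    uint32_t avail = vsock_peer_free(peer_buf_alloc, peer_fwd_cnt, tx_cnt);

    return want < avail ? want : avail;
}
```

For example, with peer_buf_alloc = 4096, peer_fwd_cnt = 0xFFFFFF00 and a wrapped tx_cnt = 0x10, 272 bytes are in flight and 3824 bytes may still be sent.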
+
+If there is insufficient buffer space, the sender MUST wait until virtqueue
+buffers are returned and check \field{buf_alloc} and \field{fwd_cnt} again. The
+VIRTIO_VSOCK_OP_CREDIT_REQUEST packet MAY be sent to request an exchange of
+buffer space management information.  VIRTIO_VSOCK_OP_CREDIT_UPDATE MUST be
+sent in response to VIRTIO_VSOCK_OP_CREDIT_REQUEST and when buffer space is freed.
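On the receiving side, the same counters can be tracked per socket. In this sketch (names are hypothetical, and rx_cnt is an extra bookkeeping counter the spec does not define), consuming buffered bytes advances fwd_cnt and yields a VIRTIO_VSOCK_OP_CREDIT_UPDATE to send, and a VIRTIO_VSOCK_OP_CREDIT_REQUEST is always answered with a VIRTIO_VSOCK_OP_CREDIT_UPDATE.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_VSOCK_OP_CREDIT_UPDATE  6
#define VIRTIO_VSOCK_OP_CREDIT_REQUEST 7

/* Hypothetical receiver-side credit state for one stream socket. */
struct vsock_rx_credit {
    uint32_t buf_alloc; /* advertised total receive buffer space, in bytes */
    uint32_t fwd_cnt;   /* free-running count of bytes consumed */
    uint32_t rx_cnt;    /* free-running count of bytes received (not in spec) */
};

/* Record n payload bytes arriving from the peer. */
static void vsock_on_rx(struct vsock_rx_credit *c, uint32_t n)
{
    /* A well-behaved sender never exceeds the advertised free space. */
    assert(n <= c->buf_alloc - (c->rx_cnt - c->fwd_cnt));
    c->rx_cnt += n;
}

/*
 * The application consumed n buffered bytes.  Buffer space was freed, so
 * the caller sends VIRTIO_VSOCK_OP_CREDIT_UPDATE carrying the new fwd_cnt.
 */
static uint16_t vsock_on_consume(struct vsock_rx_credit *c, uint32_t n)
{
    assert(n <= c->rx_cnt - c->fwd_cnt); /* cannot consume unreceived data */
    c->fwd_cnt += n;
    return VIRTIO_VSOCK_OP_CREDIT_UPDATE;
}

/* Op to send in reply to an incoming credit-related op, or 0 for none. */
static uint16_t vsock_credit_reply(uint16_t op)
{
    return op == VIRTIO_VSOCK_OP_CREDIT_REQUEST ? VIRTIO_VSOCK_OP_CREDIT_UPDATE : 0;
}
```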
+
+\subsubsection{Receive and Transmit}\label{sec:Device Types / VSock Device / Device Operation / Receive and Transmit}
+The driver queues outgoing packets on the tx virtqueue and incoming packet
+receive buffers on the rx virtqueue. Packets are of the following form:
+
+\begin{lstlisting}
+struct virtio_vsock_packet {
+    struct virtio_vsock_hdr hdr;
+    u8 data[];
+};
+\end{lstlisting}
+
+Virtqueue buffers for outgoing packets are read-only. Virtqueue buffers for
+incoming packets are write-only.
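A driver can place the header and payload in one contiguous buffer matching struct virtio_vsock_packet. This sketch builds a VIRTIO_VSOCK_OP_RW stream packet; it assumes len counts payload bytes only, which the text does not state explicitly, and leaves the address, flags and credit fields zeroed for the caller. vsock_build_rw is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VSOCK_HDR_WIRE_SIZE 36 /* sizeof(struct virtio_vsock_hdr) on the wire */
#define VSOCK_HDR_LEN_OFF   16 /* byte offset of len */
#define VSOCK_HDR_TYPE_OFF  20 /* byte offset of type */
#define VSOCK_HDR_OP_OFF    22 /* byte offset of op */
#define VSOCK_TYPE_STREAM   1
#define VIRTIO_VSOCK_OP_RW  5

/* Store the low nbytes of v in little-endian order. */
static void put_le(uint8_t *p, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Allocate hdr + data[]; returns NULL on allocation failure. */
static uint8_t *vsock_build_rw(const void *data, uint32_t data_len, size_t *out_len)
{
    size_t total = VSOCK_HDR_WIRE_SIZE + (size_t)data_len;
    uint8_t *pkt = calloc(1, total); /* zeroed: addresses, flags, credit */

    assert(out_len != NULL);
    if (pkt == NULL)
        return NULL;
    put_le(pkt + VSOCK_HDR_LEN_OFF, data_len, 4); /* assumption: payload length */
    put_le(pkt + VSOCK_HDR_TYPE_OFF, VSOCK_TYPE_STREAM, 2);
    put_le(pkt + VSOCK_HDR_OP_OFF, VIRTIO_VSOCK_OP_RW, 2);
    if (data_len > 0)
        memcpy(pkt + VSOCK_HDR_WIRE_SIZE, data, data_len); /* data[] follows hdr */
    *out_len = total;
    return pkt;
}
```

The caller would then expose the returned buffer to the device as a single read-only descriptor on the tx virtqueue and free it once the device returns it.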
+
+\subsubsection{Stream Sockets}\label{sec:Device Types / VSock Device / Device Operation / Stream Sockets}
+
+Connections are established by sending a VIRTIO_VSOCK_OP_REQUEST packet. If a
+listening socket exists on the destination, a VIRTIO_VSOCK_OP_RESPONSE reply is
+sent and the connection is established.  A VIRTIO_VSOCK_OP_RST reply is sent if
+no listening socket exists on the destination or the destination has
+insufficient resources to establish the connection.
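The connection setup described above can be sketched as two small decision functions. The states and function names are hypothetical; the listening and resource checks are boolean inputs because how they are performed is up to the implementation. Treating VIRTIO_VSOCK_OP_RST as closing the socket in any state follows from the statement that RST can be sent at any time to abort or disconnect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_VSOCK_OP_REQUEST  1
#define VIRTIO_VSOCK_OP_RESPONSE 2
#define VIRTIO_VSOCK_OP_RST      3

/* Hypothetical connection states for one socket. */
enum vsock_state { VSOCK_CLOSED, VSOCK_CONNECTING, VSOCK_CONNECTED };

/* Destination side: the reply op for an incoming VIRTIO_VSOCK_OP_REQUEST. */
static uint16_t vsock_answer_request(bool listening, bool have_resources)
{
    return (listening && have_resources) ? VIRTIO_VSOCK_OP_RESPONSE
                                         : VIRTIO_VSOCK_OP_RST;
}

/* Either side: next state after receiving a connection-related op. */
static enum vsock_state vsock_next_state(enum vsock_state s, uint16_t op)
{
    if (op == VIRTIO_VSOCK_OP_RST)
        return VSOCK_CLOSED; /* RST aborts setup or forcibly disconnects */
    if (s == VSOCK_CONNECTING && op == VIRTIO_VSOCK_OP_RESPONSE)
        return VSOCK_CONNECTED;
    return s; /* other ops do not change the state here */
}
```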
+
+When a connected socket receives VIRTIO_VSOCK_OP_SHUTDOWN, bit 0 of the header
+\field{flags} field indicates that the peer will not receive any more data and
+bit 1 indicates that the peer will not send any more data.  If both bits are
+set and no more virtqueue buffers are pending, the socket is disconnected.
+
+VIRTIO_VSOCK_OP_RST can be sent at any time to abort the connection process or
+forcibly disconnect.
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently there are three device-independent feature bits defined:
-- 
2.5.0


