virtio-dev message

Subject: [RFC] vsock: add vsock device


[Resent because my personal email address was bounced by the virtio-dev mailing
list and Andy King's email address no longer works.]

The virtio vsock device is a zero-configuration datagram and stream
socket communications device.  It is designed as a guest<->host
management channel suitable for communicating with guest agents or host
services.

vsock is designed with the sockets API in mind and the driver is
typically implemented as an address family (at the same level as
AF_INET).  Applications written for the sockets API can be ported with
minimal changes (similar amount of effort as adding IPv6 support to an
IPv4 application).
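
To give a feel for the porting effort, here is a minimal sketch using the
AF_VSOCK address family as exposed by Linux in <linux/vm_sockets.h>.  The
helper names vsock_addr() and vsock_connect() are made up for illustration;
only struct sockaddr_vm and the standard socket calls are the existing API.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: build an AF_VSOCK address.  The (cid, port) pair
 * plays the role that (IP address, port) plays for AF_INET. */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm sa;

	memset(&sa, 0, sizeof(sa));
	sa.svm_family = AF_VSOCK;
	sa.svm_cid = cid;
	sa.svm_port = port;
	return sa;
}

/* Hypothetical helper: connect a stream socket to (cid, port).
 * Returns the file descriptor, or -1 on error. */
int vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm sa = vsock_addr(cid, port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A guest agent would call something like vsock_connect(VMADDR_CID_HOST, 1234)
to reach a host service; the port number here is arbitrary.  Apart from the
address structure, the code is the same as for a TCP client.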

Unlike the existing console device, which is also used for guest<->host
communication, vsock allows multiple clients to connect to a server at the
same time.  The console's single-client limitation requires console-based
users to arbitrate access through a single client.  In vsock they can
connect directly and do not have to synchronize with each other.

Unlike network devices, no configuration is necessary because the device
comes with its address in the configuration space.

The vsock device was prototyped by Gerd Hoffmann and Asias He.  I
recently picked it up again.

Please take a look at this design.  I'd be happy to flesh it out and
answer any questions you may have.

Open questions:

 * Denial of service scenarios?  Competing flows use the same rx/tx
   virtqueue.  This design is similar to network cards so it should not
   pose a big problem as long as the driver and device copy data out of
   the ring as soon as possible instead of waiting for applications
   before reclaiming virtqueue buffers.

 * Can stream sockets be simplified?  They mostly ape TCP but I'm not
   sure whether the connection establishment and shutdown needs to be as
   elaborate.

 * Multiqueue?  (Probably not critical, can be added with a feature bit
   later if needed.)
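
On the denial of service question: the per-flow credit accounting in the
Buffer Space Management section of the patch is what keeps one flow from
overrunning its peer.  A rough sketch of the sender side follows.  The
struct and function names are hypothetical, and the guard against a peer
that shrinks buf_alloc below the in-flight byte count is my assumption,
not something the draft specifies.

```c
#include <stdint.h>

/* Hypothetical per-connection state; names mirror the header fields. */
struct vsock_credit {
	uint32_t peer_buf_alloc; /* last buf_alloc seen from the peer */
	uint32_t peer_fwd_cnt;   /* last fwd_cnt seen from the peer */
	uint32_t tx_cnt;         /* free-running count of bytes sent */
};

/* Bytes the peer can still accept.  Unsigned subtraction keeps the
 * in-flight count correct when the free-running counters wrap. */
uint32_t vsock_peer_free(const struct vsock_credit *c)
{
	uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;

	/* Assumption: treat a shrunken peer buffer as "no space left". */
	if (in_flight >= c->peer_buf_alloc)
		return 0;
	return c->peer_buf_alloc - in_flight;
}

/* Reserve up to len bytes of credit; returns how many may be sent now. */
uint32_t vsock_take_credit(struct vsock_credit *c, uint32_t len)
{
	uint32_t avail = vsock_peer_free(c);

	if (len > avail)
		len = avail;
	c->tx_cnt += len;
	return len;
}
```

A sender that gets back fewer bytes than it asked for would queue the rest
until a later packet from the peer carries a larger fwd_cnt.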

I will post patches for the Linux kernel in the coming week.  They
consist of a virtio guest driver and vhost host driver that use the
AF_VSOCK address family already available in net/vmw_vsock/.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Asias He <asias.hejun@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
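Not part of the patch: a rough sketch of how the datagram fragmentation
fields described in the Datagram Sockets section could be packed into
\field{len} and \field{flags}.  All names are hypothetical, values are in
host byte order (conversion to __le32 is left out), and the check that a
fragment must end inside the message is my assumption; the draft only
requires the payload size not to exceed the total message size.

```c
#include <stdint.h>

/* Hypothetical decoded view of one datagram fragment. */
struct dgram_frag {
	uint16_t msg_size;    /* total message size (upper 16 bits of len) */
	uint16_t payload_len; /* this packet's payload (lower 16 bits of len) */
	uint16_t msg_id;      /* message identifier (upper 16 bits of flags) */
	uint16_t offset;      /* start offset in message (lower 16 bits of flags) */
};

uint32_t dgram_pack_len(uint16_t msg_size, uint16_t payload_len)
{
	return ((uint32_t)msg_size << 16) | payload_len;
}

uint32_t dgram_pack_flags(uint16_t msg_id, uint16_t offset)
{
	return ((uint32_t)msg_id << 16) | offset;
}

/* Decode len/flags into f.  Returns 0 on success, -1 if the fragment is
 * inconsistent with the total message size. */
int dgram_unpack(uint32_t len, uint32_t flags, struct dgram_frag *f)
{
	f->msg_size = (uint16_t)(len >> 16);
	f->payload_len = (uint16_t)(len & 0xffff);
	f->msg_id = (uint16_t)(flags >> 16);
	f->offset = (uint16_t)(flags & 0xffff);

	if (f->payload_len > f->msg_size)
		return -1;
	/* Assumption: the fragment must not run past the end of the message. */
	if ((uint32_t)f->offset + f->payload_len > f->msg_size)
		return -1;
	return 0;
}
```

For example, a 3000-byte message sent through 1024-byte packets would go
out as three fragments at offsets 0, 1024 and 2048, all carrying the same
message identifier.
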
 trunk/content.tex | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 172 insertions(+)

diff --git a/trunk/content.tex b/trunk/content.tex
index 1efdcc8..f03ffff 100644
--- a/trunk/content.tex
+++ b/trunk/content.tex
@@ -5146,6 +5146,178 @@ descriptor for the \field{sense_len}, \field{residual},
 \field{status_qualifier}, \field{status}, \field{response} and
 \field{sense} fields.
 
+\section{VSock Device}\label{sec:Device Types / VSock Device}
+
+The virtio vsock device is a zero-configuration datagram and stream socket
+communications device. It facilitates data transfer between the guest and
+device without using the Ethernet or IP protocols.
+
+\subsection{Device ID}\label{sec:Device Types / VSock Device / Device ID}
+  13
+
+\subsection{Virtqueues}\label{sec:Device Types / VSock Device / Virtqueues}
+\begin{description}
+\item[0] ctrl
+\item[1] rx
+\item[2] tx
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / VSock Device / Feature bits}
+
+\begin{description}
+There are currently no feature bits defined for this device.
+\end{description}
+
+\subsection{Device configuration layout}\label{sec:Device Types / VSock Device / Device configuration layout}
+
+TODO drop max_virtqueue_pairs for now?
+
+\begin{lstlisting}
+struct virtio_vsock_config {
+	__le32 guest_cid;
+	__le32 max_virtqueue_pairs;
+};
+\end{lstlisting}
+
+
+\subsection{Device Initialization}\label{sec:Device Types / VSock Device / Device Initialization}
+
+\begin{enumerate}
+\item The guest's cid can be read from \field{guest_cid}.
+
+\item Buffers must be added to the rx virtqueue to start receiving packets.
+\end{enumerate}
+
+\subsection{Device Operation}\label{sec:Device Types / VSock Device / Device Operation}
+
+Packets transmitted or received contain a header before the payload:
+
+\begin{lstlisting}
+struct virtio_vsock_hdr {
+	__le32 src_cid;
+	__le32 src_port;
+	__le32 dst_cid;
+	__le32 dst_port;
+	__le32 len;
+	__le16 type;
+	__le16 op;
+	__le32 flags;
+	__le32 buf_alloc;
+	__le32 fwd_cnt;
+};
+\end{lstlisting}
+
+Most packets simply transfer data but control packets are also used for
+connection and buffer space management.  \field{op} is one of the following
+operation constants:
+
+\begin{lstlisting}
+enum {
+	VIRTIO_VSOCK_OP_INVALID = 0,
+
+	/* Connect operations */
+	VIRTIO_VSOCK_OP_REQUEST = 1,
+	VIRTIO_VSOCK_OP_RESPONSE = 2,
+	VIRTIO_VSOCK_OP_ACK = 3,
+	VIRTIO_VSOCK_OP_RST = 4,
+	VIRTIO_VSOCK_OP_SHUTDOWN = 5,
+
+	/* To send payload */
+	VIRTIO_VSOCK_OP_RW = 6,
+
+	/* Tell the peer our credit info */
+	VIRTIO_VSOCK_OP_CREDIT_UPDATE = 7,
+	/* Request the peer to send the credit info to us */
+	VIRTIO_VSOCK_OP_CREDIT_REQUEST = 8,
+};
+\end{lstlisting}
+
+\subsubsection{Addressing}\label{sec:Device Types / VSock Device / Device Operation / Addressing}
+
+VSock flows are identified by a (source, destination) tuple. Address
+information consists of a (cid, port number) tuple. The header fields used for
+this are \field{src_cid}, \field{src_port}, \field{dst_cid}, and
+\field{dst_port}.
+
+Both stream and datagram sockets are supported. \field{type} is 1 for stream
+and 2 for datagram socket types. Stream and datagram port namespaces are
+independent.
+
+\subsubsection{Buffer Space Management}\label{sec:Device Types / VSock Device / Device Operation / Buffer Space Management}
+\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management.
+They allow the guest and the device to publish how much buffer space is
+available. This facilitates flow control so packets are never dropped and
+buffer space can potentially be reserved for high-priority flows.
+
+\field{buf_alloc} is the sender's total receive buffer space, in bytes. This
+includes both free and in-use buffers. \field{fwd_cnt} is the sender's
+free-running bytes received counter. The receiver uses this information to
+calculate the total amount of free buffer space:
+
+\begin{lstlisting}
+/* tx_cnt is a free-running bytes transmitted counter */
+u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
+\end{lstlisting}
+
+Both the driver and the device MUST track buffer space. If there is
+insufficient buffer space, they must wait until virtqueue buffers are returned
+and check \field{buf_alloc} and \field{fwd_cnt} again. The
+VIRTIO_VSOCK_OP_CREDIT_REQUEST packet MAY be sent to force buffer space
+management information exchange. VIRTIO_VSOCK_OP_CREDIT_UPDATE MUST be sent in
+response.
+
+\subsubsection{Receive and Transmit}\label{sec:Device Types / VSock Device / Device Operation / Receive and Transmit}
+The driver queues outgoing packets on the tx virtqueue and incoming packet
+receive buffers on the rx virtqueue. Packets are of the following form:
+
+\begin{lstlisting}
+struct virtio_vsock_packet {
+    struct virtio_vsock_hdr hdr;
+    u8 data[];
+};
+\end{lstlisting}
+
+Virtqueue buffers for outgoing packets are read-only. Virtqueue buffers for
+incoming packets are write-only.
+
+\subsubsection{Datagram Sockets}\label{sec:Device Types / VSock Device / Device Operation / Datagram Sockets}
+
+Datagram sockets are unordered, unreliable, connectionless flows with message
+boundaries. The maximum message size is 65535 bytes.
+
+Since rx virtqueue buffers may be smaller than the maximum message size, larger
+datagrams are split across multiple packets.  The header \field{len} field's
+upper 16 bits contain the total message size. The lower 16 bits contain the
+packet's payload size, which MUST be less than or equal to the total message size.
+The header \field{flags} field's upper 16 bits contain a unique message
+identifier used to correlate packets belonging to the same message. The lower
+16 bits contain the packet's start offset within the message.
+
+The following \field{op} values are used: VIRTIO_VSOCK_OP_RW,
+VIRTIO_VSOCK_OP_CREDIT_UPDATE, and VIRTIO_VSOCK_OP_CREDIT_REQUEST. All other
+operations are ignored.
+
+\subsubsection{Stream Sockets}\label{sec:Device Types / VSock Device / Device Operation / Stream Sockets}
+
+Stream sockets are ordered, reliable, connection-oriented flows with no message
+boundaries.
+
+Connections are established by sending a VIRTIO_VSOCK_OP_REQUEST packet. If a
+listening socket exists on the destination, a VIRTIO_VSOCK_OP_RESPONSE reply is
+sent with a cookie in the header \field{flags} field.
+
+When the VIRTIO_VSOCK_OP_RESPONSE packet is received, a VIRTIO_VSOCK_OP_ACK
+packet is sent with the same cookie value in the header \field{flags} field.
+
+When a connected socket receives VIRTIO_VSOCK_OP_SHUTDOWN, the header
+\field{flags} field bit 0 indicates that the peer will not receive any more
+data and bit 1 indicates that the peer will not send any more data. If these
+bits are set and there are no more virtqueue buffers pending, the socket is
+disconnected.
+
+VIRTIO_VSOCK_OP_RST can be sent at any time to abort the connection process or
+forcibly disconnect.
+
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 Currently there are three device-independent feature bits defined:
-- 
2.1.0


