virtio-dev message



Subject: Re: [virtio-dev] [RFC] vsock: add vsock device


On Thu, May 21, 2015 at 05:40:41PM +0100, Stefan Hajnoczi wrote:
> [Resent because my personal email address was bounced by the virtio-dev mailing
> list and Andy King's email address no longer works.]
> 
> The virtio vsock device is a zero-configuration datagram and stream
> socket communications device.  It is designed as a guest<->host
> management channel suitable for communicating with guest agents or host
> services.
> 
> vsock is designed with the sockets API in mind and the driver is
> typically implemented as an address family (at the same level as
> AF_INET).  Applications written for the sockets API can be ported with
> minimal changes (similar amount of effort as adding IPv6 support to an
> IPv4 application).
> 
> Unlike the existing console device, which is also used for guest<->host
> communication, vsock allows multiple clients to connect to a server at
> the same time.  The console device lacks this ability, so console-based
> users must arbitrate access through a single client.  In vsock they can
> connect directly and do not have to synchronize with each other.
> 
> Unlike network devices, no configuration is necessary because the device
> comes with its address in the configuration space.
> 
> The vsock device was prototyped by Gerd Hoffmann and Asias He.  I
> recently picked it up again.
> 
> Please take a look at this design.  I'd be happy to flesh it out and
> answer any questions you may have.
> 
> Open questions:
> 
>  * Denial of service scenarios?  Competing flows use the same rx/tx
>    virtqueue.  This design is similar to network cards so it should not
>    pose a big problem as long as the driver and guest copy data out of
>    the ring as soon as possible instead of waiting for applications
>    before reclaiming virtqueue buffers.
> 
>  * Can stream sockets be simplified?  They mostly ape TCP but I'm not
>    sure whether the connection establishment and shutdown needs to be as
>    elaborate.
> 
>  * Multiqueue?  (Probably not critical, can be added with a feature bit
>    later if needed.)
> 
> I will post patches for the Linux kernel in the coming week.  They
> consist of a virtio guest driver and vhost host driver that use the
> AF_VSOCK address family already available in net/vmw_vsock/.
> 
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Asias He <asias.hejun@gmail.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

This looks good overall.
For inclusion in spec, this needs to be a bit more strict,
e.g. we need conformance statements that say that
stream buffers MUST NOT be discarded, datagram ones MAY be discarded.


> ---
>  trunk/content.tex | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 172 insertions(+)
> 
> diff --git a/trunk/content.tex b/trunk/content.tex
> index 1efdcc8..f03ffff 100644
> --- a/trunk/content.tex
> +++ b/trunk/content.tex
> @@ -5146,6 +5146,178 @@ descriptor for the \field{sense_len}, \field{residual},
>  \field{status_qualifier}, \field{status}, \field{response} and
>  \field{sense} fields.
>  
> +\section{VSock Device}\label{sec:Device Types / VSock Device}
> +
> +The virtio vsock device is a zero-configuration datagram and stream socket
> +communications device. It facilitates data transfer between the guest and
> +device without using the Ethernet or IP protocols.
> +
> +\subsection{Device ID}\label{sec:Device Types / VSock Device / Device ID}
> +  13
> +
> +\subsection{Virtqueues}\label{sec:Device Types / VSock Device / Virtqueues}
> +\begin{description}
> +\item[0] ctrl
> +\item[1] rx
> +\item[2] tx
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / VSock Device / Feature bits}
> +
> +\begin{description}
> +There are currently no feature bits defined for this device.
> +\end{description}
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / VSock Device / Device configuration layout}
> +
> +TODO drop max_virtqueue_pairs for now?
> +
> +\begin{lstlisting}
> +struct virtio_vsock_config {
> +	__le32 guest_cid;
> +	__le32 max_virtqueue_pairs;
> +};
> +\end{lstlisting}
> +
> +
> +\subsection{Device Initialization}\label{sec:Device Types / VSock Device / Device Initialization}
> +
> +\begin{enumerate}
> +\item The guest's cid can be read from \field{guest_cid}.
> +
> +\item Buffers must be added to the rx virtqueue to start receiving packets.
> +\end{enumerate}
> +
> +\subsection{Device Operation}\label{sec:Device Types / VSock Device / Device Operation}
> +
> +Packets transmitted or received contain a header before the payload:
> +
> +\begin{lstlisting}
> +struct virtio_vsock_hdr {
> +	__le32 src_cid;
> +	__le32 src_port;
> +	__le32 dst_cid;
> +	__le32 dst_port;

32 bit seems a bit restrictive.
OTOH are 32 bit ports supported?

> +	__le32 len;
> +	__le16 type;
> +	__le16 op;
> +	__le32 flags;
> +	__le32 buf_alloc;
> +	__le32 fwd_cnt;
> +};
> +\end{lstlisting}
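
For reference, this is how I read the on-the-wire layout of the header
above. It is only a sketch: it assumes the fields are packed back to back
with no padding (36 bytes in total) and stored little-endian, as the __le
types suggest. The names vsock_hdr, VSOCK_HDR_SIZE, put_le16, put_le32 and
vsock_hdr_pack are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define VSOCK_HDR_SIZE 36 /* assumed: 5*4 + 2*2 + 3*4 bytes, no padding */

/* Host-order copy of the header fields from the draft. */
struct vsock_hdr {
	uint32_t src_cid, src_port, dst_cid, dst_port, len;
	uint16_t type, op;
	uint32_t flags, buf_alloc, fwd_cnt;
};

static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
	put_le16(p, (uint16_t)(v & 0xffff));
	put_le16(p + 2, (uint16_t)(v >> 16));
}

/* Serialize the header into a little-endian byte buffer. */
static void vsock_hdr_pack(uint8_t out[VSOCK_HDR_SIZE],
			   const struct vsock_hdr *h)
{
	assert(out && h);
	put_le32(out + 0, h->src_cid);
	put_le32(out + 4, h->src_port);
	put_le32(out + 8, h->dst_cid);
	put_le32(out + 12, h->dst_port);
	put_le32(out + 16, h->len);
	put_le16(out + 20, h->type);
	put_le16(out + 22, h->op);
	put_le32(out + 24, h->flags);
	put_le32(out + 28, h->buf_alloc);
	put_le32(out + 32, h->fwd_cnt);
}
```
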
> +
> +Most packets simply transfer data but control packets are also used for
> +connection and buffer space management.  \field{op} is one of the following
> +operation constants:
> +
> +\begin{lstlisting}
> +enum {
> +	VIRTIO_VSOCK_OP_INVALID = 0,
> +
> +	/* Connect operations */
> +	VIRTIO_VSOCK_OP_REQUEST = 1,
> +	VIRTIO_VSOCK_OP_RESPONSE = 2,
> +	VIRTIO_VSOCK_OP_ACK = 3,
> +	VIRTIO_VSOCK_OP_RST = 4,
> +	VIRTIO_VSOCK_OP_SHUTDOWN = 5,
> +
> +	/* To send payload */
> +	VIRTIO_VSOCK_OP_RW = 6,
> +
> +	/* Tell the peer our credit info */
> +	VIRTIO_VSOCK_OP_CREDIT_UPDATE = 7,
> +	/* Request the peer to send the credit info to us */
> +	VIRTIO_VSOCK_OP_CREDIT_REQUEST = 8,
> +};
> +\end{lstlisting}
> +
> +\subsubsection{Addressing}\label{sec:Device Types / VSock Device / Device Operation / Addressing}
> +
> +VSock flows are identified by a (source, destination) tuple. Address
> +information consists of a (cid, port number) tuple. The header fields used for
> +this are \field{src_cid}, \field{src_port}, \field{dst_cid}, and
> +\field{dst_port}.
> +
> +Both stream and datagram sockets are supported. \field{type} is 1 for stream
> +and 2 for datagram socket types. Stream and datagram port namespaces are
> +independent.
> +
> +\subsubsection{Buffer Space Management}\label{sec:Device Types / VSock Device / Device Operation / Buffer Space Management}
> +\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management.
> +They allow the guest and the device to publish how much buffer space is
> +available. This facilitates flow control so packets are never dropped and
> +buffer space can potentially be reserved for high-priority flows.
> +
> +\field{buf_alloc} is the sender's total receive buffer space, in bytes. This
> +includes both free and in-use buffers. \field{fwd_cnt} is the sender's
> +free-running bytes received counter. The receiver uses this information to
> +calculate the total amount of free buffer space:
> +
> +\begin{lstlisting}
> +/* tx_cnt is a free-running bytes transmitted counter */
> +u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> +\end{lstlisting}
> +
> +Both the driver and the device MUST track buffer space. If there is
> +insufficient buffer space, they must wait until virtqueue buffers are returned
> +and check \field{buf_alloc} and \field{fwd_cnt} again. The
> +VIRTIO_VSOCK_OP_CREDIT_REQUEST packet MAY be sent to force buffer space
> +management information exchange. VIRTIO_VSOCK_OP_CREDIT_UPDATE MUST be sent in
> +response.

Is it ok to initiate connection with 0 buf space allocated?
If so, how does remote know it should add bufs?
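
To make the question concrete, here is a rough sketch of the sender-side
accounting as I understand the formula above. The struct and function names
(vsock_credit, vsock_peer_free, vsock_try_send) are hypothetical, and the
clamp to zero when more is in flight than the peer advertises is my own
assumption, not something the draft states. With peer_buf_alloc == 0 the
sender can never transmit, which is the case I am asking about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection credit state kept by the sender. */
struct vsock_credit {
	uint32_t peer_buf_alloc; /* last buf_alloc seen from the peer */
	uint32_t peer_fwd_cnt;   /* last fwd_cnt seen from the peer */
	uint32_t tx_cnt;         /* free-running count of bytes sent */
};

/* Bytes the peer can still accept, following the draft's formula. */
static uint32_t vsock_peer_free(const struct vsock_credit *c)
{
	/* Unsigned subtraction stays correct when the counters wrap. */
	uint32_t in_flight = c->tx_cnt - c->peer_fwd_cnt;

	/* Assumption: report no space rather than underflow. */
	if (in_flight >= c->peer_buf_alloc)
		return 0;
	return c->peer_buf_alloc - in_flight;
}

/* Returns true and charges the credit if len bytes may be sent now. */
static bool vsock_try_send(struct vsock_credit *c, uint32_t len)
{
	assert(c != NULL);
	if (len > vsock_peer_free(c))
		return false;
	c->tx_cnt += len;
	return true;
}
```

When vsock_try_send() fails, the sender would wait for a newer
buf_alloc/fwd_cnt pair from the peer before retrying.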


> +
> +\subsubsection{Receive and Transmit}\label{sec:Device Types / VSock Device / Device Operation / Receive and Transmit}
> +The driver queues outgoing packets on the tx virtqueue and incoming packet
> +receive buffers on the rx virtqueue. Packets are of the following form:
> +
> +\begin{lstlisting}
> +struct virtio_vsock_packet {
> +    struct virtio_vsock_hdr hdr;
> +    u8 data[];
> +};
> +\end{lstlisting}
> +
> +Virtqueue buffers for outgoing packets are read-only. Virtqueue buffers for
> +incoming packets are write-only.
> +
> +\subsubsection{Datagram Sockets}\label{sec:Device Types / VSock Device / Device Operation / Datagram Sockets}
> +
> +Datagram sockets are unordered, unreliable, connectionless flows with message
> +boundaries. The maximum message size is 65535 bytes.
> +
> +Since rx virtqueue buffers may be smaller than the maximum message size, larger
> +datagrams are split across multiple packets.  The header \field{len} field's
> +upper 16 bits contain the total message size. The lower 16 bits contain the
> +packet's payload size, which MUST be less than or equal to the total message size.
> +The header \field{flags} field's upper 16 bits contain a unique message
> +identifier used to correlate packets belonging to the same message. The lower
> +16 bits contain the packet's start offset within the message.
> +
> +The following \field{op} values are used: VIRTIO_VSOCK_OP_RW,
> +VIRTIO_VSOCK_OP_CREDIT_UPDATE, and VIRTIO_VSOCK_OP_CREDIT_REQUEST. All other
> +operations are ignored.
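
The len/flags split for datagram fragments could be sketched as below.
This is just my reading of the paragraph above: the helper names
(dgram_frag, dgram_frag_encode and the accessors) are invented, the values
are in host order (conversion to __le32 is left out), and the check that a
fragment ends within the message is my assumption; the draft only requires
the payload size not to exceed the total size.

```c
#include <assert.h>
#include <stdint.h>

/* Host-order values for the header len and flags fields of one fragment. */
struct dgram_frag {
	uint32_t len;   /* upper 16 bits: total size, lower 16: payload size */
	uint32_t flags; /* upper 16 bits: message id, lower 16: start offset */
};

static struct dgram_frag dgram_frag_encode(uint16_t msg_id, uint16_t total_len,
					   uint16_t offset, uint16_t payload_len)
{
	/* Required by the draft. */
	assert(payload_len <= total_len);
	/* Assumption: a fragment must not run past the end of the message. */
	assert((uint32_t)offset + payload_len <= total_len);

	struct dgram_frag f = {
		.len = ((uint32_t)total_len << 16) | payload_len,
		.flags = ((uint32_t)msg_id << 16) | offset,
	};
	return f;
}

static uint16_t dgram_total_len(uint32_t len)   { return (uint16_t)(len >> 16); }
static uint16_t dgram_payload_len(uint32_t len) { return (uint16_t)(len & 0xffff); }
static uint16_t dgram_msg_id(uint32_t flags)    { return (uint16_t)(flags >> 16); }
static uint16_t dgram_offset(uint32_t flags)    { return (uint16_t)(flags & 0xffff); }
```
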
> +
> +\subsubsection{Stream Sockets}\label{sec:Device Types / VSock Device / Device Operation / Stream Sockets}
> +
> +Stream sockets are ordered, reliable, connection-oriented flows with no message
> +boundaries.
> +
> +Connections are established by sending a VIRTIO_VSOCK_OP_REQUEST packet. If a
> +listening socket exists on the destination a VIRTIO_VSOCK_OP_RESPONSE reply is
> +sent with a cookie in the header \field{flags} field.
> +
> +A VIRTIO_VSOCK_OP_ACK packet is sent when the VIRTIO_VSOCK_OP_RESPONSE packet
> +is received with the same cookie value in the header \field{flags} field.
> +
> +When a connected socket receives VIRTIO_VSOCK_OP_SHUTDOWN the header
> +\field{flags} field bit 0 indicates that the peer will not receive any more
> +data and bit 1 indicates that the peer will not send any more data. If these
> +bits are set and there are no more virtqueue buffers pending the socket is
> +disconnected.
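
A possible reading of the shutdown rule, as a sketch only. The flag names
and the stream_sock struct are hypothetical, and I am assuming "these bits
are set" means both bits, possibly accumulated over more than one SHUTDOWN
packet; the text should say which is intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two flag bits described in the draft. */
#define VSOCK_SHUTDOWN_RCV  (1u << 0) /* peer will not receive any more data */
#define VSOCK_SHUTDOWN_SEND (1u << 1) /* peer will not send any more data */
#define VSOCK_SHUTDOWN_MASK (VSOCK_SHUTDOWN_RCV | VSOCK_SHUTDOWN_SEND)

/* Minimal illustrative socket state. */
struct stream_sock {
	uint32_t peer_shutdown; /* shutdown bits received so far */
	unsigned pending_bufs;  /* virtqueue buffers still in flight */
	bool connected;
};

/* Disconnect once both directions are closed and nothing is pending. */
static void stream_check_disconnect(struct stream_sock *s)
{
	if (s->peer_shutdown == VSOCK_SHUTDOWN_MASK && s->pending_bufs == 0)
		s->connected = false;
}

/* Handle the flags field of a received VIRTIO_VSOCK_OP_SHUTDOWN packet. */
static void stream_handle_shutdown(struct stream_sock *s, uint32_t flags)
{
	assert(s != NULL);
	if (!s->connected)
		return;
	s->peer_shutdown |= flags & VSOCK_SHUTDOWN_MASK;
	stream_check_disconnect(s);
}
```

stream_check_disconnect() would also need to run whenever a pending buffer
completes, so that a socket with buffers in flight is torn down later.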
> +
> +VIRTIO_VSOCK_OP_RST can be sent at any time to abort the connection process or
> +forcibly disconnect.

I'm guessing this is how one handles things like invalid connection
attempts?  How about an error code to optionally tell remote what was
wrong? Can one drop invalid connection attempts, or when one is out
of resources? Will remote retry?

> +
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
>  Currently there are three device-independent feature bits defined:
> -- 
> 2.1.0
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

