Subject: Re: [virtio-dev] [RFC] vsock: add vsock device
On Mon, Jun 01, 2015 at 05:39:15PM +0200, Michael S. Tsirkin wrote:
> On Thu, May 21, 2015 at 05:40:41PM +0100, Stefan Hajnoczi wrote:
> > [Resent because my personal email address was bounced by the virtio-dev mailing
> > list and Andy King's email address no longer works.]
> >
> > The virtio vsock device is a zero-configuration datagram and stream
> > socket communications device. It is designed as a guest<->host
> > management channel suitable for communicating with guest agents or host
> > services.
> >
> > vsock is designed with the sockets API in mind and the driver is
> > typically implemented as an address family (at the same level as
> > AF_INET). Applications written for the sockets API can be ported with
> > minimal changes (similar amount of effort as adding IPv6 support to an
> > IPv4 application).
> >
> > Unlike the existing console device, which is also used for guest<->host
> > communication, multiple clients can connect to a server at the same time
> > over vsock. This limitation requires console-based users to arbitrate
> > access through a single client. In vsock they can connect directly and
> > do not have to synchronize with each other.
> >
> > Unlike network devices, no configuration is necessary because the device
> > comes with its address in the configuration space.
> >
> > The vsock device was prototyped by Gerd Hoffmann and Asias He. I
> > recently picked it up again.
> >
> > Please take a look at this design. I'd be happy to flesh it out and
> > answer any questions you may have.
> >
> > Open questions:
> >
> > * Denial of service scenarios? Competing flows use the same rx/tx
> >   virtqueue. This design is similar to network cards so it should not
> >   pose a big problem as long as the driver and guest copy data out of
> >   the ring as soon as possible instead of waiting for applications
> >   before reclaiming virtqueue buffers.
> >
> > * Can stream sockets be simplified?
> >   They mostly ape TCP but I'm not
> >   sure whether the connection establishment and shutdown needs to be as
> >   elaborate.
> >
> > * Multiqueue? (Probably not critical, can be added with a feature bit
> >   later if needed.)
> >
> > I will post patches for the Linux kernel in the coming week. They
> > consist of a virtio guest driver and vhost host driver that use the
> > AF_VSOCK address family already available in net/vmw_vsock/.
> >
> > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > Cc: Asias He <asias.hejun@gmail.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> This looks good overall.
> For inclusion in spec, this needs to be a bit more strict,
> e.g. we need conformance statements that say that
> stream buffers MUST NOT be discarded, datagram ones MAY be discarded.
>
>
> > ---
> >  trunk/content.tex | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 172 insertions(+)
> >
> > diff --git a/trunk/content.tex b/trunk/content.tex
> > index 1efdcc8..f03ffff 100644
> > --- a/trunk/content.tex
> > +++ b/trunk/content.tex
> > @@ -5146,6 +5146,178 @@ descriptor for the \field{sense_len}, \field{residual},
> >  \field{status_qualifier}, \field{status}, \field{response} and
> >  \field{sense} fields.
> >
> > +\section{VSock Device}\label{sec:Device Types / VSock Device}
> > +
> > +The virtio vsock device is a zero-configuration datagram and stream socket
> > +communications device. It facilitates data transfer between the guest and
> > +device without using the Ethernet or IP protocols.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / VSock Device / Device ID}
> > + 13
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / VSock Device / Virtqueues}
> > +\begin{description}
> > +\item[0] ctrl
> > +\item[1] rx
> > +\item[2] tx
> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / VSock Device / Feature bits}
> > +
> > +\begin{description}
> > +There are currently no feature bits defined for this device.
> > +\end{description}
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / VSock Device / Device configuration layout}
> > +
> > +TODO drop max_virtqueue_pairs for now?
> > +
> > +\begin{lstlisting}
> > +struct virtio_vsock_config {
> > +    __le32 guest_cid;
> > +    __le32 max_virtqueue_pairs;
> > +};
> > +\end{lstlisting}
> > +
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / VSock Device / Device Initialization}
> > +
> > +\begin{enumerate}
> > +\item The guest's cid can be read from \field{guest_cid}.
> > +
> > +\item Buffers must be added to the rx virtqueue to start receiving packets.
> > +\end{enumerate}
> > +
> > +\subsection{Device Operation}\label{sec:Device Types / VSock Device / Device Operation}
> > +
> > +Packets transmitted or received contain a header before the payload:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vsock_hdr {
> > +    __le32 src_cid;
> > +    __le32 src_port;
> > +    __le32 dst_cid;
> > +    __le32 dst_port;
>
> 32 bit seems a bit restrictive.
> OTOH are 32 bit ports supported?
>
> > +    __le32 len;
> > +    __le16 type;
> > +    __le16 op;
> > +    __le32 flags;
> > +    __le32 buf_alloc;
> > +    __le32 fwd_cnt;
> > +};
> > +\end{lstlisting}
> > +
> > +Most packets simply transfer data but control packets are also used for
> > +connection and buffer space management.
> > +\field{op} is one of the following
> > +operation constants:
> > +
> > +\begin{lstlisting}
> > +enum {
> > +    VIRTIO_VSOCK_OP_INVALID = 0,
> > +
> > +    /* Connect operations */
> > +    VIRTIO_VSOCK_OP_REQUEST = 1,
> > +    VIRTIO_VSOCK_OP_RESPONSE = 2,
> > +    VIRTIO_VSOCK_OP_ACK = 3,
> > +    VIRTIO_VSOCK_OP_RST = 4,
> > +    VIRTIO_VSOCK_OP_SHUTDOWN = 5,
> > +
> > +    /* To send payload */
> > +    VIRTIO_VSOCK_OP_RW = 6,
> > +
> > +    /* Tell the peer our credit info */
> > +    VIRTIO_VSOCK_OP_CREDIT_UPDATE = 7,
> > +    /* Request the peer to send the credit info to us */
> > +    VIRTIO_VSOCK_OP_CREDIT_REQUEST = 8,
> > +};
> > +\end{lstlisting}
> > +
> > +\subsubsection{Addressing}\label{sec:Device Types / VSock Device / Device Operation / Addressing}
> > +
> > +VSock flows are identified by a (source, destination) tuple. Address
> > +information consists of a (cid, port number) tuple. The header fields used for
> > +this are \field{src_cid}, \field{src_port}, \field{dst_cid}, and
> > +\field{dst_port}.
> > +
> > +Both stream and datagram sockets are supported. \field{type} is 1 for stream
> > +and 2 for datagram socket types. Stream and datagram port namespaces are
> > +independent.
> > +
> > +\subsubsection{Buffer Space Management}\label{sec:Device Types / VSock Device / Device Operation / Buffer Space Management}
> > +\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management.
> > +They allow the guest and the device to publish how much buffer space is
> > +available. This facilitates flow control so packets are never dropped and
> > +buffer space can potentially be reserved for high-priority flows.
> > +
> > +\field{buf_alloc} is the sender's total receive buffer space, in bytes. This
> > +includes both free and in-use buffers. \field{fwd_cnt} is the sender's
> > +free-running bytes received counter.
> > +The receiver uses this information to
> > +calculate the total amount of free buffer space:
> > +
> > +\begin{lstlisting}
> > +/* tx_cnt is a free-running bytes transmitted counter */
> > +u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> > +\end{lstlisting}
> > +
> > +Both the driver and the device MUST track buffer space. If there is
> > +insufficient buffer space, they must wait until virtqueue buffers are returned
> > +and check \field{buf_alloc} and \field{fwd_cnt} again. The
> > +VIRTIO_VSOCK_OP_CREDIT_UPDATE packet MAY be sent to force buffer space
> > +management information exchange. VIRTIO_VSOCK_OP_CREDIT_REQUEST MUST be sent in
> > +response.
>
> Is it ok to initiate connection with 0 buf space allocated?
> If so, how does remote know it should add bufs?
>
>
> > +
> > +\subsubsection{Receive and Transmit}\label{sec:Device Types / VSock Device / Device Operation / Receive and Transmit}
> > +The driver queues outgoing packets on the tx virtqueue and incoming packet
> > +receive buffers on the rx virtqueue. Packets are of the following form:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vsock_packet {
> > +    struct virtio_vsock_hdr hdr;
> > +    u8 data[];
> > +};
> > +\end{lstlisting}
> > +
> > +Virtqueue buffers for outgoing packets are read-only. Virtqueue buffers for
> > +incoming packets are write-only.
> > +
> > +\subsubsection{Datagram Sockets}\label{sec:Device Types / VSock Device / Device Operation / Datagram Sockets}
> > +
> > +Datagram sockets are unordered, unreliable, connectionless flows with message
> > +boundaries. The maximum message size is 65535 bytes.
> > +
> > +Since rx virtqueue buffers may be smaller than the maximum message size, larger
> > +datagrams are split across multiple packets. The header \field{len} field's
> > +upper 16 bits contain the total message size. The lower 16 bits contain the
> > +packet's payload size, which MUST be less or equal to the total message size.
> > +The header \field{flags} field's upper 16 bits contain a unique message
> > +identifier used to correlate packets belonging to the same message. The lower
> > +16 bits contain the packet's start offset within the message.
> > +
> > +The following \field{op} values are used: VIRTIO_VSOCK_OP_RW,
> > +VIRTIO_VSOCK_OP_CREDIT_UPDATE, and VIRTIO_VSOCK_OP_CREDIT_REQUEST. All other
> > +operations are ignored.
> > +
> > +\subsubsection{Stream Sockets}\label{sec:Device Types / VSock Device / Device Operation / Stream Sockets}
> > +
> > +Stream sockets are ordered, reliable, connection-oriented flows with no message
> > +boundaries.
> > +
> > +Connections are established by sending a VIRTIO_VSOCK_OP_REQUEST packet. If a
> > +listening socket exists on the destination a VIRTIO_VSOCK_OP_RESPONSE reply is
> > +sent with a cookie in the header \field{flags} field.

If the intent here is to defeat TCP SYN flood style DoS attacks, I'm not
sure a 32-bit cookie value is sufficient.

> > +
> > +A VIRTIO_VSOCK_OP_ACK packet is sent when the VIRTIO_VSOCK_OP_RESPONSE packet
> > +is received with the same cookie value in the header \field{flags} field.
> > +
> > +When a connected socket receives VIRTIO_VSOCK_OP_SHUTDOWN the header
> > +\field{flags} field bit 0 indicates that the peer will not receive any more
> > +data and bit 1 indicates that the peer will not send any more data. If these
> > +bits are set and there are no more virtqueue buffers pending the socket is
> > +disconnected.
> > +
> > +VIRTIO_VSOCK_OP_RST can be sent at any time to abort the connection process or
> > +forcibly disconnect.
>
> I'm guessing this is how one handles things like invalid connection
> attempts? How about an error code to optionally tell remote what was
> wrong? Can one drop invalid connection attempts, or when one is out
> of resources? Will remote retry?
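
For reference, this is how I read the initiator side of the stream
handshake in the draft, as a rough C sketch. Only the VIRTIO_VSOCK_OP_*
values come from the patch. The state names, struct conn, and both
function names are made up for illustration, header fields are treated
as host-endian values, and silently ignoring unexpected packets is my
assumption since the draft does not say what to do with them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Operation values copied from the draft's enum. */
enum {
    VIRTIO_VSOCK_OP_REQUEST  = 1,
    VIRTIO_VSOCK_OP_RESPONSE = 2,
    VIRTIO_VSOCK_OP_ACK      = 3,
    VIRTIO_VSOCK_OP_RST      = 4,
};

/* Hypothetical connection states; the draft does not name any. */
enum conn_state { CONN_CLOSED, CONN_REQUEST_SENT, CONN_ESTABLISHED };

struct conn {
    enum conn_state state;
};

/* Start connecting: returns the op the initiator has to transmit. */
uint16_t conn_connect(struct conn *c)
{
    c->state = CONN_REQUEST_SENT;
    return VIRTIO_VSOCK_OP_REQUEST;
}

/*
 * Handle a packet received by the initiator.  Returns true when a reply
 * must be sent, in which case *reply_op and *reply_flags are filled in.
 */
bool conn_initiator_rx(struct conn *c, uint16_t op, uint32_t flags,
                       uint16_t *reply_op, uint32_t *reply_flags)
{
    if (op == VIRTIO_VSOCK_OP_RST) {
        /* RST aborts the handshake or an established connection. */
        c->state = CONN_CLOSED;
        return false;
    }
    if (c->state == CONN_REQUEST_SENT && op == VIRTIO_VSOCK_OP_RESPONSE) {
        /* Echo the cookie carried in flags back in the ACK. */
        *reply_op = VIRTIO_VSOCK_OP_ACK;
        *reply_flags = flags;
        c->state = CONN_ESTABLISHED;
        return true;
    }
    /* Assumption: anything else is ignored. */
    return false;
}
```

The cookie is nothing more than the 32-bit flags value echoed back, which
is why its width matters if it is meant as a defence against floods of
REQUEST packets.
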
>
> > +
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >
> >  Currently there are three device-independent feature bits defined:
> > --
> > 2.1.0
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
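
As a footnote on the buffer space management text, here is a small C
sketch of the sender-side credit accounting as I understand it from the
peer_free formula in the patch. The struct and function names are
hypothetical, and the fields are plain host-endian uint32_t rather than
__le32. It relies on unsigned 32-bit wraparound so the free-running
counters keep working after they overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-flow sender state; names are not from the patch. */
struct credit {
    uint32_t peer_buf_alloc; /* last buf_alloc seen from the peer */
    uint32_t peer_fwd_cnt;   /* last fwd_cnt seen from the peer */
    uint32_t tx_cnt;         /* free-running bytes transmitted counter */
};

/* The formula from the draft: bytes the peer can still accept. */
uint32_t credit_peer_free(const struct credit *c)
{
    return c->peer_buf_alloc - (c->tx_cnt - c->peer_fwd_cnt);
}

/* Record buf_alloc and fwd_cnt from any header received from the peer. */
void credit_update(struct credit *c, uint32_t buf_alloc, uint32_t fwd_cnt)
{
    c->peer_buf_alloc = buf_alloc;
    c->peer_fwd_cnt = fwd_cnt;
}

/* Account for len payload bytes if the peer has room; false means wait. */
bool credit_try_send(struct credit *c, uint32_t len)
{
    if (len > credit_peer_free(c))
        return false;
    c->tx_cnt += len;
    return true;
}
```

With peer_buf_alloc at 0 credit_peer_free() stays at 0, so a sender built
this way cannot make progress until it sees a header carrying a non-zero
buf_alloc. That is the case Michael asks about above. Going by the enum
comments, I would expect CREDIT_REQUEST to be the query and CREDIT_UPDATE
the reply, which is the opposite of how the paragraph in the patch words
it.
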