virtio-dev message



Subject: Re: [virtio-dev] [RFC] vsock: add vsock device


On Mon, Jun 01, 2015 at 05:39:15PM +0200, Michael S. Tsirkin wrote:
> On Thu, May 21, 2015 at 05:40:41PM +0100, Stefan Hajnoczi wrote:
> > [Resent because my personal email address was bounced by the virtio-dev mailing
> > list and Andy King's email address no longer works.]
> > 
> > The virtio vsock device is a zero-configuration datagram and stream
> > socket communications device.  It is designed as a guest<->host
> > management channel suitable for communicating with guest agents or host
> > services.
> > 
> > vsock is designed with the sockets API in mind and the driver is
> > typically implemented as an address family (at the same level as
> > AF_INET).  Applications written for the sockets API can be ported with
> > minimal changes (similar amount of effort as adding IPv6 support to an
> > IPv4 application).
> > 
> > Unlike the existing console device, which is also used for guest<->host
> > communication, vsock lets multiple clients connect to a server at the
> > same time.  The console's single-client limitation forces its users to
> > arbitrate access through a single client.  With vsock, clients connect
> > directly and do not have to synchronize with each other.
> > 
> > Unlike network devices, no configuration is necessary because the device
> > comes with its address in the configuration space.
> > 
> > The vsock device was prototyped by Gerd Hoffmann and Asias He.  I
> > recently picked it up again.
> > 
> > Please take a look at this design.  I'd be happy to flesh it out and
> > answer any questions you may have.
> > 
> > Open questions:
> > 
> >  * Denial of service scenarios?  Competing flows use the same rx/tx
> >    virtqueue.  This design is similar to network cards so it should not
> >    pose a big problem as long as the driver and guest copy data out of
> >    the ring as soon as possible instead of waiting for applications
> >    before reclaiming virtqueue buffers.
> > 
> >  * Can stream sockets be simplified?  They mostly ape TCP but I'm not
> >    sure whether the connection establishment and shutdown needs to be as
> >    elaborate.
> > 
> >  * Multiqueue?  (Probably not critical, can be added with a feature bit
> >    later if needed.)
> > 
> > I will post patches for the Linux kernel in the coming week.  They
> > consist of a virtio guest driver and vhost host driver that use the
> > AF_VSOCK address family already available in net/vmw_vsock/.
> > 
> > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > Cc: Asias He <asias.hejun@gmail.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> This looks good overall.
> For inclusion in spec, this needs to be a bit more strict,
> e.g. we need conformance statements that say that
> stream buffers MUST NOT be discarded, datagram ones MAY be discarded.
> 
> 
> > ---
> >  trunk/content.tex | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 172 insertions(+)
> > 
> > diff --git a/trunk/content.tex b/trunk/content.tex
> > index 1efdcc8..f03ffff 100644
> > --- a/trunk/content.tex
> > +++ b/trunk/content.tex
> > @@ -5146,6 +5146,178 @@ descriptor for the \field{sense_len}, \field{residual},
> >  \field{status_qualifier}, \field{status}, \field{response} and
> >  \field{sense} fields.
> >  
> > +\section{VSock Device}\label{sec:Device Types / VSock Device}
> > +
> > +The virtio vsock device is a zero-configuration datagram and stream socket
> > +communications device. It facilitates data transfer between the guest and
> > +device without using the Ethernet or IP protocols.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / VSock Device / Device ID}
> > +  13
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / VSock Device / Virtqueues}
> > +\begin{description}
> > +\item[0] ctrl
> > +\item[1] rx
> > +\item[2] tx
> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / VSock Device / Feature bits}
> > +
> > +\begin{description}
> > +There are currently no feature bits defined for this device.
> > +\end{description}
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / VSock Device / Device configuration layout}
> > +
> > +TODO drop max_virtqueue_pairs for now?
> > +
> > +\begin{lstlisting}
> > +struct virtio_vsock_config {
> > +	__le32 guest_cid;
> > +	__le32 max_virtqueue_pairs;
> > +};
> > +\end{lstlisting}
> > +
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / VSock Device / Device Initialization}
> > +
> > +\begin{enumerate}
> > +\item The guest's cid can be read from \field{guest_cid}.
> > +
> > +\item Buffers must be added to the rx virtqueue to start receiving packets.
> > +\end{enumerate}
> > +
> > +\subsection{Device Operation}\label{sec:Device Types / VSock Device / Device Operation}
> > +
> > +Packets transmitted or received contain a header before the payload:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vsock_hdr {
> > +	__le32 src_cid;
> > +	__le32 src_port;
> > +	__le32 dst_cid;
> > +	__le32 dst_port;
> 
> 32 bit seems a bit restrictive.
> OTOH are 32 bit ports supported?
> 
> > +	__le32 len;
> > +	__le16 type;
> > +	__le16 op;
> > +	__le32 flags;
> > +	__le32 buf_alloc;
> > +	__le32 fwd_cnt;
> > +};
> > +\end{lstlisting}
> > +
> > +Most packets simply transfer data but control packets are also used for
> > +connection and buffer space management.  \field{op} is one of the following
> > +operation constants:
> > +
> > +\begin{lstlisting}
> > +enum {
> > +	VIRTIO_VSOCK_OP_INVALID = 0,
> > +
> > +	/* Connect operations */
> > +	VIRTIO_VSOCK_OP_REQUEST = 1,
> > +	VIRTIO_VSOCK_OP_RESPONSE = 2,
> > +	VIRTIO_VSOCK_OP_ACK = 3,
> > +	VIRTIO_VSOCK_OP_RST = 4,
> > +	VIRTIO_VSOCK_OP_SHUTDOWN = 5,
> > +
> > +	/* To send payload */
> > +	VIRTIO_VSOCK_OP_RW = 6,
> > +
> > +	/* Tell the peer our credit info */
> > +	VIRTIO_VSOCK_OP_CREDIT_UPDATE = 7,
> > +	/* Request the peer to send the credit info to us */
> > +	VIRTIO_VSOCK_OP_CREDIT_REQUEST = 8,
> > +};
> > +\end{lstlisting}
> > +
> > +\subsubsection{Addressing}\label{sec:Device Types / VSock Device / Device Operation / Addressing}
> > +
> > +VSock flows are identified by a (source, destination) tuple. Address
> > +information consists of a (cid, port number) tuple. The header fields used for
> > +this are \field{src_cid}, \field{src_port}, \field{dst_cid}, and
> > +\field{dst_port}.
> > +
> > +Both stream and datagram sockets are supported. \field{type} is 1 for stream
> > +and 2 for datagram socket types. Stream and datagram port namespaces are
> > +independent.
> > +
> > +\subsubsection{Buffer Space Management}\label{sec:Device Types / VSock Device / Device Operation / Buffer Space Management}
> > +\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management.
> > +They allow the guest and the device to publish how much buffer space is
> > +available. This facilitates flow control so packets are never dropped and
> > +buffer space can potentially be reserved for high-priority flows.
> > +
> > +\field{buf_alloc} is the sender's total receive buffer space, in bytes. This
> > +includes both free and in-use buffers. \field{fwd_cnt} is the sender's
> > +free-running bytes received counter. The receiver uses this information to
> > +calculate the total amount of free buffer space:
> > +
> > +\begin{lstlisting}
> > +/* tx_cnt is a free-running bytes transmitted counter */
> > +u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> > +\end{lstlisting}
> > +
> > +Both the driver and the device MUST track buffer space. If there is
> > +insufficient buffer space, they must wait until virtqueue buffers are returned
> > +and check \field{buf_alloc} and \field{fwd_cnt} again. The
> > +VIRTIO_VSOCK_OP_CREDIT_REQUEST packet MAY be sent to force buffer space
> > +management information exchange. VIRTIO_VSOCK_OP_CREDIT_UPDATE MUST be sent in
> > +response.
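
To make the credit arithmetic above concrete, here is a minimal sketch in C. The struct and function names (vsock_credit, vsock_peer_free, vsock_can_send) are hypothetical and not part of the patch; only the formula comes from the listing. Returning 0 when more bytes are in flight than the peer advertises (for example, if the peer shrinks buf_alloc) is my assumption, since the patch does not say what happens in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow credit state; field names mirror the patch. */
struct vsock_credit {
	uint32_t peer_buf_alloc; /* last buf_alloc received from the peer */
	uint32_t peer_fwd_cnt;   /* last fwd_cnt received from the peer */
	uint32_t tx_cnt;         /* free-running count of bytes we sent */
};

/*
 * Bytes the peer can still accept.  Unsigned subtraction keeps the
 * in-flight count correct when the free-running counters wrap at 2^32.
 */
static inline uint32_t vsock_peer_free(const struct vsock_credit *c)
{
	uint32_t in_flight;

	assert(c != NULL);
	in_flight = c->tx_cnt - c->peer_fwd_cnt;

	/* Assumption: report no credit rather than underflow. */
	if (in_flight > c->peer_buf_alloc)
		return 0;
	return c->peer_buf_alloc - in_flight;
}

/* True if a payload of len bytes fits in the peer's free space. */
static inline bool vsock_can_send(const struct vsock_credit *c, uint32_t len)
{
	return len <= vsock_peer_free(c);
}
```

A sender would check vsock_can_send() before queuing a packet on the tx virtqueue and add the payload size to tx_cnt afterwards.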
> 
> Is it ok to initiate connection with 0 buf space allocated?
> If so, how does remote know it should add bufs?
> 
> 
> > +
> > +\subsubsection{Receive and Transmit}\label{sec:Device Types / VSock Device / Device Operation / Receive and Transmit}
> > +The driver queues outgoing packets on the tx virtqueue and incoming packet
> > +receive buffers on the rx virtqueue. Packets are of the following form:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vsock_packet {
> > +    struct virtio_vsock_hdr hdr;
> > +    u8 data[];
> > +};
> > +\end{lstlisting}
> > +
> > +Virtqueue buffers for outgoing packets are read-only. Virtqueue buffers for
> > +incoming packets are write-only.
> > +
> > +\subsubsection{Datagram Sockets}\label{sec:Device Types / VSock Device / Device Operation / Datagram Sockets}
> > +
> > +Datagram sockets are unordered, unreliable, connectionless flows with message
> > +boundaries. The maximum message size is 65535 bytes.
> > +
> > +Since rx virtqueue buffers may be smaller than the maximum message size, larger
> > +datagrams are split across multiple packets.  The header \field{len} field's
> > +upper 16 bits contain the total message size. The lower 16 bits contain the
> > +packet's payload size, which MUST be less than or equal to the total message size.
> > +The header \field{flags} field's upper 16 bits contain a unique message
> > +identifier used to correlate packets belonging to the same message. The lower
> > +16 bits contain the packet's start offset within the message.
> > +
> > +The following \field{op} values are used: VIRTIO_VSOCK_OP_RW,
> > +VIRTIO_VSOCK_OP_CREDIT_UPDATE, and VIRTIO_VSOCK_OP_CREDIT_REQUEST. All other
> > +operations are ignored.
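
The len/flags split described for datagrams can be sketched as follows. The helper names are hypothetical; the bit layout is taken from the text above. The check that a fragment lies entirely inside the message (offset + payload <= total) is my assumption; the patch only requires payload <= total. Values are in host byte order and would still need converting to little-endian before being stored in the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* len: upper 16 bits = total message size, lower 16 = payload size. */
static inline uint32_t dgram_pack_len(uint16_t total, uint16_t payload)
{
	assert(payload <= total);
	return ((uint32_t)total << 16) | payload;
}

/* flags: upper 16 bits = message id, lower 16 = start offset. */
static inline uint32_t dgram_pack_flags(uint16_t msg_id, uint16_t offset)
{
	return ((uint32_t)msg_id << 16) | offset;
}

static inline uint16_t dgram_hi16(uint32_t v)
{
	return (uint16_t)(v >> 16);
}

static inline uint16_t dgram_lo16(uint32_t v)
{
	return (uint16_t)(v & 0xffff);
}

/* Assumption: a valid fragment lies entirely inside its message. */
static inline bool dgram_fragment_ok(uint32_t len, uint32_t flags)
{
	uint32_t total = dgram_hi16(len);
	uint32_t payload = dgram_lo16(len);
	uint32_t offset = dgram_lo16(flags);

	return payload <= total && offset <= total - payload;
}
```

For example, the last 1000-byte fragment of a 3000-byte message with identifier 7 would carry len = dgram_pack_len(3000, 1000) and flags = dgram_pack_flags(7, 2000).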
> > +
> > +\subsubsection{Stream Sockets}\label{sec:Device Types / VSock Device / Device Operation / Stream Sockets}
> > +
> > +Stream sockets are ordered, reliable, connection-oriented flows with no message
> > +boundaries.
> > +
> > +Connections are established by sending a VIRTIO_VSOCK_OP_REQUEST packet. If a
> > +listening socket exists on the destination a VIRTIO_VSOCK_OP_RESPONSE reply is
> > +sent with a cookie in the header \field{flags} field.

If the intent here is to defeat TCP SYN flood style DoS attacks,
I'm not sure a 32-bit cookie value is sufficient.


> > +
> > +A VIRTIO_VSOCK_OP_ACK packet is sent when the VIRTIO_VSOCK_OP_RESPONSE packet
> > +is received with the same cookie value in the header \field{flags} field.
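
As I read it, the handshake is REQUEST, then RESPONSE carrying a cookie, then ACK echoing that cookie. A minimal sketch of both sides in C follows. The state names, struct and functions are hypothetical. Having the listener verify the echoed cookie and answer a mismatch with RST is my assumption; the patch does not spell that out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Op values from the patch. */
enum {
	VIRTIO_VSOCK_OP_INVALID = 0,
	VIRTIO_VSOCK_OP_REQUEST = 1,
	VIRTIO_VSOCK_OP_RESPONSE = 2,
	VIRTIO_VSOCK_OP_ACK = 3,
	VIRTIO_VSOCK_OP_RST = 4,
};

/* Hypothetical connection states. */
enum vsock_state { VS_CLOSED, VS_REQUEST_SENT, VS_RESPONSE_SENT, VS_CONNECTED };

struct vsock_conn {
	enum vsock_state state;
	uint32_t cookie; /* listener: cookie issued; connector: cookie to echo */
};

/*
 * Connecting side, after REQUEST was sent.  Returns the op to send
 * back, or VIRTIO_VSOCK_OP_INVALID for no reply.
 */
static inline uint16_t vsock_connector_rx(struct vsock_conn *c, uint16_t op,
					  uint32_t flags)
{
	assert(c != NULL);
	if (c->state == VS_REQUEST_SENT && op == VIRTIO_VSOCK_OP_RESPONSE) {
		c->cookie = flags; /* goes into the flags field of the ACK */
		c->state = VS_CONNECTED;
		return VIRTIO_VSOCK_OP_ACK;
	}
	c->state = VS_CLOSED; /* RST or anything unexpected aborts */
	return VIRTIO_VSOCK_OP_INVALID;
}

/* Listening side, after RESPONSE was sent with c->cookie. */
static inline uint16_t vsock_listener_rx(struct vsock_conn *c, uint16_t op,
					 uint32_t flags)
{
	assert(c != NULL);
	if (c->state == VS_RESPONSE_SENT && op == VIRTIO_VSOCK_OP_ACK &&
	    flags == c->cookie) {
		c->state = VS_CONNECTED;
		return VIRTIO_VSOCK_OP_INVALID;
	}
	c->state = VS_CLOSED;
	if (op == VIRTIO_VSOCK_OP_RST)
		return VIRTIO_VSOCK_OP_INVALID; /* never answer RST with RST */
	return VIRTIO_VSOCK_OP_RST; /* assumption: reject a bad ACK */
}
```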
> > +
> > +When a connected socket receives VIRTIO_VSOCK_OP_SHUTDOWN the header
> > +\field{flags} field bit 0 indicates that the peer will not receive any more
> > +data and bit 1 indicates that the peer will not send any more data. If these
> > +bits are set and there are no more virtqueue buffers pending the socket is
> > +disconnected.
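
The shutdown rule could look roughly like this in C. The macro, struct and function names are hypothetical. I am assuming "these bits" means both bits, and that bits accumulate across several SHUTDOWN packets; the patch does not say either explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two flag bits described in the patch. */
#define VSOCK_SHUTDOWN_RCV  (1u << 0) /* peer will not receive more data */
#define VSOCK_SHUTDOWN_SEND (1u << 1) /* peer will not send more data */
#define VSOCK_SHUTDOWN_BOTH (VSOCK_SHUTDOWN_RCV | VSOCK_SHUTDOWN_SEND)

struct vsock_shutdown_state {
	uint32_t peer_shutdown;       /* accumulated SHUTDOWN flag bits */
	unsigned int pending_buffers; /* virtqueue buffers still in flight */
};

/* Record the flags of a received VIRTIO_VSOCK_OP_SHUTDOWN packet. */
static inline void vsock_on_shutdown(struct vsock_shutdown_state *s,
				     uint32_t flags)
{
	assert(s != NULL);
	s->peer_shutdown |= flags & VSOCK_SHUTDOWN_BOTH;
}

/* Disconnect once both directions are shut and nothing is pending. */
static inline bool vsock_should_disconnect(const struct vsock_shutdown_state *s)
{
	assert(s != NULL);
	return (s->peer_shutdown & VSOCK_SHUTDOWN_BOTH) == VSOCK_SHUTDOWN_BOTH &&
	       s->pending_buffers == 0;
}
```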
> > +
> > +VIRTIO_VSOCK_OP_RST can be sent at any time to abort the connection process or
> > +forcibly disconnect.
> 
> I'm guessing this is how one handles things like invalid connection
> attempts?  How about an error code to optionally tell remote what was
> wrong? Can one drop invalid connection attempts, or when one is out
> of resources? Will remote retry?
> 
> > +
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  
> >  Currently there are three device-independent feature bits defined:
> > -- 
> > 2.1.0
> > 
> > 

