OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] [RFC] vsock: add vsock device


On Mon, Jun 01, 2015 at 05:39:15PM +0200, Michael S. Tsirkin wrote:
> On Thu, May 21, 2015 at 05:40:41PM +0100, Stefan Hajnoczi wrote:
> This looks good overall.
> For inclusion in spec, this needs to be a bit more strict,
> e.g. we need conformance statements that say that
> stream buffers MUST NOT be discarded, datagram ones MAY be discarded.

Okay, I'll review the draft and specify things in terms of conformance
statements.

> > +\begin{lstlisting}
> > +struct virtio_vsock_hdr {
> > +	__le32 src_cid;
> > +	__le32 src_port;
> > +	__le32 dst_cid;
> > +	__le32 dst_port;
> 
> 32 bit seems a bit restrictive.
> OTOH are 32 bit ports supported?

VM Sockets ports are 32-bit.  This is part of the core net/vmw_vsock/
implementation already used by VMware's VMCI interface.  Unless there is
a strong requirement to change the port namespace we should keep it
as-is.

> > +\subsubsection{Buffer Space Management}\label{sec:Device Types / VSock Device / Device Operation / Buffer Space Management}
> > +\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management.
> > +They allow the guest and the device to publish how much buffer space is
> > +available. This facilitates flow control so packets are never dropped and
> > +buffer space can potentially be reserved for high-priority flows.
> > +
> > +\field{buf_alloc} is the sender's total receive buffer space, in bytes. This
> > +includes both free and in-use buffers. \field{fwd_cnt} is the sender's
> > +free-running bytes received counter. The receiver uses this information to
> > +calculate the total amount of free buffer space:
> > +
> > +\begin{lstlisting}
> > +/* tx_cnt is a free-running bytes transmitted counter */
> > +u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
> > +\end{lstlisting}
> > +
> > +Both the driver and the device MUST track buffer space. If there is
> > +insufficient buffer space, they must wait until virtqueue buffers are returned
> > +and check \field{buf_alloc} and \field{fwd_cnt} again. The
> > +VIRTIO_VSOCK_OP_CREDIT_UPDATE packet MAY be sent to force buffer space
> > +management information exchange. VIRTIO_VSOCK_OP_CREDIT_REQUEST MUST be sent in
> > +response.

Typo: CREDIT_UPDATE <-> CREDIT_REQUEST are swapped.

> Is it ok to initiate connection with 0 buf space allocated?
> If so, how does remote know it should add bufs?

Yes, a new SOCK_STREAM connection can be started with 0 buffer space
available.  Only data bytes count towards buffer space, not packet
headers used for control packets.

If the peer has 0 buffer space available then a CREDIT_REQUEST query is
sent, just to make sure that buffer space is still empty.  Also, when
data is dequeued (i.e. recvmsg() from userspace) a gratuitous
CREDIT_UPDATE is sent so the peer knows it may now be able to transmit.
This wakes the connection back up so packets can be transmitted.


After reviewing this mechanism more closely, I noticed a key point I
failed to document: fwd_cnt is only updated on recvmsg() from userspace.
The idea is that peer_buf_alloc is the total amount of socket buffer
space available.

There is a denial-of-service case: if there are two flows (dgram or
stream doesn't matter) and an application never calls recvmsg() for one
of the flows, then all buffer space can be tied up.  This prevents the
other flow from progressing.

Perhaps we should drop the buffer space mechanism for dgram flows and
make the counters per-flow for streams.  That way the denial-of-service
issue is avoided.  Does that sound good?

> > +VIRTIO_VSOCK_OP_RST can be sent at any time to abort the connection process or
> > +forcibly disconnect.
> 
> I'm guessing this is how one handles things like invalid connection
> attempts?  How about an error code to optionally tell remote what was
> wrong? Can one drop invalid connection attempts, or when one is out
> of resources? Will remote retry?

I'll look into this.

Stefan

Attachment: pgpMSomMdE2lu.pgp
Description: PGP signature



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]