virtio-dev message

Subject: Re: [PATCH v7] vsock: add vsock device
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Ian Campbell <ian.campbell@docker.com>
Date: Wed, 10 Aug 2016 16:24:42 +0100
On Mon, Aug 08, 2016 at 12:58:24PM +0100, Ian Campbell wrote:
> On 4 August 2016 at 16:24, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > The virtio vsock device is a zero-configuration socket communications
> > device.  It is designed as a guest<->host management channel suitable
> > for communicating with guest agents.
> >
> > vsock is designed with the sockets API in mind and the driver is
> > typically implemented as an address family (at the same level as
> > AF_INET).  Applications written for the sockets API can be ported with
> > minimal changes (similar amount of effort as adding IPv6 support to an
> > IPv4 application).
> >
> > Unlike the existing console device, which is also used for guest<->host
> > communication, multiple clients can connect to a server at the same time
> > over vsock.  This limitation requires console-based users to arbitrate
> > access through a single client.  In vsock they can connect directly and
> > do not have to synchronize with each other.
> >
> > Unlike network devices, no configuration is necessary because the device
> > comes with its address in the configuration space.
> >
> > The vsock device was prototyped by Gerd Hoffmann and Asias He.  I picked
> > the code and design up from them.
> >
> > VIRTIO-151
> >
> > Cc: Gerd Hoffmann <kraxel@redhat.com>
> > Cc: Asias He <asias.hejun@gmail.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > v7:
> >  * Add virtqueue flow control section to explain how deadlock is avoided
> >    when rings are full [Ian]
> 
> Thanks, would it be possible/easy for you to update
> https://stefanha.github.io/virtio/#x1-2800007 and
> https://github.com/stefanha/virtio? (I think they both still have v6)

I have updated it:
https://stefanha.github.io/virtio/#x1-2830007

> Is there a more-or-less semi-official git mirror of the master svn
> repo, or does everyone just roll their own with `git-svn`?

Everyone rolls their own with git-svn.

> I think the following quote is the entirety of the change/newly added text:
> 
> > +\subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> > +
> > +The tx virtqueue carries packets initiated by applications and replies to
> > +received packets.  The rx virtqueue carries packets initiated by the device and
> > +replies to previously transmitted packets.
> > +
> > +If both rx and tx virtqueues are filled by the driver and device at the same
> > +time then it appears that a deadlock is reached.  The driver has no free tx
> > +descriptors to send replies.  The device has no free rx descriptors to send
> > +replies either.  Therefore neither device nor driver can process virtqueues
> > +since that may involve sending new replies.
> > +
> > +This is solved using additional resources outside the virtqueue to hold
> > +packets.  With additional resources, it becomes possible to process incoming
> > +packets even when outgoing packets cannot be sent.
> > +
> > +Eventually even the additional resources will be exhausted and further
> > +processing is not possible until the other side processes the virtqueue that
> > +it has neglected.  This stop to processing prevents one side from causing
> > +unbounded resource consumption in the other side.
> 
> Rather than "This stop to..." did you mean "This is to stop..." or
> something else? I can't parse it as is I'm afraid.

Wow, I thought I'd crafted some pretty fine prose.  Sorry about that.
I'll change it to:

This stops one side from causing unbounded resource consumption.

> > +\drivernormative{\paragraph}{Device Operation: Virtqueue Flow Control}{Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> > +
> > +The rx virtqueue MUST be processed even when the tx virtqueue is full so long as there are additional resources available to hold packets outside the tx virtqueue.
> > +
> > +\devicenormative{\paragraph}{Device Operation: Virtqueue Flow Control}{Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> > +
> > +The tx virtqueue MUST be processed even when the rx virtqueue is full so long as there are additional resources available to hold packets outside the rx virtqueue.
> 
> These describe handling the case where additional resources are
> available (by saying "keep going if you have resources to do so",
> which IMHO doesn't really need stating) but they don't describe what
> the acceptable behaviour is when those resources are no longer
> available, so really this is just kicking the deadlock can a bit
> further down the road without addressing the actual problem (i.e. the
> "Eventually even the additional resources..." paragraph from above).

No, there is an important point in there which I didn't state clearly:

Additional resources are *required*, otherwise virtqueue deadlock is
possible under normal operation.

Regarding kicking the can down the road:

It will always be possible to ignore incoming packets to the point where
the peer will have to stop responding, but this isn't a problem.  Under
correct operation this state is never reached.

> IIRC from previous discussions your intention was to say that a
> driver/backend should stop processessing TX requests if the RX ring
> becomes full (or I suppose if the "additional resources" are
> exhausted), I guess with the implications that it should never stop
> processing RX requests (which I think is always possible since from
> the driver end there are no "additional resources" needed for RX
> processing). But the above essentially explicitly states the OK case
> (i.e. the non-deadlock case, resources available) and leaves what to
> do on resource exhaustion implicit (while that is the most important
> bit of information).
>
> It's unspecified how much "additional resource" an implementation is
> expected/required to provide. Should it be enough for one ring slot?
> Or N*slots-per-ring? Is 0 an acceptable number?

The way I've implemented the resource limit is to only allow queuing N
reply packets when the outgoing virtqueue is full, where N is the size
of the incoming virtqueue.

If this limit is reached then incoming virtqueue processing is paused
until there is space to transmit pending reply packets again.

> BTW our backend implementation is now available in
> https://github.com/docker/hyperkit/blob/af-vsock/src/pci_virtio_sock.c.
> It corresponds to ~v5 of the spec IIIRC (so it still has workarounds
> relating to the OP_SHUTDOWN issue which you resolved in v6 and it
> hasn't yet switched to 64 bit addressing). WRT the above issue the
> "additional resources" are a pending-reply ring
> (https://github.com/docker/hyperkit/blob/af-vsock/src/pci_virtio_sock.c#L281)
> which I sized to be 2x the ring size (since with 1x I still saw
> deadlocks) however people are currently reporting deadlocks (actually
> an assert failure in this implementation) with that, so I am
> considering making it 4x, but that is just pushing the problem a
> further away again without solving.

One detail about the limit I've implemented is that it only applies to
*reply* packets.  It does not count ordinary outgoing packets.  Perhaps
you are counting all outgoing packets.  I'm trusting rlimits, socket
buffer sizes, etc to limit ordinary outgoing packets so they don't
matter here.
Attachment: signature.asc
Description: PGP signature
Follow-Ups:
- Re: [PATCH v7] vsock: add vsock device
  - From: Ian Campbell <ian.campbell@docker.com>
References:
- [PATCH v7] vsock: add vsock device
  - From: Stefan Hajnoczi <stefanha@redhat.com>
- Re: [PATCH v7] vsock: add vsock device
  - From: Ian Campbell <ian.campbell@docker.com>