OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

# virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Re: [PATCH v7] vsock: add vsock device

• From: Ian Campbell <ian.campbell@docker.com>
• To: Stefan Hajnoczi <stefanha@redhat.com>
• Date: Thu, 11 Aug 2016 11:50:16 +0100

On 10 August 2016 at 16:24, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Mon, Aug 08, 2016 at 12:58:24PM +0100, Ian Campbell wrote:
>> On 4 August 2016 at 16:24, Stefan Hajnoczi <stefanha@redhat.com> wrote:

>> Thanks, would it be possible/easy for you to update
>> https://stefanha.github.io/virtio/#x1-2800007 and
>> https://github.com/stefanha/virtio? (I think they both still have v6)
>
> I have updated it:
> https://stefanha.github.io/virtio/#x1-2830007

Thanks.

>> Is there a more-or-less semi-official git mirror of the master svn
>> repo, or does everyone just roll their own with git-svn?
>
> Everyone rolls their own with git-svn.

OK, I shall do the same ;-)

>> I think the following quote is the entirety of the change/newly added text:
>>
>> > +\subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>> > +
>> > +The tx virtqueue carries packets initiated by applications and replies to
>> > +received packets.  The rx virtqueue carries packets initiated by the device and
>> > +replies to previously transmitted packets.
>> > +
>> > +If both rx and tx virtqueues are filled by the driver and device at the same
>> > +time then it appears that a deadlock is reached.  The driver has no free tx
>> > +descriptors to send replies.  The device has no free rx descriptors to send
>> > +replies either.  Therefore neither device nor driver can process virtqueues
>> > +since that may involve sending new replies.
>> > +
>> > +This is solved using additional resources outside the virtqueue to hold
>> > +packets.  With additional resources, it becomes possible to process incoming
>> > +packets even when outgoing packets cannot be sent.
>> > +
>> > +Eventually even the additional resources will be exhausted and further
>> > +processing is not possible until the other side processes the virtqueue that
>> > +it has neglected.  This stop to processing prevents one side from causing
>> > +unbounded resource consumption in the other side.
>>
>> Rather than "This stop to..." did you mean "This is to stop..." or
>> something else? I can't parse it as is I'm afraid.
>
> Wow, I thought I'd crafted some pretty fine prose.  Sorry about that.
> I'll change it to:
>
> This stops one side from causing unbounded resource consumption.

This is now parsable (thanks) but it's not clear to me how it stops
this from happening.

I think what I am missing is the bit which says explicitly "and under
these circumstances you should stop processing the ring entirely" (as
opposed to dumping traffic on the floor)

>> > +\drivernormative{\paragraph}{Device Operation: Virtqueue Flow Control}{Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>> > +
>> > +The rx virtqueue MUST be processed even when the tx virtqueue is full so long as there are additional resources available to hold packets outside the tx virtqueue.
>> > +
>> > +\devicenormative{\paragraph}{Device Operation: Virtqueue Flow Control}{Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
>> > +
>> > +The tx virtqueue MUST be processed even when the rx virtqueue is full so long as there are additional resources available to hold packets outside the rx virtqueue.

would s/packets/replies/ in both of these clarify your intent some more?

In the case of the drivernormative one is it worth being explicit that
processing the rx virt queue includes replenishing the supply of empty
descriptors for the other end (not just consuming the filled ones)?

>> These describe handling the case where additional resources are
>> available (by saying "keep going if you have resources to do so",
>> which IMHO doesn't really need stating) but they don't describe what
>> the acceptable behaviour is when those resources are no longer
>> available, so really this is just kicking the deadlock can a bit
>> further down the road without addressing the actual problem (i.e. the
>> "Eventually even the additional resources..." paragraph from above).
>
> No, there is an important point in there which I didn't state clearly:
>
> possible under normal operation.
>
> Regarding kicking the can down the road:
>
> It will always be possible to ignore incoming packets to the point where
> the peer will have to stop responding, but this isn't a problem.  Under
> correct operation this state is never reached.

I'm afraid I'm still missing the piece of the puzzles which causes
this to be guaranteed to be true.

I think maybe there is something to do with the fact that a reply can

Is there also a requirement that the driver/device handle any
outstanding ARs before putting any actual data onto the outgoing ring?
I think there has to be, otherwise you are adding more future replies
which will eventually clog the ARs.

Such a requirement could be a bit subtle on the implementation side if
you have separate Rx and Tx processing threads (as I do), since one
thread may be injecting data just as the other is producing ARs. (i.e.
on the device side the RX thread fills the RX ring with data while the
TX thread is filling the AR queue with replies).

>> IIRC from previous discussions your intention was to say that a
>> driver/backend should stop processessing TX requests if the RX ring
>> becomes full (or I suppose if the "additional resources" are
>> exhausted), I guess with the implications that it should never stop
>> processing RX requests (which I think is always possible since from
>> the driver end there are no "additional resources" needed for RX
>> processing). But the above essentially explicitly states the OK case
>> (i.e. the non-deadlock case, resources available) and leaves what to
>> do on resource exhaustion implicit (while that is the most important
>> bit of information).
>>
>> It's unspecified how much "additional resource" an implementation is
>> expected/required to provide. Should it be enough for one ring slot?
>> Or N*slots-per-ring? Is 0 an acceptable number?
>
> The way I've implemented the resource limit is to only allow queuing N
> reply packets when the outgoing virtqueue is full, where N is the size
> of the incoming virtqueue.
>
> If this limit is reached then incoming virtqueue processing is paused
> until there is space to transmit pending reply packets again.

Is >=N a requirement, or just a convenient number? I have a feeling
this has a relationship to the "separate RX and Tx processing threads"
issue I mention above, specifically the ARs need to be larger than the
number of data slots which will be consumed before checking for ARs
(i.e. on the device side how often the RX thread stop processing real
data in order to check for pending ARs), which is another way of
saying that you need to stop and check for pending ARs more often than
every N data slots, what do you think?

I keep coming back to this "how many ARs are enough" question because
it seems like if N-1 ARs are not sufficient to avoid deadlock while N
ARs is then the "inductive step" from N-1 to N ought to be essentially
the "proof" which guarantees that deadlocks are not possible.

subtle to me or perhaps there are simply generic virtio
assumptions/idioms which I'm not aware of which allow these things to
be explained more easily.

>> BTW our backend implementation is now available in
>> https://github.com/docker/hyperkit/blob/af-vsock/src/pci_virtio_sock.c.
>> It corresponds to ~v5 of the spec IIIRC (so it still has workarounds
>> relating to the OP_SHUTDOWN issue which you resolved in v6 and it
>> hasn't yet switched to 64 bit addressing). WRT the above issue the
>> (https://github.com/docker/hyperkit/blob/af-vsock/src/pci_virtio_sock.c#L281)
>> which I sized to be 2x the ring size (since with 1x I still saw
>> an assert failure in this implementation) with that, so I am
>> considering making it 4x, but that is just pushing the problem a
>> further away again without solving.
>
> One detail about the limit I've implemented is that it only applies to
> *reply* packets.  It does not count ordinary outgoing packets.  Perhaps
> you are counting all outgoing packets.  I'm trusting rlimits, socket
> buffer sizes, etc to limit ordinary outgoing packets so they don't
> matter here.

I'm also only counting reply packets since ordinary outgoing packets
don't need to go over the pending reply ring (my manifestation of
ARs), they come directly out of an fd in the relevant thread.

I also communicate the need for a CREDIT_UPDATE reply as a single
per-socket bit, so those also do not consume ARs (and often can piggy
back on other traffic too).

Ian.


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]