Subject: Re: [PATCH v7] vsock: add vsock device
On Thu, Aug 11, 2016 at 11:50:16AM +0100, Ian Campbell wrote:
> On 10 August 2016 at 16:24, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Mon, Aug 08, 2016 at 12:58:24PM +0100, Ian Campbell wrote:
> >> On 4 August 2016 at 16:24, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> >> Thanks, would it be possible/easy for you to update
> >> https://stefanha.github.io/virtio/#x1-2800007 and
> >> https://github.com/stefanha/virtio? (I think they both still have v6)
> >
> > I have updated it:
> > https://stefanha.github.io/virtio/#x1-2830007
>
> Thanks.
>
> >> Is there a more-or-less semi-official git mirror of the master svn
> >> repo, or does everyone just roll their own with `git-svn`?
> >
> > Everyone rolls their own with git-svn.
>
> OK, I shall do the same ;-)
>
> >> I think the following quote is the entirety of the change/newly added text:
> >>
> >> > +\subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> >> > +
> >> > +The tx virtqueue carries packets initiated by applications and replies to
> >> > +received packets. The rx virtqueue carries packets initiated by the device and
> >> > +replies to previously transmitted packets.
> >> > +
> >> > +If both rx and tx virtqueues are filled by the driver and device at the same
> >> > +time then it appears that a deadlock is reached. The driver has no free tx
> >> > +descriptors to send replies. The device has no free rx descriptors to send
> >> > +replies either. Therefore neither device nor driver can process virtqueues
> >> > +since that may involve sending new replies.
> >> > +
> >> > +This is solved using additional resources outside the virtqueue to hold
> >> > +packets. With additional resources, it becomes possible to process incoming
> >> > +packets even when outgoing packets cannot be sent.
> >> > +
> >> > +Eventually even the additional resources will be exhausted and further
> >> > +processing is not possible until the other side processes the virtqueue that
> >> > +it has neglected. This stop to processing prevents one side from causing
> >> > +unbounded resource consumption in the other side.
> >>
> >> Rather than "This stop to..." did you mean "This is to stop..." or
> >> something else? I can't parse it as is I'm afraid.
> >
> > Wow, I thought I'd crafted some pretty fine prose. Sorry about that.
> > I'll change it to:
> >
> > This stops one side from causing unbounded resource consumption.
>
> This is now parsable (thanks) but it's not clear to me how it stops
> this from happening.
>
> I think what I am missing is the bit which says explicitly "and under
> these circumstances you should stop processing the ring entirely" (as
> opposed to dumping traffic on the floor)

This is actually a general spec issue: we don't specify what devices
should do when they encounter an invalid request, and that is not good:
in particular, some devices just cause the guest to exit. I think we
should work on a new feature for recovering from errors, e.g. by device
reset.

> >> > +\drivernormative{\paragraph}{Device Operation: Virtqueue Flow Control}{Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> >> > +
> >> > +The rx virtqueue MUST be processed even when the tx virtqueue is full so long as there are additional resources available to hold packets outside the tx virtqueue.
> >> > +
> >> > +\devicenormative{\paragraph}{Device Operation: Virtqueue Flow Control}{Device Types / Socket Device / Device Operation / Virtqueue Flow Control}
> >> > +
> >> > +The tx virtqueue MUST be processed even when the rx virtqueue is full so long as there are additional resources available to hold packets outside the rx virtqueue.
>
> would s/packets/replies/ in both of these clarify your intent some more?
>
> In the case of the drivernormative one is it worth being explicit that
> processing the rx virt queue includes replenishing the supply of empty
> descriptors for the other end (not just consuming the filled ones)?
>
> >> These describe handling the case where additional resources are
> >> available (by saying "keep going if you have resources to do so",
> >> which IMHO doesn't really need stating) but they don't describe what
> >> the acceptable behaviour is when those resources are no longer
> >> available, so really this is just kicking the deadlock can a bit
> >> further down the road without addressing the actual problem (i.e. the
> >> "Eventually even the additional resources..." paragraph from above).
> >
> > No, there is an important point in there which I didn't state clearly:
> >
> > Additional resources are *required*, otherwise virtqueue deadlock is
> > possible under normal operation.
> >
> > Regarding kicking the can down the road:
> >
> > It will always be possible to ignore incoming packets to the point where
> > the peer will have to stop responding, but this isn't a problem. Under
> > correct operation this state is never reached.
>
> I'm afraid I'm still missing the piece of the puzzle which causes
> this to be guaranteed to be true.
>
> I think maybe there is something to do with the fact that a reply can
> never itself require a reply?
>
> Is there also a requirement that the driver/device handle any
> outstanding ARs before putting any actual data onto the outgoing ring?
> I think there has to be, otherwise you are adding more future replies
> which will eventually clog the ARs.
>
> Such a requirement could be a bit subtle on the implementation side if
> you have separate Rx and Tx processing threads (as I do), since one
> thread may be injecting data just as the other is producing ARs (i.e.
> on the device side the RX thread fills the RX ring with data while the
> TX thread is filling the AR queue with replies).
>
> >> IIRC from previous discussions your intention was to say that a
> >> driver/backend should stop processing TX requests if the RX ring
> >> becomes full (or I suppose if the "additional resources" are
> >> exhausted), I guess with the implication that it should never stop
> >> processing RX requests (which I think is always possible since from
> >> the driver end there are no "additional resources" needed for RX
> >> processing). But the above essentially explicitly states the OK case
> >> (i.e. the non-deadlock case, resources available) and leaves what to
> >> do on resource exhaustion implicit (while that is the most important
> >> bit of information).
> >>
> >> It's unspecified how much "additional resource" an implementation is
> >> expected/required to provide. Should it be enough for one ring slot?
> >> Or N*slots-per-ring? Is 0 an acceptable number?
> >
> > The way I've implemented the resource limit is to only allow queuing N
> > reply packets when the outgoing virtqueue is full, where N is the size
> > of the incoming virtqueue.
> >
> > If this limit is reached then incoming virtqueue processing is paused
> > until there is space to transmit pending reply packets again.
>
> Is >=N a requirement, or just a convenient number? I have a feeling
> this has a relationship to the "separate RX and Tx processing threads"
> issue I mention above, specifically that the ARs need to be larger than the
> number of data slots which will be consumed before checking for ARs
> (i.e. on the device side how often the RX thread stops processing real
> data in order to check for pending ARs), which is another way of
> saying that you need to stop and check for pending ARs more often than
> every N data slots. What do you think?
>
> I keep coming back to this "how many ARs are enough" question because
> it seems like if N-1 ARs are not sufficient to avoid deadlock while N
> ARs are, then the "inductive step" from N-1 to N ought to be essentially
> the "proof" which guarantees that deadlocks are not possible.
>
> Sorry to be so pedantic about this; the guarantee here seems very
> subtle to me, or perhaps there are simply generic virtio
> assumptions/idioms which I'm not aware of which allow these things to
> be explained more easily.
>
> >> BTW our backend implementation is now available in
> >> https://github.com/docker/hyperkit/blob/af-vsock/src/pci_virtio_sock.c.
> >> It corresponds to ~v5 of the spec IIRC (so it still has workarounds
> >> relating to the OP_SHUTDOWN issue which you resolved in v6 and it
> >> hasn't yet switched to 64 bit addressing). WRT the above issue the
> >> "additional resources" are a pending-reply ring
> >> (https://github.com/docker/hyperkit/blob/af-vsock/src/pci_virtio_sock.c#L281)
> >> which I sized to be 2x the ring size (since with 1x I still saw
> >> deadlocks), however people are currently reporting deadlocks (actually
> >> an assert failure in this implementation) with that, so I am
> >> considering making it 4x, but that is just pushing the problem
> >> further away again without solving it.
> >
> > One detail about the limit I've implemented is that it only applies to
> > *reply* packets. It does not count ordinary outgoing packets. Perhaps
> > you are counting all outgoing packets. I'm trusting rlimits, socket
> > buffer sizes, etc. to limit ordinary outgoing packets so they don't
> > matter here.
>
> I'm also only counting reply packets since ordinary outgoing packets
> don't need to go over the pending reply ring (my manifestation of
> ARs); they come directly out of an fd in the relevant thread.
>
> I also communicate the need for a CREDIT_UPDATE reply as a single
> per-socket bit, so those also do not consume ARs (and often can piggyback on other traffic too).
>
> Ian.
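The flow-control scheme under discussion can be illustrated with a short,
self-contained C sketch. This is not the hyperkit or QEMU code; the toy
queue model, the names (vq, device_process_tx, MAX_PENDING) and the sizes
are illustrative assumptions. It shows the device side parking replies in
a bounded pending-reply queue (the "additional resources") when the rx
ring is full, pausing tx processing once that queue is full, and draining
the parked replies once the driver returns rx descriptors:

/* Minimal sketch of bounded pending-reply flow control (device side). */
#include <stdbool.h>
#include <stdio.h>

#define VQ_SIZE     4                    /* toy ring size                   */
#define MAX_PENDING VQ_SIZE              /* N = one ring's worth of replies */

/* Toy stand-in for a virtqueue: a bounded FIFO of packet ids. */
struct vq { int buf[VQ_SIZE]; int head, len; };

static bool vq_push(struct vq *q, int pkt)
{
    if (q->len == VQ_SIZE)
        return false;
    q->buf[(q->head + q->len) % VQ_SIZE] = pkt;
    q->len++;
    return true;
}

static bool vq_pop(struct vq *q, int *pkt)
{
    if (q->len == 0)
        return false;
    *pkt = q->buf[q->head];
    q->head = (q->head + 1) % VQ_SIZE;
    q->len--;
    return true;
}

struct device {
    struct vq *tx, *rx;                  /* driver->device, device->driver */
    int pending[MAX_PENDING];            /* parked replies (the ARs)       */
    int npending;
};

static void device_send_reply(struct device *d, int reply)
{
    /* Preserve ordering: park behind any already-parked replies, and
     * park whenever the rx ring has no free descriptor. */
    if (d->npending > 0 || !vq_push(d->rx, reply))
        d->pending[d->npending++] = reply;
}

static void device_process_tx(struct device *d)
{
    int req;

    /* Keep consuming tx only while a place for the reply is guaranteed
     * (rx descriptor or free AR slot); otherwise leave requests on the
     * ring, i.e. pause tx processing instead of dropping traffic. */
    while (d->npending < MAX_PENDING && vq_pop(d->tx, &req))
        device_send_reply(d, req + 1000);   /* pretend every request needs a reply */
}

/* Called when the driver has consumed rx buffers and returned descriptors. */
static void device_rx_space(struct device *d)
{
    while (d->npending > 0 && vq_push(d->rx, d->pending[0])) {
        for (int i = 1; i < d->npending; i++)   /* naive dequeue-from-front */
            d->pending[i - 1] = d->pending[i];
        d->npending--;
    }
    device_process_tx(d);                /* resume tx processing            */
}

int main(void)
{
    struct vq tx = {0}, rx = {0};
    struct device dev = { .tx = &tx, .rx = &rx };
    int pkt;

    /* Driver fills the tx ring while the rx ring is already full. */
    for (int i = 0; i < VQ_SIZE; i++) {
        vq_push(&tx, i);
        vq_push(&rx, 900 + i);
    }

    device_process_tx(&dev);             /* no deadlock: replies are parked */
    printf("parked replies: %d\n", dev.npending);

    while (vq_pop(&rx, &pkt))            /* driver eventually drains rx ... */
        ;
    device_rx_space(&dev);               /* ... and the parked replies flow */
    printf("parked replies after drain: %d, rx len: %d\n",
           dev.npending, rx.len);
    return 0;
}

Run as-is it prints "parked replies: 4" and then "parked replies after
drain: 0, rx len: 4", i.e. both rings being full at once does not stall
processing so long as the ARs bound the number of outstanding replies.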
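Ian's per-socket CREDIT_UPDATE bit can be sketched in the same spirit.
Again this is an illustrative assumption rather than the hyperkit code:
a single flag records that a credit update is owed, every outgoing packet
on the socket carries the current buf_alloc/fwd_cnt values and clears the
flag, and a standalone CREDIT_UPDATE packet is only generated when there
is no other traffic to piggyback on, so these replies never have to
occupy AR slots of their own:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Per-socket state: credit bookkeeping plus one bit recording that the
 * peer is owed a credit update. */
struct sock_state {
    uint32_t buf_alloc;      /* our receive buffer size       */
    uint32_t fwd_cnt;        /* bytes we have consumed so far */
    bool     credit_update_needed;
};

/* Stand-in for a packet header carrying the credit fields. */
struct hdr {
    uint16_t op;
    uint32_t buf_alloc;
    uint32_t fwd_cnt;
};

/* Every outgoing packet carries the current credit, so sending anything
 * on this socket clears the pending-update bit for free. */
static void fill_hdr(struct sock_state *s, struct hdr *h, uint16_t op)
{
    h->op = op;
    h->buf_alloc = s->buf_alloc;
    h->fwd_cnt = s->fwd_cnt;
    s->credit_update_needed = false;
}

/* Called when the peer asks for credit: just record the need. */
static void on_credit_request(struct sock_state *s)
{
    s->credit_update_needed = true;
}

/* Called when there is no other traffic to piggyback on: emit a
 * standalone CREDIT_UPDATE only if the bit is still set. */
static bool maybe_emit_credit_update(struct sock_state *s, struct hdr *h)
{
    if (!s->credit_update_needed)
        return false;
    fill_hdr(s, h, 6 /* VIRTIO_VSOCK_OP_CREDIT_UPDATE */);
    return true;
}

int main(void)
{
    struct sock_state s = { .buf_alloc = 262144, .fwd_cnt = 4096 };
    struct hdr h;

    on_credit_request(&s);
    if (maybe_emit_credit_update(&s, &h))
        printf("standalone CREDIT_UPDATE: buf_alloc=%u fwd_cnt=%u\n",
               (unsigned)h.buf_alloc, (unsigned)h.fwd_cnt);
    return 0;
}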