OASIS mailing list archives: virtio-dev message

Subject: Re: [PATCH v6] vsock: add vsock device

On 21 April 2016 at 15:36, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> The virtio vsock device is a zero-configuration socket communications
> device.  It is designed as a guest<->host management channel suitable
> for communicating with guest agents.
> vsock is designed with the sockets API in mind and the driver is
> typically implemented as an address family (at the same level as
> AF_INET).  Applications written for the sockets API can be ported with
> minimal changes (similar amount of effort as adding IPv6 support to an
> IPv4 application).
> Unlike the existing console device, which is also used for guest<->host
> communication but allows only a single connection, multiple clients can
> connect to a server at the same time over vsock.  The console's
> limitation forces its users to arbitrate access through a single
> client.  With vsock they can connect directly and do not have to
> synchronize with each other.
> Unlike network devices, no configuration is necessary because the device
> comes with its address in the configuration space.
> The vsock device was prototyped by Gerd Hoffmann and Asias He.  I picked
> the code and design up from them.

First: Do you happen to have this in a git tree, i.e. where I can
easily run make and get a more readable document? (Bonus points if you
are publishing the formatted docs somewhere already ;-))

Secondly and the main reason for mailing: I've tripped over an
interesting issue recently. I'm using your v5 Linux frontend patches +
my own backend.

One property of my backend is that when a TX request requires a
response (e.g. a RW=>CREDIT_UPDATE, REQUEST=>RESPONSE, *=>RST) it
waits synchronously for an RX slot in order to send the reply.

So what happens is: we have a sender in the guest (first backtrace
below) which is holding sk_lock-AF_VSOCK and is waiting in
virtio_transport_send_one_pkt(), looping on virtqueue_add_sgs(),
i.e. waiting for a TX slot to become available.

We also have the receive work queue in the guest (second backtrace
below), which holds three locks (the "virtio_vsock" workqueue lock,
&vsock->rx_work and &vsock->rx_lock) but needs sk_lock-AF_VSOCK to do
its thing, so it is blocked waiting for that.

Now we have the backend: its tx thread wants to send a reply, but
there are no RX descriptors available right now, so it spins waiting
for one. While it does this it is not processing any more tx
requests, so the sender in the guest cannot make progress and holds
sk_lock-AF_VSOCK forever, preventing the RX workqueue from making any
progress, and therefore preventing any new RX descriptors from being
enqueued. So we have a three-way frontend:rx / frontend:tx /
backend:tx deadlock. Since an external entity is involved, lockdep is
unable to spot this (and in any case I doubt it considers the
spin+schedule loop in virtio_transport_send_one_pkt() to be a
"lock"). I had a look at Claudio's various fixes, but none of them
appeared to cover this case since they were mainly about lockdep
issues.

I solved this by removing the synchronous reply from my backend's tx
thread and instead queuing replies onto a backend-internal ring to be
processed by the rx thread. The tx thread can then continue and
process more entries, freeing them up for the guest's sender thread
to use and breaking the deadlock.
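Concretely, the backend-internal ring is just a bounded FIFO between the tx and rx threads, along these lines (a sketch; REPLY_RING_SIZE, the function names, and the int stand-in for a reply packet are all hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Backend-internal queue of replies awaiting a free RX descriptor.
 * Sizing this is exactly the open problem: too small re-introduces
 * the deadlock, and no finite value is provably sufficient. */
#define REPLY_RING_SIZE 8

struct reply_ring {
    int head, tail;
    int pkts[REPLY_RING_SIZE];   /* stand-in for queued reply packets */
};

/* tx thread: queue a reply instead of waiting for an RX slot. */
static bool reply_ring_push(struct reply_ring *q, int pkt)
{
    int next = (q->head + 1) % REPLY_RING_SIZE;
    if (next == q->tail)
        return false;            /* full: the failure mode in question */
    q->pkts[q->head] = pkt;
    q->head = next;
    return true;
}

/* rx thread: drain queued replies as RX descriptors become free. */
static bool reply_ring_pop(struct reply_ring *q, int *pkt)
{
    if (q->tail == q->head)
        return false;            /* empty */
    *pkt = q->pkts[q->tail];
    q->tail = (q->tail + 1) % REPLY_RING_SIZE;
    return true;
}
```

With one slot kept empty to distinguish full from empty, this ring holds REPLY_RING_SIZE - 1 pending replies; `reply_ring_push()` failing is precisely the point where the backend must either drop, kill the guest, or block.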

However the issue with this (and the reason I avoided it in favour of
the synchronous scheme to start with) is that the number of replies
which can be pending in this manner is effectively unbounded: the
guest can continue to produce as many TX requests as it likes and may
not process the replies quickly enough. I started with a ring the
same size as the virtio descriptor ring, but I observed it filling up
(which would then expose the same deadlock, just with an extra link
in the chain). I ended up doubling the size and could no longer
provoke the problem, but that seems unsatisfactory and I assume I am
just lucky that my workloads aren't hitting it.

I'm unsure what the right answer here is; most of the obvious answers
(i.e. throwing stuff on the floor, eliding some CREDIT_UPDATE
responses) would work against the guaranteed delivery goal of the
device.
Clearly the deadlock is bad, but equally I don't think we can mandate
that the backend must be prepared to queue an arbitrary number of
replies, and if we set some upper bound on that then we re-introduce
the deadlock (or introduce an occasional "backend shoots the guest for
using too many slots" failure mode).

About the only thing I can think of would be to mandate that the TX
and RX processing in the frontend must not block (or spin) while
holding a common resource (i.e. disallow either from holding
sk_lock-AF_VSOCK while waiting for things). I'm not sure how
practical that is in terms of actual implementation; it looked like
the TX side takes it in common code, but I may have misread.

A separate "replies" ring from the "(data) rx" ring might work. Or
perhaps a response-descriptor could be added to the tx chains. It
would only really need the op field and buff_alloc+fwd_cnt, not the
full set. I'm not sure if either of those are idiomatic virtio -- but
my gut is telling me no.

I'm hoping that I've just done something stupid or am missing some
blindingly obvious solution.


Sender's trace:

[  121.218465] INFO: task transfused:1067 blocked for more than 30 seconds.
[  121.218908]       Not tainted 4.4.6+ #29
[  121.219122] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  121.219621] transfused      D ffff880072eefb88     0  1067      1 0x00000000
[  121.220114]  ffff880072eefb88 00ff880076206928 ffff88007d2585c0
[  121.220640]  ffff880072ef0000 ffff880076206928 ffff880076206800
[  121.221176]  0000000000000002 ffff880072eefba0 ffffffff815c0f9b
[  121.221685] Call Trace:
[  121.221799]  [<ffffffff815c0f9b>] schedule+0x73/0x81
[  121.222078]  [<ffffffff815b56ad>] virtio_transport_send_one_pkt+0xf4/0x138
[  121.222514]  [<ffffffff810a1829>] ? wake_up_atomic_t+0x25/0x25
[  121.222865]  [<ffffffff815b58d1>] virtio_transport_send_pkt+0x1b7/0x1c6
[  121.223254]  [<ffffffff810a1829>] ? wake_up_atomic_t+0x25/0x25
[  121.223616]  [<ffffffff815b66fd>] virtio_transport_send_pkt+0x23/0x25
[  121.224065]  [<ffffffff815b67a0>] virtio_transport_stream_enqueue+0x37/0x3b
[  121.224521]  [<ffffffff815b4465>] vsock_stream_sendmsg+0x297/0x2f2
[  121.224907]  [<ffffffff810a1829>] ? wake_up_atomic_t+0x25/0x25
[  121.225268]  [<ffffffff813f6d24>] sock_sendmsg+0x2e/0x3f
[  121.225589]  [<ffffffff813f6daa>] sock_write_iter+0x75/0x8d
[  121.225931]  [<ffffffff8115e44f>] new_sync_write+0x5b/0x7d
[  121.226266]  [<ffffffff8115e499>] __vfs_write+0x28/0x2a
[  121.226579]  [<ffffffff8115e896>] vfs_write+0x95/0xc0
[  121.226880]  [<ffffffff8115f001>] SyS_write+0x4b/0x79
[  121.227180]  [<ffffffff815c4c72>] entry_SYSCALL_64_fastpath+0x12/0x72
[  121.227586] 1 lock held by transfused/1067:
[  121.227819]  #0:  (sk_lock-AF_VSOCK){+.+.+.}, at:
[<ffffffff815b3307>] lock_sock+0xb/0xd

RX queue's trace:

[  121.204369] INFO: task kworker/0:1:22 blocked for more than 30 seconds.
[  121.204926]       Not tainted 4.4.6+ #29
[  121.205198] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  121.205877] kworker/0:1     D ffff88007d25fc50     0    22      2 0x00000000
[  121.206508] Workqueue: virtio_vsock virtio_transport_recv_pkt_work
[  121.207021]  ffff88007d25fc50 00000000fffffe01 ffffffff818b1540
[  121.207698]  ffff88007d260000 ffff880072fd0c08 ffff880072fd0bc8
[  121.208382]  ffff88001a0f1598 ffff88007d25fc68 ffffffff815c0f9b
[  121.209083] Call Trace:
[  121.209213]  [<ffffffff815c0f9b>] schedule+0x73/0x81
[  121.209602]  [<ffffffff813f90a2>] __lock_sock+0x6e/0x96
[  121.210003]  [<ffffffff810a1829>] ? wake_up_atomic_t+0x25/0x25
[  121.210478]  [<ffffffff813fa7b2>] lock_sock_nested+0x34/0x82
[  121.210936]  [<ffffffff815b6f3d>] virtio_transport_recv_pkt+0x171/0x471
[  121.211485]  [<ffffffff815c1dcb>] ? mutex_lock_nested+0x2f8/0x361
[  121.211983]  [<ffffffff81320f41>] ? virtqueue_get_buf+0xd7/0x110
[  121.212476]  [<ffffffff815b5d1c>] virtio_transport_recv_pkt_work+0xca/0x144
[  121.212994]  [<ffffffff81085f28>] process_one_work+0x207/0x397
[  121.213364]  [<ffffffff8108654e>] worker_thread+0x1e7/0x292
[  121.213706]  [<ffffffff81086367>] ? rescuer_thread+0x285/0x285
[  121.214058]  [<ffffffff81086367>] ? rescuer_thread+0x285/0x285
[  121.214509]  [<ffffffff8108ab0b>] kthread+0xb5/0xbd
[  121.214796]  [<ffffffff810a7882>] ? lock_acquire+0xd9/0x134
[  121.215154]  [<ffffffff8108aa56>] ? kthread_parkme+0x1f/0x1f
[  121.215502]  [<ffffffff815c4fdf>] ret_from_fork+0x3f/0x70
[  121.215828]  [<ffffffff8108aa56>] ? kthread_parkme+0x1f/0x1f
[  121.216160] 3 locks held by kworker/0:1/22:
[  121.216408]  #0:  ("virtio_vsock"){.+.+.+}, at:
[<ffffffff81085e64>] process_one_work+0x143/0x397
[  121.217058]  #1:  ((&vsock->rx_work)){+.+.+.}, at:
[<ffffffff81085e64>] process_one_work+0x143/0x397
[  121.217727]  #2:  (&vsock->rx_lock){+.+.+.}, at:
[<ffffffff815b5cae>] virtio_transport_recv_pkt_work+0x5c/0x144
