

Subject: [PATCH v4 0/6] kernel vhost-vsock: endianness and some locking issues solved


This patchset is based on the vsock branch from Stefan Hajnoczi, available
at https://github.com/stefanha/linux.

This patchset solves a few problems, and maybe introduces some new ones.
Some other problems are intentionally not solved as they require the
intervention of the maintainer.

Patches 1 and 2 are real bugfixes in the new virtio/vhost-vsock code,
addressed and solved for real.

Patches 3-6 try to fix some lock contention issues, in some cases just
by brutally silencing lockdep, so they should be considered more as
hints than real bugfixes.
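
For reference, this is roughly the shape of such a lockdep annotation.
lock_sock_nested() and SINGLE_DEPTH_NESTING are real kernel APIs, but the
enclosing function and call site below are assumptions, so treat this as
a minimal sketch rather than the actual patch:

#include <linux/lockdep.h>
#include <linux/mutex.h>
#include <net/sock.h>

/* Sketch only: take the socket lock with an explicit nesting subclass,
 * so lockdep tracks it separately from the normal sk_lock class instead
 * of reporting a (false positive) ordering problem. */
static void rx_path_sketch(struct sock *sk, struct mutex *vq_mutex)
{
	mutex_lock(vq_mutex);
	lock_sock_nested(sk, SINGLE_DEPTH_NESTING);

	/* ... handle the incoming packet ... */

	release_sock(sk);
	mutex_unlock(vq_mutex);
}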

There are still some complex lockdep warnings/errors that I did not
address for lack of time.
Here is an example of a potential deadlock that still remains:

[161354.171873] ======================================================
[161354.171874] [ INFO: possible circular locking dependency detected ]
[161354.171876] 4.5.0-rc1-309358-g1008150-dirty #1 Not tainted
[161354.171878] -------------------------------------------------------
[161354.171880] vhost-62642/62645 is trying to acquire lock:
[161354.171881]  (sk_lock-AF_VSOCK){+.+.+.}, at: [<000000000094a984>] virtio_transport_recv_pkt+0x76c/0x7c8
[161354.171892] 
                but task is already holding lock:
[161354.171894]  (&vq->mutex){+.+.+.}, at: [<000003ff80026944>] vhost_vsock_handle_tx_kick+0x5c/0x488 [vhost_vsock]
[161354.171899] 
                which lock already depends on the new lock.

[161354.171901] 
                the existing dependency chain (in reverse order) is:
[161354.171903] 
                -> #1 (&vq->mutex){+.+.+.}:
[161354.171907]        [<00000000001a7fde>] __lock_acquire+0x1476/0x1858
[161354.171934]        [<00000000001a8c72>] lock_acquire+0x212/0x280
[161354.171937]        [<0000000000952916>] mutex_lock_nested+0x7e/0x458
[161354.171940]        [<000003ff80027366>] vhost_transport_send_pkt+0xfe/0x268 [vhost_vsock]
[161354.171943]        [<0000000000949cf0>] virtio_transport_connect+0x50/0x60
[161354.171945]        [<0000000000947fce>] vsock_stream_connect+0x126/0x328
[161354.171950]        [<0000000000802ae2>] SyS_connect+0xaa/0xd8
[161354.171956]        [<00000000008039e4>] SyS_socketcall+0x104/0x388
[161354.171959]        [<000000000095838e>] system_call+0xd6/0x270
[161354.171962]        [<000003ff8d21342a>] 0x3ff8d21342a
[161354.171964] 
                -> #0 (sk_lock-AF_VSOCK){+.+.+.}:
[161354.171968]        [<00000000001a3350>] check_prev_add+0x160/0x788
[161354.171971]        [<00000000001a7fde>] __lock_acquire+0x1476/0x1858
[161354.171973]        [<00000000001a8c72>] lock_acquire+0x212/0x280
[161354.171975]        [<0000000000806fd4>] lock_sock_nested+0x9c/0xb8
[161354.171978]        [<000000000094a984>] virtio_transport_recv_pkt+0x76c/0x7c8
[161354.171980]        [<000003ff80026ca8>] vhost_vsock_handle_tx_kick+0x3c0/0x488 [vhost_vsock]
[161354.171983]        [<000003ff80740fe6>] vhost_worker+0x1ae/0x208 [vhost]
[161354.171987]        [<0000000000164cda>] kthread+0x112/0x120
[161354.171992]        [<000000000095855a>] kernel_thread_starter+0x6/0xc
[161354.171994]        [<0000000000958554>] kernel_thread_starter+0x0/0xc
[161354.171996] 
                other info that might help us debug this:

[161354.171998]  Possible unsafe locking scenario:

[161354.172000]        CPU0                    CPU1
[161354.172001]        ----                    ----
[161354.172002]   lock(&vq->mutex);
[161354.172005]                                lock(sk_lock-AF_VSOCK);
[161354.172007]                                lock(&vq->mutex);
[161354.172009]   lock(sk_lock-AF_VSOCK);
[161354.172012] 
                 *** DEADLOCK ***

[161354.172014] 1 lock held by vhost-62642/62645:
[161354.172015]  #0:  (&vq->mutex){+.+.+.}, at: [<000003ff80026944>] vhost_vsock_handle_tx_kick+0x5c/0x488 [vhost_vsock]
[161354.172020] 
                stack backtrace:
[161354.172023] CPU: 0 PID: 62645 Comm: vhost-62642 Not tainted 4.5.0-rc1-309358-g1008150-dirty #1
[161354.172025]        00000003b527b7e0 00000003b527b870 0000000000000002 0000000000000000 
                       00000003b527b910 00000003b527b888 00000003b527b888 000000000011449c 
                       0000000000000000 0000000000ba2c56 0000000000bc4d6c 000000000000000b 
                       00000003b527b8d0 00000003b527b870 0000000000000000 0000000000000000 
                       0000000000000000 000000000011449c 00000003b527b870 00000003b527b8d0 
[161354.172042] Call Trace:
[161354.172048] ([<00000000001143a8>] show_trace+0x130/0x150)
[161354.172050]  [<0000000000114452>] show_stack+0x8a/0xe8
[161354.172057]  [<000000000066d762>] dump_stack+0x7a/0xd0
[161354.172059]  [<00000000001a23bc>] print_circular_bug+0x314/0x340
[161354.172061]  [<00000000001a3350>] check_prev_add+0x160/0x788
[161354.172063]  [<00000000001a7fde>] __lock_acquire+0x1476/0x1858
[161354.172065]  [<00000000001a8c72>] lock_acquire+0x212/0x280
[161354.172066]  [<0000000000806fd4>] lock_sock_nested+0x9c/0xb8
[161354.172068]  [<000000000094a984>] virtio_transport_recv_pkt+0x76c/0x7c8
[161354.172070]  [<000003ff80026ca8>] vhost_vsock_handle_tx_kick+0x3c0/0x488 [vhost_vsock]
[161354.172073]  [<000003ff80740fe6>] vhost_worker+0x1ae/0x208 [vhost]
[161354.172075]  [<0000000000164cda>] kthread+0x112/0x120
[161354.172076]  [<000000000095855a>] kernel_thread_starter+0x6/0xc
[161354.172078]  [<0000000000958554>] kernel_thread_starter+0x0/0xc
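
In short, the report shows a classic AB-BA inversion: the vhost worker
takes vq->mutex and then sk_lock (vhost_vsock_handle_tx_kick ->
virtio_transport_recv_pkt), while the connect path takes sk_lock and then
vq->mutex (vsock_stream_connect -> virtio_transport_connect ->
vhost_transport_send_pkt). Condensed into a sketch (the function bodies
and signatures below are assumptions; only the names and the lock order
come from the trace above):

#include <linux/mutex.h>
#include <net/sock.h>

/* Path A: vhost worker (tx kick) -- vq->mutex first, then sk_lock */
static void tx_kick_path_sketch(struct mutex *vq_mutex, struct sock *sk)
{
	mutex_lock(vq_mutex);   /* 1st: &vq->mutex */
	lock_sock(sk);          /* 2nd: sk_lock-AF_VSOCK, in virtio_transport_recv_pkt() */
	/* ... */
	release_sock(sk);
	mutex_unlock(vq_mutex);
}

/* Path B: connect() syscall -- sk_lock first, then vq->mutex */
static void connect_path_sketch(struct mutex *vq_mutex, struct sock *sk)
{
	lock_sock(sk);          /* 1st: sk_lock-AF_VSOCK, in vsock_stream_connect() */
	mutex_lock(vq_mutex);   /* 2nd: &vq->mutex, in vhost_transport_send_pkt() */
	/* ... */
	mutex_unlock(vq_mutex);
	release_sock(sk);
}

If both paths run concurrently, each can end up waiting for the lock the
other already holds, which is exactly the scenario lockdep prints above.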




v4:
* patch 3 removed, since it was accepted upstream (net-next). It corrected
  a sleep-while-wait problem in AF_VSOCK (af_vsock.c).
* removed a useless initialization of vq->is_le in vsock.c
v3:
* improved cover letter.
* cleaned up and streamlined the error cleanup gotos in the 3rd patch;
  they should be more readable now.
* added a comment in the 4th patch to explain why we are ignoring the
  return values.
* statically initialized the spinlock and removed the now-useless
  virtio_transport_common_init.
v2:
* removed pointless blank lines
* replaced an almost pointless call to vhost_init_used() with an explicit
  assignment: vq->is_le = virtio_legacy_is_little_endian();
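
For context, a minimal sketch of what that v2 change amounts to (the
enclosing function is an assumption; only the assignment itself is quoted
above). virtio_legacy_is_little_endian() reports whether the legacy
(pre-1.0) virtio byte order on this host is little-endian, so setting
vq->is_le from it lets the vhost accessors apply the right byte order on
big-endian hosts:

#include <linux/virtio_byteorder.h>
#include "vhost.h"	/* drivers/vhost/vhost.h, for struct vhost_virtqueue */

/* Sketch only: set is_le explicitly from the legacy virtio byte order
 * instead of relying on vhost_init_used() to derive it. */
static void vsock_vq_init_sketch(struct vhost_virtqueue *vq)
{
	vq->is_le = virtio_legacy_is_little_endian();
}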

Claudio Imbrenda (6):
  virtio-vsock vhost device: fixed endianness issue
  virtio-vsock: Avoid sleeping where not allowed
  virtio/vhost-vsock: avoid potential deadlock
  virtio/vhost-vsock: Use nested lock notation
  virtio/vhost-vsock: Avoid lockdep warning
  AF_VSOCK: Silence lockdep error

 drivers/vhost/vsock.c                   |   1 +
 include/linux/virtio_vsock.h            |   4 +-
 net/vmw_vsock/af_vsock.c                |   2 +-
 net/vmw_vsock/virtio_transport_common.c | 142 ++++++++++++++++++++++++++------
 4 files changed, 120 insertions(+), 29 deletions(-)

-- 
1.9.1


