Description
Hi,

tcmu-runner version: 1.4.1
OS: CentOS 7.6, kernel 3.10.0-957

We found a deadlock when using tcmu-runner, as shown below:
```
Sep 20 03:36:09 b12 kernel: INFO: task tcmu-runner:2463938 blocked for more than 120 seconds.
Sep 20 03:36:09 b12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 20 03:36:09 b12 kernel: tcmu-runner D ffff892eabffc100 0 2463938 1 0x00000080
Sep 20 03:36:09 b12 kernel: Call Trace:
Sep 20 03:36:09 b12 kernel: [] ? security_inode_permission+0x22/0x30
Sep 20 03:36:09 b12 kernel: [] schedule_preempt_disabled+0x29/0x70
Sep 20 03:36:09 b12 kernel: [] __mutex_lock_slowpath+0xc7/0x1d0
Sep 20 03:36:09 b12 kernel: [] mutex_lock+0x1f/0x2f
Sep 20 03:36:09 b12 kernel: [] lookup_slow+0x33/0xa7
Sep 20 03:36:09 b12 kernel: [] path_lookupat+0x838/0x8b0
Sep 20 03:36:09 b12 kernel: [] ? try_to_wake_up+0x190/0x390
Sep 20 03:36:09 b12 kernel: [] ? kmem_cache_alloc+0x35/0x1f0
Sep 20 03:36:09 b12 kernel: [] ? getname_flags+0x4f/0x1a0
Sep 20 03:36:09 b12 kernel: [] filename_lookup+0x2b/0xc0
Sep 20 03:36:09 b12 kernel: [] user_path_at_empty+0x67/0xc0
Sep 20 03:36:09 b12 kernel: [] ? kmemdup+0x36/0x50
Sep 20 03:36:09 b12 kernel: [] user_path_at+0x11/0x20
Sep 20 03:36:09 b12 kernel: [] SyS_faccessat+0xb2/0x230
Sep 20 03:36:09 b12 kernel: [] SyS_access+0x18/0x20
Sep 20 03:36:09 b12 kernel: [] system_call_fastpath+0x22/0x27
Sep 20 03:36:09 b12 kernel: INFO: task rbd-target-api:2463993 blocked for more than 120 seconds.
Sep 20 03:36:09 b12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 20 03:36:10 b12 kernel: rbd-target-api D ffff892efdfeb0c0 0 2463993 1 0x00000080
Sep 20 03:36:10 b12 kernel: Call Trace:
Sep 20 03:36:10 b12 kernel: [] ? __wake_up_sync_key+0x4f/0x60
Sep 20 03:36:10 b12 kernel: [] schedule+0x29/0x70
Sep 20 03:36:10 b12 kernel: [] schedule_timeout+0x221/0x2d0
Sep 20 03:36:10 b12 kernel: [] ? netlink_broadcast_filtered+0x14c/0x3e0
Sep 20 03:36:10 b12 kernel: [] wait_for_completion+0xfd/0x140
Sep 20 03:36:10 b12 kernel: [] ? wake_up_state+0x20/0x20
Sep 20 03:36:10 b12 kernel: [] tcmu_netlink_event+0x334/0x4a0 [target_core_user]
Sep 20 03:36:10 b12 kernel: [] ? __cond_resched+0x26/0x30
Sep 20 03:36:10 b12 kernel: [] tcmu_destroy_device+0x5c/0x90 [target_core_user]
Sep 20 03:36:10 b12 kernel: [] target_free_device+0xb4/0x120 [target_core_mod]
Sep 20 03:36:10 b12 kernel: [] target_core_dev_release+0x15/0x20 [target_core_mod]
Sep 20 03:36:10 b12 kernel: [] config_item_release+0x6a/0xf0
Sep 20 03:36:10 b12 kernel: [] config_item_put+0x2c/0x30
Sep 20 03:36:10 b12 kernel: [] configfs_rmdir+0x1eb/0x310
Sep 20 03:36:10 b12 kernel: [] vfs_rmdir+0xdc/0x150
Sep 20 03:36:10 b12 kernel: [] do_rmdir+0x1f1/0x220
Sep 20 03:36:10 b12 kernel: [] ? SYSC_newstat+0x3e/0x60
Sep 20 03:36:10 b12 kernel: [] SyS_rmdir+0x16/0x20
Sep 20 03:36:10 b12 kernel: [] system_call_fastpath+0x22/0x27
```
Because the process is in D state we can't trace the user-mode stack, so we used crash to find out which lock tcmu-runner is waiting for in kernel mode. It is a VFS directory lock; the directory path is /sys/kernel/config/target/core/user_1/[lun_name]/, and tcmu-runner is requesting the action file in that directory.
We think tcmu-runner is in the code below:

```c
int tcmu_acquire_dev_lock(struct tcmu_device *dev, bool is_sync,
			  uint16_t tag)
{
...
	pthread_mutex_lock(&rdev->state_lock);
	if (ret == TCMU_STS_OK)
		rdev->lock_state = TCMUR_DEV_LOCK_LOCKED;
	else
		rdev->lock_state = TCMUR_DEV_LOCK_UNLOCKED;

	tcmu_dev_dbg(dev, "lock call done. lock state %d\n", rdev->lock_state);

	tcmu_unblock_device(dev);

	pthread_cond_signal(&rdev->lock_cond);
	pthread_mutex_unlock(&rdev->state_lock);

	return ret;
}
```
- tcmu-runner holds state_lock and executes tcmu_unblock_device(); inside this function it issues an access() syscall. Call this A.
- Meanwhile, rbd-target-api executes rmdir: it holds the [lun_name] directory lock, sends TCMU_CMD_REMOVED_DEVICE back through netlink, and (when using the v2 protocol) waits for a completion. Call this B.
- The tcmu-runner main loop receives the netlink event and runs the removed-device handler, but it cannot take state_lock, because the lock-acquire thread A holds state_lock while waiting for the VFS directory lock. And the rmdir kernel path B does not release the directory lock until it gets the completion. Call this C.
The interesting thing is why A's access() syscall enters lookup_slow and ends up requesting the directory lock at all, because the kernel caches the action file's dentry after the first lookup. We are sure that in our system the deadlock did not occur on the first access of the action file; access() can usually get the cached dentry through lookup_fast. We think the reason is B's rmdir([lun_name]): before deleting the directory dentry, it drops the cached dentries of the files under that directory via shrink_dcache_parent.
```c
int vfs_rmdir(struct inode *dir, struct dentry *dentry)
{
...
	shrink_dcache_parent(dentry);
	error = dir->i_op->rmdir(dir, dentry);
	if (error)
		goto out;

	dentry->d_inode->i_flags |= S_DEAD;
	dont_mount(dentry);
	detach_mounts(dentry);

out:
	mutex_unlock(&dentry->d_inode->i_mutex);
	dput(dentry);
	if (!error)
		d_delete(dentry);
	return error;
}
```
We guess that in this deadlock scenario, B removes the action file's cached dentry and holds the [lun_name] directory lock, so A cannot find the dentry via lookup_fast. A then requests the directory lock in lookup_slow while holding the state mutex. C waits on that mutex to complete REMOVED_DEVICE and let B release the directory lock.
The newest code removes the access() syscall and only open()s and writes a value to the /sys/kernel/.../action file. But the open() syscall can also enter lookup_slow, so the deadlock scenario may still exist.