Description
Hi,

tcmu-runner version: 1.4.1
OS: CentOS 7.6, kernel 3.10.0-957

We found a deadlock when using tcmu-runner, as shown below:
```
Sep 20 03:36:09 b12 kernel: INFO: task tcmu-runner:2463938 blocked for more than 120 seconds.
Sep 20 03:36:09 b12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 20 03:36:09 b12 kernel: tcmu-runner D ffff892eabffc100 0 2463938 1 0x00000080
Sep 20 03:36:09 b12 kernel: Call Trace:
Sep 20 03:36:09 b12 kernel: [] ? security_inode_permission+0x22/0x30
Sep 20 03:36:09 b12 kernel: [] schedule_preempt_disabled+0x29/0x70
Sep 20 03:36:09 b12 kernel: [] __mutex_lock_slowpath+0xc7/0x1d0
Sep 20 03:36:09 b12 kernel: [] mutex_lock+0x1f/0x2f
Sep 20 03:36:09 b12 kernel: [] lookup_slow+0x33/0xa7
Sep 20 03:36:09 b12 kernel: [] path_lookupat+0x838/0x8b0
Sep 20 03:36:09 b12 kernel: [] ? try_to_wake_up+0x190/0x390
Sep 20 03:36:09 b12 kernel: [] ? kmem_cache_alloc+0x35/0x1f0
Sep 20 03:36:09 b12 kernel: [] ? getname_flags+0x4f/0x1a0
Sep 20 03:36:09 b12 kernel: [] filename_lookup+0x2b/0xc0
Sep 20 03:36:09 b12 kernel: [] user_path_at_empty+0x67/0xc0
Sep 20 03:36:09 b12 kernel: [] ? kmemdup+0x36/0x50
Sep 20 03:36:09 b12 kernel: [] user_path_at+0x11/0x20
Sep 20 03:36:09 b12 kernel: [] SyS_faccessat+0xb2/0x230
Sep 20 03:36:09 b12 kernel: [] SyS_access+0x18/0x20
Sep 20 03:36:09 b12 kernel: [] system_call_fastpath+0x22/0x27
Sep 20 03:36:09 b12 kernel: INFO: task rbd-target-api:2463993 blocked for more than 120 seconds.
Sep 20 03:36:09 b12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 20 03:36:10 b12 kernel: rbd-target-api D ffff892efdfeb0c0 0 2463993 1 0x00000080
Sep 20 03:36:10 b12 kernel: Call Trace:
Sep 20 03:36:10 b12 kernel: [] ? __wake_up_sync_key+0x4f/0x60
Sep 20 03:36:10 b12 kernel: [] schedule+0x29/0x70
Sep 20 03:36:10 b12 kernel: [] schedule_timeout+0x221/0x2d0
Sep 20 03:36:10 b12 kernel: [] ? netlink_broadcast_filtered+0x14c/0x3e0
Sep 20 03:36:10 b12 kernel: [] wait_for_completion+0xfd/0x140
Sep 20 03:36:10 b12 kernel: [] ? wake_up_state+0x20/0x20
Sep 20 03:36:10 b12 kernel: [] tcmu_netlink_event+0x334/0x4a0 [target_core_user]
Sep 20 03:36:10 b12 kernel: [] ? __cond_resched+0x26/0x30
Sep 20 03:36:10 b12 kernel: [] tcmu_destroy_device+0x5c/0x90 [target_core_user]
Sep 20 03:36:10 b12 kernel: [] target_free_device+0xb4/0x120 [target_core_mod]
Sep 20 03:36:10 b12 kernel: [] target_core_dev_release+0x15/0x20 [target_core_mod]
Sep 20 03:36:10 b12 kernel: [] config_item_release+0x6a/0xf0
Sep 20 03:36:10 b12 kernel: [] config_item_put+0x2c/0x30
Sep 20 03:36:10 b12 kernel: [] configfs_rmdir+0x1eb/0x310
Sep 20 03:36:10 b12 kernel: [] vfs_rmdir+0xdc/0x150
Sep 20 03:36:10 b12 kernel: [] do_rmdir+0x1f1/0x220
Sep 20 03:36:10 b12 kernel: [] ? SYSC_newstat+0x3e/0x60
Sep 20 03:36:10 b12 kernel: [] SyS_rmdir+0x16/0x20
Sep 20 03:36:10 b12 kernel: [] system_call_fastpath+0x22/0x27
```
Because the process is in D state we can't trace the user-mode stack, so we used crash to find out which lock tcmu-runner is waiting for in kernel mode. It is a VFS directory lock; the directory path is /sys/kernel/config/target/core/user_1/[lun_name]/, and tcmu-runner is requesting the action file in that directory.
We think tcmu-runner is in the code below:

```c
int tcmu_acquire_dev_lock(struct tcmu_device *dev, bool is_sync,
			  uint16_t tag)
{
...
	pthread_mutex_lock(&rdev->state_lock);
	if (ret == TCMU_STS_OK)
		rdev->lock_state = TCMUR_DEV_LOCK_LOCKED;
	else
		rdev->lock_state = TCMUR_DEV_LOCK_UNLOCKED;

	tcmu_dev_dbg(dev, "lock call done. lock state %d\n", rdev->lock_state);

	tcmu_unblock_device(dev);

	pthread_cond_signal(&rdev->lock_cond);
	pthread_mutex_unlock(&rdev->state_lock);

	return ret;
}
```
- tcmu-runner holds state_lock and executes tcmu_unblock_device(); inside this function it issues an access() syscall. Call this A.
- Meanwhile, rbd-target-api executes rmdir: it holds the [lun_name] directory lock, sends TCMU_CMD_REMOVED_DEVICE back through netlink, and (when using the v2 protocol) waits for a completion. Call this B.
- The tcmu-runner main loop receives the netlink event and runs the removed-device handler, but it cannot take state_lock, because the lock-acquire thread A holds state_lock while waiting for the VFS directory lock. And the rmdir kernel path B does not release the directory lock until it gets the completion. Call this C.
The interesting thing is why A's access() syscall enters lookup_slow and ends up requesting the directory lock at all, because the kernel caches the action file's dentry after the first lookup. We are sure that in our system the deadlock did not occur on the first access of the action file; access() can usually get the cached dentry through lookup_fast. We think the reason is B's rmdir([lun_name]): before deleting the directory dentry, it drops the cached dentries of the files under that directory via shrink_dcache_parent.
```c
int vfs_rmdir(struct inode *dir, struct dentry *dentry)
{
...
	shrink_dcache_parent(dentry);
	error = dir->i_op->rmdir(dir, dentry);
	if (error)
		goto out;

	dentry->d_inode->i_flags |= S_DEAD;
	dont_mount(dentry);
	detach_mounts(dentry);

out:
	mutex_unlock(&dentry->d_inode->i_mutex);
	dput(dentry);
	if (!error)
		d_delete(dentry);
	return error;
}
```
We guess that in this deadlock scenario, B removes the action file's cached dentry and holds the [lun_name] directory lock, so A cannot find the dentry via lookup_fast. A then requests the directory lock in lookup_slow while holding the state mutex. C waits on that mutex to complete REMOVED_DEVICE and let B release the directory lock.
The newest code removes the access() syscall and only open()s and writes a value to the /sys/kernel/.../action file. But the open() syscall can also enter lookup_slow, so the deadlock scenario may still exist.