workqueue: fix flush_workqueue() vs CPU_DEAD race
authorOleg Nesterov <oleg@tv-sign.ru>
Wed, 9 May 2007 09:34:07 +0000 (02:34 -0700)
committerLinus Torvalds <torvalds@woody.linux-foundation.org>
Wed, 9 May 2007 19:30:52 +0000 (12:30 -0700)
commitd721304dce0ced0b3b0366996cc02929669708a8
treea7c41390d082e4d2c70b185e508368293cf68b92
parent319c2a986eb45989690c955d9667b814ef0ed56f
workqueue: fix flush_workqueue() vs CPU_DEAD race

Many thanks to Srivatsa Vaddagiri for the helpful discussion and for spotting
the bug in my previous attempt.

work->func() (and thus flush_workqueue()) must not use workqueue_mutex,
this leads to deadlock when CPU_DEAD does kthread_stop(). However without
this mutex held we can't detect CPU_DEAD in progress, which can move pending
works to another CPU while the dead one is not on cpu_online_map.

Change flush_workqueue() to use for_each_possible_cpu(). This means that
flush_cpu_workqueue() may hit CPU which is already dead. However in that
case

!list_empty(&cwq->worklist) || cwq->current_work != NULL

means that CPU_DEAD in progress, it will do kthread_stop() + take_over_work()
so we can proceed and insert a barrier. We hold cwq->lock, so we are safe.

Also, add migrate_sequence incremented by take_over_work() under cwq->lock.
If take_over_work() happened before we checked this CPU, we should see the
new value after spin_unlock().

Further possible changes:

remove CPU_DEAD handling (along with take_over_work, migrate_sequence)
from workqueue.c. CPU_DEAD just sets cwq->please_exit_after_flush flag.

CPU_UP_PREPARE->create_workqueue_thread() clears this flag, and creates
the new thread if cwq->thread == NULL.

This way the workqueue/cpu-hotplug interaction is almost zero, workqueue_mutex
just protects "workqueues" list, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE go away.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Gautham shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel/workqueue.c