Description
It appears that calls to ThreadPool::install(op) will start running op inside the child thread pool and then cooperatively yield, potentially switching to a different task. This causes unexpected behavior with nested iterators using thread pools. It would be nice if rayon didn't yield on ThreadPool::install(), or, if the authors believe it should yield, if some kind of warning were placed in the documentation for ThreadPool::install().
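For context, here is the basic shape of the call in question. This is a minimal sketch, not taken from the report (the thread count is arbitrary); it only shows that the closure passed to install runs on one of the pool's worker threads while the caller waits.
use rayon::ThreadPoolBuilder;

fn main() {
    let pool = ThreadPoolBuilder::new().num_threads(4).build().unwrap();
    // Outside any pool, current_thread_index() is None.
    println!("caller: {:?}", rayon::current_thread_index());
    pool.install(|| {
        // The closure runs on one of `pool`'s worker threads. The caller
        // waits for it to finish; when the caller is itself a rayon worker
        // (the nested case below), it may run other tasks while waiting,
        // which is the behavior at issue here.
        println!("inside install: {:?}", rayon::current_thread_index());
    });
}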
Consider the following example. You have an outer loop that performs some operation that requires a lot of memory. You have an inner loop that uses a lot of CPU but is memory-efficient. To avoid unbounded memory growth, you run the outer and inner loops in separate thread pools, where ncpus_outer is a small number and ncpus_inner is a big number.
let pool_outer = ThreadPoolBuilder::default().num_threads(ncpus_outer).build().unwrap();
let pool_inner = ThreadPoolBuilder::default().num_threads(ncpus_inner).build().unwrap();

pool_outer.install(|| {
    some_par_iter.for_each(|foo| {
        println!("Entered outer loop.");
        let bar = memory_intensive_op(foo);
        pool_inner.install(|| {
            println!("Entered inner pool.");
            thing.into_par_iter().for_each(|baz| {
                cpu_intensive_op(baz);
            });
        });
        println!("Left inner pool.");
        // bar remains allocated until this closure returns.
    });
});
Suppose ncpus_outer is equal to one. You might reasonably expect that only one bar will be allocated at any given time, giving you control over memory use. However, the output of the program may actually look like:
Entered outer loop.
Entered inner pool.
Entered outer loop.
Entered inner pool.
Entered outer loop.
Entered outer loop.
Left inner pool.
...
This is possible because, rather than blocking until pool_inner.install() returns, the outer thread pool will yield and cooperatively schedule another iteration of the outer loop on the same thread. This gives you essentially zero control over how many instances of bar will be allocated at a given time.
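To make that concrete, here is a self-contained sketch of the same shape. It is not from the original report: the names and counts are illustrative, and sleeps stand in for memory_intensive_op and cpu_intensive_op. It counts how many bar-like allocations are live at once; depending on rayon's scheduling, the count can exceed one even though the outer pool has a single thread.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;
use rayon::prelude::*;
use rayon::ThreadPoolBuilder;

fn main() {
    let pool_outer = ThreadPoolBuilder::new().num_threads(1).build().unwrap();
    let pool_inner = ThreadPoolBuilder::new().num_threads(8).build().unwrap();
    let live = AtomicUsize::new(0); // number of "bar" values currently allocated

    pool_outer.install(|| {
        (0..8).into_par_iter().for_each(|_foo| {
            let n = live.fetch_add(1, Ordering::SeqCst) + 1;
            println!("Entered outer loop; live bars: {}", n);
            // a real program would allocate bar = memory_intensive_op(foo) here
            pool_inner.install(|| {
                println!("Entered inner pool.");
                (0..64).into_par_iter().for_each(|_baz| {
                    // stand-in for cpu_intensive_op(baz)
                    std::thread::sleep(Duration::from_millis(10));
                });
            });
            println!("Left inner pool.");
            live.fetch_sub(1, Ordering::SeqCst);
        });
    });
}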
This is an even bigger problem if you have to do some kind of synchronization (yes, there are real use cases for this):
let pool_outer = ThreadPoolBuilder::default().num_threads(ncpus_outer).build().unwrap();
let pool_inner = ThreadPoolBuilder::default().num_threads(ncpus_inner).build().unwrap();
let mutex = std::sync::Mutex::new(());

pool_outer.install(|| {
    some_par_iter.for_each(|foo| {
        println!("Entered outer loop.");
        let bar = memory_intensive_op(foo);
        let _lock = mutex.lock().unwrap();
        pool_inner.install(|| {
            println!("Entered inner pool.");
            thing.into_par_iter().for_each(|baz| {
                cpu_intensive_op(baz);
            });
        });
        println!("Left inner pool.");
        // Mutex lock released here.
    });
});
Now the output may look like:
Entered outer loop.
Entered inner pool.
Entered outer loop.
[ deadlock ]
The outer thread pool yields to a different task before the current one has released the mutex. The new task then blocks on mutex.lock(), and because the task holding the lock is suspended on the same thread's stack, it cannot resume and release it. Unless the scheduler happens to switch back to the task holding the mutex before the outer pool runs out of threads, this deadlocks.
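For completeness, filling in the placeholders gives a runnable version of the second example. This is only a sketch: the placeholder operations are invented (sleeps again stand in for the real work), and whether a given run actually deadlocks depends on rayon's scheduling, though with a single outer thread it typically hangs as described.
use std::sync::Mutex;
use std::time::Duration;
use rayon::prelude::*;
use rayon::ThreadPoolBuilder;

fn main() {
    let pool_outer = ThreadPoolBuilder::new().num_threads(1).build().unwrap();
    let pool_inner = ThreadPoolBuilder::new().num_threads(8).build().unwrap();
    let mutex = Mutex::new(());

    pool_outer.install(|| {
        (0..8).into_par_iter().for_each(|_foo| {
            println!("Entered outer loop.");
            // std::sync::Mutex parks the OS thread; it does not yield to rayon.
            let _lock = mutex.lock().unwrap();
            pool_inner.install(|| {
                println!("Entered inner pool.");
                (0..64).into_par_iter().for_each(|_baz| {
                    std::thread::sleep(Duration::from_millis(10)); // stand-in for cpu_intensive_op
                });
            });
            println!("Left inner pool.");
            // _lock is dropped here. If the worker picked up a second outer
            // iteration while waiting inside pool_inner.install(), that
            // iteration is stuck on mutex.lock() and this point may never be reached.
        });
    });
}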