
cooperative yield in ThreadPool::install() causes unexpected behavior in nested pools #1105

@benkay86

Description


It appears that calls to ThreadPool::install(op) start running op inside the child thread pool and then cooperatively yield, potentially switching to a different task. This causes unexpected behavior with nested iterators that use separate thread pools. It would be nice if rayon didn't yield on ThreadPool::install(), or, if the authors believe it should yield, if some kind of warning were added to the documentation for ThreadPool::install().

Consider the following example. You have an outer loop that performs some operation requiring a lot of memory, and an inner loop that is CPU-intensive but memory-efficient. To avoid unbounded memory growth, you run the outer and inner loops in separate thread pools, where ncpus_outer is a small number and ncpus_inner is a big number.

let pool_outer = ThreadPoolBuilder::default().num_threads(ncpus_outer).build().unwrap();
let pool_inner = ThreadPoolBuilder::default().num_threads(ncpus_inner).build().unwrap();
pool_outer.install(|| {
    some_par_iter.for_each(|foo| {
        println!("Entered outer loop.");
        let bar = memory_intensive_op(foo);
        pool_inner.install(|| {
            println!("Entered inner pool.");
            thing.into_par_iter().for_each(|baz| {
                cpu_intensive_op(baz);
            });
        });
        println!("Left inner pool.");
    });
});

Suppose ncpus_outer is equal to one. You might reasonably expect that only one bar will be allocated at any given time, giving you control over memory use. However, the output of the program may actually look like:

Entered outer loop.
Entered inner pool.
Entered outer loop.
Entered inner pool.
Entered outer loop.
Entered outer loop.
Left inner pool.
...

This is possible because, rather than blocking until pool_inner.install() returns, the outer thread pool will yield and cooperatively schedule another iteration of the outer loop on the same thread. This gives you essentially zero control over how many instances of bar will be allocated at a given time.
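For reference, here is a self-contained version of the sketch above that can be compiled to observe the interleaving. The pool sizes, the large allocation standing in for memory_intensive_op, and the busy loop standing in for cpu_intensive_op are illustrative stand-ins, not part of the report; with a single outer thread you may still see several "Entered outer loop." lines printed before the first "Left inner pool." line.

// Minimal, compilable reproduction (rayon as a dependency). The allocation
// size, iteration counts, and pool sizes below are arbitrary stand-ins.
use rayon::prelude::*;
use rayon::ThreadPoolBuilder;

fn main() {
    let pool_outer = ThreadPoolBuilder::new().num_threads(1).build().unwrap();
    let pool_inner = ThreadPoolBuilder::new().num_threads(8).build().unwrap();

    pool_outer.install(|| {
        (0..4).into_par_iter().for_each(|i| {
            println!("Entered outer loop ({i}).");
            // Hypothetical stand-in for memory_intensive_op: a large allocation.
            let bar = vec![0u8; 100 * 1024 * 1024];
            pool_inner.install(|| {
                println!("Entered inner pool ({i}).");
                (0..64).into_par_iter().for_each(|_| {
                    // Hypothetical stand-in for cpu_intensive_op: burn some CPU.
                    let mut x: u64 = 0;
                    for n in 0..10_000_000u64 {
                        x = x.wrapping_add(n);
                    }
                    std::hint::black_box(x);
                });
            });
            println!("Left inner pool ({i}); bar still alive: {} bytes", bar.len());
        });
    });
}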

This is an even bigger problem if you have to do some kind of synchronization (yes, there are real use cases for this):

let pool_outer = ThreadPoolBuilder::default().num_threads(ncpus_outer).build().unwrap();
let pool_inner = ThreadPoolBuilder::default().num_threads(ncpus_inner).build().unwrap();
let mutex = std::sync::Mutex::new(());
pool_outer.install(|| {
    some_par_iter.for_each(|foo| {
        println!("Entered outer loop.");
        let bar = memory_intensive_op(foo);
        let _lock = mutex.lock().unwrap();
        pool_inner.install(|| {
            println!("Entered inner pool.");
            thing.into_par_iter().for_each(|baz| {
                cpu_intensive_op(baz);
            });
        });
        println!("Left inner pool.");
        // Mutex guard released here when it goes out of scope.
    });
});

The output may look like:

Entered outer loop.
Entered inner pool.
Entered outer loop.
[ deadlock ]

The outer thread pool yields to another task before the mutex is released. The new task then blocks trying to acquire the same mutex, which leads to a deadlock unless the scheduler happens to switch back to the task holding the lock before the outer pool runs out of threads.
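As a hedged sketch (not an official rayon recommendation), one possible way to avoid the cooperative yield, assuming the inner closure can be made 'static, is to submit the inner work with ThreadPool::spawn and block on a std channel. The helper name run_blocking and its signature are hypothetical, not part of rayon's API; the idea is simply that recv() parks the OS thread, so the outer pool cannot schedule another outer iteration onto it while the lock is held.

use rayon::ThreadPool;
use std::sync::mpsc;

// Hypothetical helper (not part of rayon): run `op` on `pool` and block the
// calling thread until it finishes, without yielding to the caller's pool.
fn run_blocking<T, F>(pool: &ThreadPool, op: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    pool.spawn(move || {
        // The send only fails if the receiver was dropped, which cannot
        // happen here because recv() below keeps it alive.
        let _ = tx.send(op());
    });
    // recv() parks this OS thread; no cooperative yield back into the
    // caller's thread pool takes place.
    rx.recv().expect("spawned task dropped the sender without sending")
}

In the examples above, one could then write run_blocking(&pool_inner, move || { ... }) in place of pool_inner.install(|| { ... }), at the cost of moving (rather than borrowing) whatever the inner closure needs.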
