
Consider mixing Tokio & Rayon #25

@JackKelly


This idea is very early! Right now, I only have a very fuzzy grasp of what I'm trying to achieve, and a fuzzy grasp of how that might be implemented! I'll use this issue to collect links and ideas.

Background and context

At first glance, Tokio should be used for async IO (like building a web server), and Rayon should be used for running CPU-intensive tasks in parallel. The issue is that light-speed-io wants to do both: load data from huge numbers of files using non-blocking IO (io_uring on Linux), and also do lots of CPU-intensive processing on those files (like decompressing them in parallel).

The use-case that motivates me to think again about using Tokio with Rayon

Jacob described a use-case, which I summarised in #24. The basic idea is: say we have millions of small files, and we want to combine these files to create a "re-chunked" version of this dataset on disk. In Jacob's example, there are 146,000 GRIB2 files per NWP init time, and all these files need to be saved into a handful of Zarr chunks. Each GRIB2 file has to be decompressed; then the decompressed files have to be combined and sliced up again; and then those slices have to be compressed and saved back to disk.

The broad shape of a possible solution

  • Change LSIO's API so that users can group IO operations together. For the GRIB2 example, users would say to LSIO: "Group1 consists of the 146,000 GRIB files that must be combined into a handful of Zarr chunks. Group2 consists of the next 146,000 GRIB files."
  • Users can optionally provide a map function to be applied (in parallel) to each GRIB2 file (to decompress it).
  • Users provide a reduce_group function which receives all the GRIB buffers in that group. This outputs a vector of buffers (and paths and byte_ranges).
  • Users can provide another map function that'll be applied in parallel to these buffers (to compress them).
  • The compressed buffers are written to storage. The output storage system might be different to the input storage system. For example, we might be reading GRIB files from a cloud storage bucket, and writing Zarr chunks to a local SSD. (A very rough sketch of what this API might look like is below.)
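
Very roughly, I imagine the user-facing API looking something like this. Every name below is hypothetical; none of it exists in LSIO yet, and the exact shapes (function pointers vs closures, owned buffers vs slices) are just placeholders:

use std::ops::Range;
use std::path::PathBuf;

/// A buffer read from (or destined for) storage, plus where it lives.
pub struct Chunk {
    pub path: PathBuf,
    pub byte_range: Range<u64>,
    pub buffer: Vec<u8>,
}

/// One grouped map/reduce job, mirroring the bullet points above.
pub struct GroupedJob {
    /// Each inner Vec is one group, e.g. the 146,000 GRIB2 files
    /// belonging to a single NWP init time.
    pub groups: Vec<Vec<PathBuf>>,

    /// Optional map applied in parallel to each chunk as soon as it is
    /// read (e.g. decompress a GRIB2 message).
    pub map: Option<Box<dyn Fn(Chunk) -> Chunk + Send + Sync>>,

    /// Receives every (mapped) chunk in a group and returns the output
    /// chunks (e.g. a handful of Zarr chunks, each with its own path
    /// and byte range).
    pub reduce_group: Box<dyn Fn(Vec<Chunk>) -> Vec<Chunk> + Send + Sync>,

    /// Optional map applied in parallel to each output chunk before it
    /// is written (e.g. compress a Zarr chunk).
    pub map_output: Option<Box<dyn Fn(Chunk) -> Chunk + Send + Sync>>,
}

Because each output Chunk carries its own path and byte range, the writer is free to target a different storage system from the one the inputs were read from.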

How Rust async might help

Hopefully users could write code like:

UPDATE: I need to learn more about Rust's Streams (async iterators).

// LSIO will create a thread which owns an io_uring instance, and keeps the SQ topped up.
// LSIO will read group_1 before starting to read group_2.
// `read_groups` will return a Stream of Streams (!).
let groups = reader.read_groups([group_1_filenames, group_2_filenames]);

for group in groups {
    // Concurrently decompress the items in this group.
    let mut handles = Vec::with_capacity(group.len());
    for mut item in group {
        let handle = tokio::spawn(async move {
            item.buffer = decompress_async(item.buffer).await;
            item
        });
        handles.push(handle);
    }

    // Wait for all buffers to be decompressed.
    // (join_all is from the futures crate.)
    let items: Vec<_> = futures::future::join_all(handles)
        .await
        .into_iter()
        .map(|result| result.unwrap())
        .collect();

    // Reduce all items in this group:
    let combined_items = combine_async(items).await;

    for mut item in combined_items {
        tokio::spawn(async move {
            item.buffer = compress_async(item.buffer).await;
            // TODO: Use a Sink to write data (see the sketch below):
            writer.submit_write(item).await;
        });
    }
}
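
On the "use a Sink to write data" TODO above: here's a minimal, self-contained sketch of one possible shape, using a bounded futures channel as the Sink. This is purely illustrative; the real writer would submit io_uring writes rather than print.

use futures::channel::mpsc;
use futures::{SinkExt, StreamExt};

#[tokio::main]
async fn main() {
    // The Sender half of a bounded channel implements Sink; the Receiver
    // half implements Stream. A dedicated writer task drains the Stream
    // and performs the actual writes (io_uring in real LSIO; println here).
    let (mut tx, mut rx) = mpsc::channel::<Vec<u8>>(64);

    let writer_task = tokio::spawn(async move {
        while let Some(buffer) = rx.next().await {
            println!("would write {} bytes", buffer.len());
        }
    });

    // Elsewhere, compressed buffers are pushed into the Sink.
    // `send` applies backpressure when the channel is full.
    for i in 0..4u8 {
        tx.send(vec![i; 8]).await.unwrap();
    }
    drop(tx); // Close the Sink so the writer task finishes.
    writer_task.await.unwrap();
}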

// Adapted from https://ryhl.io/blog/async-what-is-blocking/#the-rayon-crate
async fn decompress_async(buffer: Vec<u8>) -> Vec<u8> {
    let (send, recv) = tokio::sync::oneshot::channel();

    // Spawn a task on rayon.
    rayon::spawn(move || {
        // Perform an expensive computation.
        let decompressed_buffer = decompress(buffer);
        // Send the result back to Tokio.
        let _ = send.send(decompressed_buffer);
    });

    // Wait for the rayon task.
    recv.await.expect("Panic in rayon::spawn")
}
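
And to make the "Stream of Streams" idea a bit more concrete, here's a minimal, self-contained sketch. The read_groups below is a stand-in (not a real LSIO function), and decompress_async is reduced to a no-op so the sketch compiles on its own. It consumes the outer Stream strictly in order, while decompressing the items within each group concurrently:

use futures::stream::{self, Stream, StreamExt};

// Stand-in for the decompress_async shown above; a no-op here.
async fn decompress_async(buffer: Vec<u8>) -> Vec<u8> {
    buffer
}

// Stand-in for LSIO's future `read_groups`: an outer Stream that yields
// one inner Stream of buffers per group. Purely illustrative.
fn read_groups(groups: Vec<Vec<Vec<u8>>>) -> impl Stream<Item = impl Stream<Item = Vec<u8>>> {
    stream::iter(groups.into_iter().map(|group| stream::iter(group)))
}

#[tokio::main]
async fn main() {
    let groups = read_groups(vec![
        vec![vec![1, 2, 3], vec![4, 5, 6]], // group 1
        vec![vec![7, 8, 9]],                // group 2
    ]);
    futures::pin_mut!(groups);

    // Groups are processed strictly in order (group 1 before group 2)...
    while let Some(group) = groups.next().await {
        // ...but items *within* a group are decompressed concurrently,
        // up to 16 at a time.
        let decompressed: Vec<Vec<u8>> = group
            .map(decompress_async)
            .buffered(16)
            .collect()
            .await;
        println!("group produced {} decompressed buffers", decompressed.len());
    }
}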

Links

Tutorials / blog posts

  • Alice Ryhl's December 2020 blog post "Async: What is blocking?". Gives a great example of how to use Rayon with Tokio. I shouldn't use Tokio's spawn_blocking. spawn_blocking is best suited for wrapping blocking IO, not for CPU-intensive tasks. Instead, use rayon::spawn with tokio::sync::oneshot (see Alice's blog for more details).

Rust crates & PRs

  • Rayon draft PR: [WIP] Add the ability to spawn futures. Very relevant discussion. But the discussion largely halted in 2020 (with the PR still in "draft").
  • Rayon issue: "Using [Rayon's] ThreadPool for blocking I/O?". Conclusion: Don't wait on blocking IO in a Rayon task, because that blocks that thread from participating in other Rayon tasks.
  • tokio-rayon: "Mix async code with CPU-heavy thread pools using Tokio + Rayon". Last release was in 2021.
  • The futures crate. Contains Streams (async iterators, which LSIO could use as the source of data) and Sinks (for writing data).
  • async_stream: "Provides two macros, stream! and try_stream!, allowing the caller to define asynchronous streams of elements. These are implemented using async & await notation." Allows us to define streams using a very similar approach to Python, using yield. Also allows us to write for await value in input to implement one stream from another stream (see the sketch below).
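
A minimal sketch of that pattern (doubling numbers stands in for real work like decompression; the double function is just for illustration):

use async_stream::stream;
use futures::pin_mut;
use futures::stream::{Stream, StreamExt};

/// Build one stream from another: yield a transformed value for every
/// value the input stream produces.
fn double(input: impl Stream<Item = u64>) -> impl Stream<Item = u64> {
    stream! {
        for await value in input {
            yield value * 2;
        }
    }
}

#[tokio::main]
async fn main() {
    let doubled = double(futures::stream::iter(vec![1, 2, 3]));
    pin_mut!(doubled); // The stream! output isn't Unpin, so pin it first.
    while let Some(value) = doubled.next().await {
        println!("{value}");
    }
}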
