
Conversation

@mjwilkins18 (Contributor)

This algorithm achieves better performance than existing bcast algorithms for both small and large message sizes.

The algorithm is based on the circulant graph abstraction and Jesper Larsson Traff's recent paper: https://dl.acm.org/doi/full/10.1145/3735139. It creates communication schedules around various rings in the circulant graph, then repeats the schedule to pipeline message chunks. We introduce a FIFO queue for overlapping sends and receives across communication rounds, which particularly benefits small messages.
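To illustrate the queue idea, here is a minimal sketch (not the PR's implementation; req_fifo_t, fifo_push, and fifo_drain are hypothetical names): nonblocking requests from successive rounds are pushed into a bounded FIFO, and the oldest request is only completed once the queue is full, so up to QLEN operations stay in flight across rounds.

#include <mpi.h>

#define QLEN 24  /* queue length; 24 matches the setting in the results below */

typedef struct {
    MPI_Request reqs[QLEN];
    int head;   /* index of the oldest outstanding request */
    int count;  /* number of outstanding requests */
} req_fifo_t;

/* Enqueue a request; if the queue is full, retire the oldest one first
 * so completions happen in FIFO order. */
static void fifo_push(req_fifo_t *q, MPI_Request req)
{
    if (q->count == QLEN) {
        MPI_Wait(&q->reqs[q->head], MPI_STATUS_IGNORE);
        q->head = (q->head + 1) % QLEN;
        q->count--;
    }
    q->reqs[(q->head + q->count) % QLEN] = req;
    q->count++;
}

/* Complete all remaining requests in FIFO order. */
static void fifo_drain(req_fifo_t *q)
{
    while (q->count > 0) {
        MPI_Wait(&q->reqs[q->head], MPI_STATUS_IGNORE);
        q->head = (q->head + 1) % QLEN;
        q->count--;
    }
}

In each pipelined round one would post the round's MPI_Irecv and MPI_Isend, push both requests, and call fifo_drain after the last round so every transfer completes.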

In the graph below, we show the algorithm's performance for a fixed chunk size (256k) and queue length (24) at various scales on ANL Aurora (N nodes, PPN processes per node). The baseline for this graph is the best-performing algorithm currently in MPICH, so all speedups represent improvements over every algorithm currently in the library. We note that the performance drops around our selected chunk size (256k). By tuning the chunk size near this message size, it is possible to achieve a speedup across all message sizes at all scales.
[Figure: speedup over the best existing MPICH bcast algorithm vs. message size, across (N, PPN) scales on Aurora]
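One hypothetical way to express that tuning (the heuristic and choose_chunk_size are my assumptions, not the PR's logic): shrink the chunk when the total message is near the fixed chunk size, so the pipeline still has several chunks in flight at the point where the dip appears.

#include <stddef.h>

/* Hypothetical tuning heuristic, not taken from the PR: keep at least
 * `depth` chunks in the pipeline when the message is close to the
 * default chunk size (e.g. 256k), where the dip in the graph appears. */
static size_t choose_chunk_size(size_t msg_bytes, size_t default_chunk, size_t depth)
{
    if (msg_bytes <= default_chunk * depth)
        return (msg_bytes + depth - 1) / depth;  /* ceil(msg_bytes / depth) */
    return default_chunk;
}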

Pull Request Description

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.

Comment on lines 25 to 29
static int all_blocks(int r, int r_, int s, int e, int k, int* buffer, struct sched_args_t* args);
static void gen_rsched(int r, int* buffer, struct sched_args_t* args);
static void gen_ssched(int r, struct sched_args_t* args);

static int get_baseblock(int r, struct sched_args_t* args);
Contributor

Please use more descriptive variable names to make this easier to understand. Aside from iterator variables (i, j, k), IMO we should avoid single-letter names.

Contributor Author

I agree in general. In this case, the single-letter variables align with the notation used in the paper. The algorithm is complex, meaning readers will likely need to reference the paper to fully understand it, and changing the variable names may make that more difficult. Shall I change them anyway?

Contributor

I think short parameter names in local static functions can be relaxed if:

  • the names are consistent throughout the file. For example, if r here consistently refers to rank and only ever refers to rank.
  • To help, one can add a comment block listing the common local variable names and their purposes.

Contributor

I could live with a comment explaining the names. r_ in combination with r particularly offends me but I'll wait for the explainer before getting too riled up 😅.

Contributor Author

I added variable glossary comments to each of the helper functions and changed r_ to r_prime to avoid riling up @raffenet 😄
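Such a glossary block might look roughly like the sketch below (the definitions here are illustrative guesses on my part, not the PR's actual comments; only r referring to rank is confirmed in this thread, and the real meanings follow the paper's notation):

/* Variable glossary (illustrative example only):
 *   r       - rank of the calling process in the circulant graph
 *   r_prime - peer rank for the current round (previously r_)
 *   s, e    - start and end of the block range handled this round
 *   k       - block (chunk) index within the schedule
 */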
