
@wfaderhold21 (Collaborator) commented Mar 17, 2025

What

Switch from using a pSync array with atomic increments to the TL/UCP barrier for synchronization.

Why?

There are multiple reasons for this switch: the knomial barrier scales better and outperforms the atomic-increment scheme (see the numbers below), and, once PR #1070 is merged, it allows this algorithm to be used with memory handles.
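For context, here is a minimal sketch of the old pSync-style scheme at the UCP level (function and variable names are illustrative assumptions, not this codebase's): each rank posts one network atomic add per peer and then polls its own counter, so synchronization cost grows linearly with team size, whereas the knomial barrier completes in O(log n) rounds.

```c
#include <ucp/api/ucp.h>

static void req_free_cb(void *request, ucs_status_t status, void *user_data)
{
    ucp_request_free(request);
}

/* Old scheme, sketched: atomically bump every peer's pSync counter, then
 * spin until our own counter shows team_size signals -- O(n) network
 * atomics per rank per alltoall.  The PR replaces this kind of routine
 * with the TL/UCP knomial barrier (O(log n) rounds).  Error handling is
 * omitted for brevity. */
static void psync_signal_and_wait(ucp_worker_h worker, ucp_ep_h *eps,
                                  uint64_t *psync_raddrs, ucp_rkey_h *rkeys,
                                  volatile uint64_t *my_psync, int team_size)
{
    static const uint64_t one = 1;

    for (int peer = 0; peer < team_size; peer++) {   /* includes self-ep */
        ucp_request_param_t param = {
            .op_attr_mask = UCP_OP_ATTR_FIELD_CALLBACK |
                            UCP_OP_ATTR_FIELD_DATATYPE,
            .cb.send      = req_free_cb,
            .datatype     = ucp_dt_make_contig(sizeof(uint64_t))
        };
        ucs_status_ptr_t req =
            ucp_atomic_op_nbx(eps[peer], UCP_ATOMIC_OP_ADD, &one, 1,
                              psync_raddrs[peer], rkeys[peer], &param);
        (void)req;   /* NULL: completed inline; pointer: freed by callback */
    }

    while (*my_psync < (uint64_t)team_size) {
        ucp_worker_progress(worker);   /* poll until every peer signaled */
    }
}
```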

Node Bandwidth

Tested on Thor with 32 nodes, 1 PPN:

| Size (bytes) | This PR | Current Algorithm |
| ---: | ---: | ---: |
| 8 | 15.62 | 9.3 |
| 16 | 30.99 | 18.27 |
| 32 | 61.81 | 35.65 |
| 64 | 122.12 | 72.89 |
| 128 | 235.66 | 145.33 |
| 256 | 478.43 | 279.93 |
| 512 | 918.63 | 549.35 |
| 1024 | 1663.85 | 1022.45 |
| 2048 | 2919.6 | 1852.31 |
| 4096 | 4571.7 | 3151.6 |
| 8192 | 6527.59 | 4753.05 |
| 16384 | 8583.94 | 7749.33 |
| 32768 | 10381.33 | 9651.23 |
| 65536 | 11574.64 | 10882.64 |
| 131072 | 12039.43 | 11585.79 |
| 262144 | 12456.95 | 12065.97 |
| 524288 | 12151.77 | 12350.47 |
| 1048576 | 12785.3 | 12427.42 |

Tested on Thor with 32 nodes, 32 PPN:

| Size (bytes) | This PR | Current Algorithm |
| ---: | ---: | ---: |
| 8 | 93.92 | 37.46 |
| 16 | 188.45 | 72.11 |
| 32 | 380.5 | 110.44 |
| 64 | 754.7 | 301.35 |
| 128 | 1500.11 | 460.3 |
| 256 | 2190.99 | 917.62 |
| 512 | 4178.51 | 1982.53 |
| 1024 | 7749.3 | 2867.44 |
| 2048 | 9093.23 | 4287.41 |
| 4096 | 9529.68 | 7078.85 |
| 8192 | 9858.2 | 8398.87 |
| 16384 | 9615.04 | 9826.92 |
| 32768 | 10004.92 | 10975.51 |
| 65536 | 10021.39 | 11901.37 |
| 131072 | 11287.29 | 11982.72 |
| 262144 | 11635.63 | 11803.15 |
| 524288 | 11623.31 | 11894.45 |
| 1048576 | 11621.41 | 11991.38 |

@swx-jenkins3

Can one of the admins verify this patch?

@janjust (Collaborator) commented Mar 26, 2025

@wfaderhold21 didn't we say we were also going to change the test to reflect oshmem behavior?

Edit: never mind, I just realized it's the other PR.

@janjust (Collaborator) commented Apr 9, 2025

@wfaderhold21

> (2) there can be instances where processes leave the alltoall collective before remote writes have been completed.

We had a discussion during our code review, and if I recall correctly we concluded this is not the case in a two-sided model, correct? We still need the user to issue a flush on the symmetric heap. Please correct me if I misunderstood.

@wfaderhold21 (Collaborator, Author)

> @wfaderhold21 (2) there can be instances where processes leave the alltoall collective before remote writes have been completed. We had a discussion during our code review, and if I recall we concluded this is not the case in a 2-sided model, correct? We still need the user to issue a flush on the symmetric heap. Please correct me if I misunderstood.

@janjust This is correct. To ensure completion of writes to the remote processes, we need to issue a flush; a sketch of what that looks like at the UCP level follows.
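For illustration, here is a minimal sketch of such a flush, using a blocking wait for simplicity (the helper name and surrounding progress loop are assumptions, not the PR's actual code):

```c
#include <ucp/api/ucp.h>

/* Block until all outstanding operations on `ep` (e.g., the alltoall's
 * PUTs) are remotely complete.  Sketch only; names are illustrative. */
static ucs_status_t flush_ep(ucp_worker_h worker, ucp_ep_h ep)
{
    ucp_request_param_t param = { .op_attr_mask = 0 };
    void               *req   = ucp_ep_flush_nbx(ep, &param);

    if (req == NULL) {
        return UCS_OK;                    /* completed immediately */
    }
    if (UCS_PTR_IS_ERR(req)) {
        return UCS_PTR_STATUS(req);
    }

    ucs_status_t status;
    do {
        ucp_worker_progress(worker);      /* drive communication forward */
        status = ucp_request_check_status(req);
    } while (status == UCS_INPROGRESS);

    ucp_request_free(req);
    return status;
}
```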

@nsarka (Collaborator) commented Apr 10, 2025

> In order to ensure completion of writes to the remote processes, we need to issue a flush.

Does the flush become a no-op (or just unnecessary) if RC is used? I'm just wondering how the transport changes this requirement (if at all).

@wfaderhold21 (Collaborator, Author)

> > In order to ensure completion of writes to the remote processes, we need to issue a flush.
>
> Does the flush become a no-op (or just unnecessary) if RC is used? I'm just wondering how the transport changes this requirement (if at all)

I believe ordering should be maintained when using RC, so a flush is not strictly required in that sense: subsequent PUTs, sends, and AMOs will complete after the PUT. However, UCP returns success on a PUT as soon as the source buffer is ready for reuse; there is no guarantee that the PUT has completed at the remote target (e.g., with a buffered copy). See the sketch below.
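A hedged sketch of those semantics (names are illustrative, not from the PR): the PUT's completion callback only signals that the source buffer is reusable; remote visibility still requires a flush like the one sketched in the earlier comment.

```c
#include <ucp/api/ucp.h>

/* ucp_put_nbx completes "locally": a NULL return or a fired callback means
 * the source buffer may be reused, NOT that the data has arrived at the
 * target -- regardless of the transport's (e.g. RC) ordering guarantees. */
static void put_local_done(void *request, ucs_status_t status, void *user_data)
{
    /* Safe to reuse the source buffer here; the remote window may still be
     * stale (e.g., the transport buffered a copy). */
    ucp_request_free(request);
}

static void post_put(ucp_ep_h ep, const void *src, size_t len,
                     uint64_t remote_addr, ucp_rkey_h rkey)
{
    ucp_request_param_t param = {
        .op_attr_mask = UCP_OP_ATTR_FIELD_CALLBACK,
        .cb.send      = put_local_done
    };
    ucs_status_ptr_t req = ucp_put_nbx(ep, src, len, remote_addr, rkey,
                                       &param);
    if (UCS_PTR_IS_ERR(req)) {
        /* handle error */
    }
    /* NULL means immediate local completion; a flush (see the helper
     * sketched above) is still needed before peers may read remote_addr. */
}
```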

@wfaderhold21 (Collaborator, Author)

@janjust @Sergei-Lebedev @nsarka I believe this is finally in a reviewable state.

@wfaderhold21 wfaderhold21 changed the title TL/UCP: transition to barrier for sync for onesided a2a TL/UCP: congestion aovidant onesided a2a Sep 10, 2025
@wfaderhold21 wfaderhold21 changed the title TL/UCP: congestion aovidant onesided a2a TL/UCP: congestion avoidant onesided a2a Sep 10, 2025
@manjugv (Contributor) commented Sep 11, 2025

@Sergei-Lebedev Any comments on this? Let's push this out.

@wfaderhold21 (Collaborator, Author)

@janjust @Sergei-Lebedev @samnordmann @ikryukov I believe I have addressed the feedback in this PR. Please let me know if you have any other changes you wish to see.

@Sergei-Lebedev (Contributor)

ok to test

@dpressle (Collaborator) commented Oct 8, 2025

bot:retest

@Sergei-Lebedev (Contributor)

@wfaderhold21 seems like a relevant issue in CI:

```
14:59:58  ===== UCC MPI TEST INFO =======
14:59:58  seed:         24653
14:59:58  collectives:  Barrier, Bcast, Reduce, Allreduce, Allgather, Allgatherv, Alltoall, Alltoallv, Reduce_scatter, Reduce_scatterv, Gather, Gatherv, Scatter, Scatterv
14:59:58  data types:   int16, int32, int64, uint16, uint32, uint64, float32, float64, float64_complex
14:59:58  memory types: Host, Cuda
14:59:58  teams:        world, reverse, half, odd_even
15:00:44  FAILURE in: tc=Onesided Alltoall team=half mtype=host msgsize=512 persistent=0 local_registration=0 dt=int16
15:00:44  
15:00:44  ===== UCC MPI TEST REPORT =====
15:00:44  collective                 tests    passed    failed   skipped
15:00:44  Allgather                   1008       936         0        72
15:00:44  Allgatherv                  1008       936         0        72
15:00:44  Allreduce                   2240      2240         0         0
15:00:44  Alltoall                     756       702         1        54
15:00:44  Alltoallv                   3024      2376         0       648
15:00:44  Barrier                        4         4         0         0
15:00:44  Bcast                       1008      1008         0         0
15:00:44  Gather                      2016      1872         0       144
15:00:44  Gatherv                     2016      1872         0       144
15:00:44  Reduce                      4480      4480         0         0
15:00:44  Reduce_scatter              2240      1656         0       584
15:00:44  Reduce_scatterv             2240      1904         0       336
15:00:44  Scatter                     2016        32         0      1984
15:00:44  Scatterv                    2016      1872         0       144
15:00:44  
15:00:44  ===== UCC MPI TEST SUMMARY =====
15:00:44  total tests:  26072
15:00:44  passed:       21890
15:00:44  skipped:      4182
15:00:44  failed:       1
15:00:44  elapsed:      48s
```

@wfaderhold21 (Collaborator, Author)

@Sergei-Lebedev I believe the changes I made should fix this error and ensure correctness for the PUT-based algorithm; the fix updates the endpoint flush to use a completion callback (commit below).
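For reference, a hedged sketch of what a callback-driven endpoint flush can look like, in the spirit of the commit title below (the completion-flag bookkeeping is an assumption; the actual PR code may differ):

```c
#include <ucp/api/ucp.h>

/* Sketch: non-blocking ep flush whose completion is observed via callback,
 * so a collective task can keep making progress instead of spin-waiting. */
static void flush_done(void *request, ucs_status_t status, void *user_data)
{
    int *completed = (int *)user_data;   /* task-side completion flag */
    *completed = 1;
    ucp_request_free(request);
}

static ucs_status_t start_ep_flush(ucp_ep_h ep, int *completed)
{
    ucp_request_param_t param = {
        .op_attr_mask = UCP_OP_ATTR_FIELD_CALLBACK |
                        UCP_OP_ATTR_FIELD_USER_DATA,
        .cb.send      = flush_done,
        .user_data    = completed
    };
    ucs_status_ptr_t req = ucp_ep_flush_nbx(ep, &param);

    if (req == NULL) {
        *completed = 1;                  /* finished inline, no callback */
        return UCS_OK;
    }
    return UCS_PTR_IS_ERR(req) ? UCS_PTR_STATUS(req) : UCS_OK;
}
```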

TL/UCP: update ep flush with cb
@manjugv manjugv merged commit b3ea442 into openucx:master Oct 10, 2025
8 of 9 checks passed
