TL/UCP: congestion avoidant onesided a2a #1096
|
Can one of the admins verify this patch? |
34fb46f to
96449db
Compare
96449db to
eaa8091
Compare
|
@wfaderhold21 didn't we say we were also going to change the test to reflect oshmem behavior? |
|
@wfaderhold21 |
@janjust This is correct. In order to ensure completion of writes to the remote processes, we need to issue a flush. |
Does the flush become a no-op (or just unnecessary) if RC is used? I'm just wondering how the transport changes this requirement (if at all) |
I believe ordering is maintained when using RC, so a flush is not strictly required for ordering: later PUTs, sends, and AMOs should complete after the PUT. However, UCP returns success on a PUT as soon as the source buffer is ready for reuse, so there is no guarantee that the PUT has completed at the remote target (e.g., with a buffered copy). |
c347fe0 to
68dd03e
Compare
|
@janjust @Sergei-Lebedev @nsarka I believe this is finally in a reviewable state |
db05851 to
b6ca39b
Compare
|
@Sergei-Lebedev Any comments on this? Let's push this out. |
611c406 to
9f37f24
Compare
|
@janjust @Sergei-Lebedev @samnordmann @ikryukov I believe I have addressed the feedback in this PR. Please let me know if you have any other changes you wish to see. |
|
ok to test |
|
bot:retest |
|
@wfaderhold21 |
|
@Sergei-Lebedev I believe the changes I made should fix this error and ensure correctness for the PUT-based algorithm |
REVIEW: performance fixes REVIEW: cleanup
ef26c88 to
b6c2f51
Compare
b6c2f51 to
2e88c46
Compare
TL/UCP: update ep flush with cb
What
Switch from using a pSync array with atomic increments to the TL/UCP barrier for synchronization.
Why?
There are multiple reasons for this switch: the knomial barrier scales better and performs better than atomic increments (see below), and once PR #1070 is merged it allows this algorithm to be used with memory handles.
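As a rough illustration of the scaling argument, here is a minimal sketch that counts synchronization operations per rank under two simplified models. It is not the TL/UCP implementation. It assumes the pSync scheme has each rank issue one atomic increment to every other rank, and that a radix-k knomial barrier needs ceil(log_k(n)) rounds with at most (k - 1) messages per rank per round. All function names are hypothetical.

```c
/* Rounds for a radix-k knomial exchange over n ranks: ceil(log_k(n)).
 * Simplified model, not the actual TL/UCP barrier code. */
static int knomial_rounds(int n, int radix)
{
    int  rounds = 0;
    long span   = 1;

    if (radix < 2) {
        radix = 2; /* guard: a radix below 2 would never cover n ranks */
    }
    while (span < n) {
        span *= radix;
        rounds++;
    }
    return rounds;
}

/* Upper bound on messages a single rank sends in the modeled barrier:
 * (radix - 1) peers per round. */
static int knomial_msgs_per_rank(int n, int radix)
{
    if (radix < 2) {
        radix = 2;
    }
    return knomial_rounds(n, radix) * (radix - 1);
}

/* Assumed pSync model: one atomic increment to every other rank. */
static int psync_atomics_per_rank(int n)
{
    return n > 0 ? n - 1 : 0;
}
```

Under these assumptions, 1024 ranks (32 nodes at 32 PPN) give `psync_atomics_per_rank(1024) == 1023`, while `knomial_msgs_per_rank(1024, 4) == 15` over 5 rounds. The radix of 4 is chosen only for illustration.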
Node Bandwidth
Tested on Thor with 32 nodes, 1 PPN
Tested on Thor with 32 nodes, 32 PPN