ch4/wait: Optimize waitall when all requests are local #7620
2f33a6b to f7218b9
test:mpich/ch4/most
f7218b9 to c825241
test:mpich/ch4/most
src/mpid/ch4/src/ch4_wait.h (outdated)
    state->vci_count = idx;

    if (local_only) {
        state->flag &= ~MPIDI_PROGRESS_NM;
We should still perform netmod progress during "global" progress.
mpich/src/mpid/ch4/src/ch4_progress.h, line 78 in f47908f:
    if (state->flag & MPIDI_PROGRESS_NM && (is_global || !made_progress)) { \
Make it:
    if (is_global || (state->flag & MPIDI_PROGRESS_NM && !made_progress)) { \
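To make the difference between the two conditions concrete, here is a minimal standalone sketch. The flag values and the function names are hypothetical stand-ins, not the definitions in ch4_progress.h; only the boolean structure is taken from the two lines quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the real values live in the ch4 headers. */
#define SKETCH_PROGRESS_NM  (1 << 0)
#define SKETCH_PROGRESS_SHM (1 << 1)

/* Condition as quoted from line 78: netmod is polled only if the NM flag is set. */
static bool nm_progress_old(int flag, bool is_global, bool made_progress)
{
    return (flag & SKETCH_PROGRESS_NM) && (is_global || !made_progress);
}

/* Suggested condition: global progress polls the netmod regardless of the flag. */
static bool nm_progress_new(int flag, bool is_global, bool made_progress)
{
    return is_global || ((flag & SKETCH_PROGRESS_NM) && !made_progress);
}

/* The two forms disagree only when progress is global and the NM flag is cleared. */
static void check_global_always_polls_netmod(void)
{
    int local_only_flag = SKETCH_PROGRESS_SHM;  /* NM bit cleared by the wait path */
    assert(!nm_progress_old(local_only_flag, true, false));
    assert(nm_progress_new(local_only_flag, true, false));
}
```

With the old form, clearing the NM bit for a local-only wait would also suppress netmod polling during global progress. With the suggested form, the bit only affects non-global progress.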
c825241 to 3a8cf7e
test:mpich/ch4/most
95989b8 to c405c76
test:mpich/ch4/most
c405c76 to 62ce4f7
test:mpich/ch4/most
62ce4f7 to af516b9
test:mpich/ch4/most
af516b9 to 5f1827a
test:mpich/ch4/most
5f1827a to 98c5984
test:mpich/ch4/most
98c5984 to 2c95be5
test:mpich/ch4/most
The ch4/ofi vci build crashed when the node it was running on went offline. This is ready for review.
src/mpid/ch4/src/mpidig_rma.h (outdated)
    int vci_target = MPIDI_WIN_TARGET_VCI(win, target_rank);
    sreq = MPIDIG_request_create(MPIR_REQUEST_KIND__RMA, 2, vci, vci_target);
    MPIR_ERR_CHKANDSTMT(sreq == NULL, mpi_errno, MPIX_ERR_NOREQ, goto fn_fail, "**nomemreq");
    MPIDI_REQUEST_SET_LOCAL(sreq, MPIDI_rank_is_local(target_rank, win->comm_ptr), NULL);
Two separate comments:
- We should add is_local to MPIDIG_request_create to ensure requests created in the ch4 layer always have locality set.
- What about requests created in the MPIR layer, or requests that straddle shm/netmod, such as collectives or anysource receives? We may need to expand MPIDI_REQUEST(rreq, is_local) to an enum (or something more like an optional).
> Two separate comments:
> - We should add is_local to MPIDIG_request_create to ensure requests created in the ch4 layer always have locality set.

Can do.

> - What about requests created in the MPIR layer, or requests that straddle shm/netmod, such as collectives or anysource receives? We may need to expand MPIDI_REQUEST(rreq, is_local) to an enum (or something more like an optional).

Actually, this already happens for anysrc_partner via MPID_Request_create_hook. We can also initialize is_local=0 there and then rely on the shm layer to set it to true when appropriate. That would work, since the optimization is to skip NM progress.
> We can also initialize is_local=0 there and then rely on the shm layer to set it to true when appropriate. That would work since the optimization is to skip NM progress.

Yeah, we can do that and refactor later.
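The "default to not local, let shm upgrade it" approach agreed on above can be sketched roughly as follows. The struct and function names are hypothetical and only model the idea; they are not the MPID_Request_create_hook or MPIDI_REQUEST definitions from the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the locality field carried by a ch4 request. */
typedef struct {
    bool is_local;
} sketch_request;

/* Creation hook: default to "not local" so the field is never read uninitialized. */
static void sketch_request_create_hook(sketch_request *req)
{
    assert(req != NULL);
    req->is_local = false;
}

/* shm layer: mark the request local once it is known to complete over shared memory. */
static void sketch_shm_mark_local(sketch_request *req)
{
    assert(req != NULL);
    req->is_local = true;
}

/* Netmod progress may be skipped only for requests positively marked local. */
static bool sketch_can_skip_netmod(const sketch_request *req)
{
    assert(req != NULL);
    return req->is_local;
}
```

The default is the conservative one: a request that nobody marks as local still gets netmod progress, so an unmarked request costs only the lost optimization, never a missed completion.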
2c95be5 to df64e86
test:mpich/ch4/most
test:mpich/ch4/xpmem
When global progress is triggered we should ignore the progress state flags and just call every progress function. We also need to trigger global progress in the single vci case because we plan to make the progress loop locality aware.
When a request is created, initialize is_local to false. This avoids accidentally reading an uninitialized value. Other layers update the value to true where appropriate, such as the shared memory or generic active message code.
The IPC protocol is local by definition, so reflect that in any requests created in IPC code.
Skip netmod progress when the request(s) being waited on are determined to be local only.
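As a rough illustration of the last commit, the sketch below scans a batch of requests and clears the netmod bit in the progress state when every request is local. The types, flag values, and function name are hypothetical; only the `state->flag &= ~..._NM` step mirrors the ch4_wait.h snippet reviewed earlier. Treating an empty batch as not local-only is a choice made for this sketch, not something the PR states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits and types; the real ones are defined in the ch4 headers. */
#define SKETCH_PROGRESS_NM  (1 << 0)
#define SKETCH_PROGRESS_SHM (1 << 1)

typedef struct {
    bool is_local;
} sketch_req;

typedef struct {
    int flag;
} sketch_progress_state;

/* Set up the progress state for a wait on `count` requests. */
static void sketch_wait_init_state(const sketch_req *reqs, int count,
                                   sketch_progress_state *state)
{
    assert(count >= 0);
    state->flag = SKETCH_PROGRESS_NM | SKETCH_PROGRESS_SHM;

    /* Assumption of this sketch: an empty batch keeps netmod progress enabled. */
    bool local_only = (count > 0);
    for (int i = 0; i < count; i++) {
        if (!reqs[i].is_local) {
            local_only = false;
            break;
        }
    }

    /* All requests are local: skip netmod progress while waiting. */
    if (local_only) {
        state->flag &= ~SKETCH_PROGRESS_NM;
    }
}
```

A single non-local request in the batch is enough to keep the netmod bit set, so mixed batches behave exactly as before the optimization.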
df64e86 to 67241fe
test:mpich/ch4/most
Pull Request Description
Optimize progress for a batch of local communication requests by skipping netmod progress. Shows a 5-10% improvement in ch4/shm bandwidth measurements on a single node of Cascade Lake.
Author Checklist
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits are self-contained and do not do two things at once.
Commit message is of the form: module: short description
Commit message explains what's in the commit.
Whitespace checker. Warnings test. Additional tests via comments.
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.