@wfaderhold21
Collaborator

What

The current onesided alltoall algorithm in TL/UCP can suffer from endpoint congestion in certain situations (e.g., jobs with high PPN counts). This PR adds a congestion-avoidant algorithm that lets users supply the available bandwidth in MB/s and the percentage of that bandwidth to use for a particular alltoall collective.
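
As a rough illustration of how the two user-supplied values could translate into pacing, here is a minimal Python model. It is a sketch under stated assumptions, not the code in this PR: the function names are hypothetical, the bandwidth is assumed to be shared evenly among the processes on a node, and 1 MB/s is taken as 10^6 bytes/s.

```python
# Hypothetical pacing model, NOT the TL/UCP implementation.
# Assumption: the node's bandwidth budget is split evenly across local processes.

def per_process_budget_mbs(link_bw_mbs: float, percent: float, ppn: int) -> float:
    """Bandwidth each local process may inject, in MB/s."""
    if link_bw_mbs <= 0 or not (0 < percent <= 100) or ppn < 1:
        raise ValueError("expected bandwidth > 0, 0 < percent <= 100, ppn >= 1")
    return link_bw_mbs * (percent / 100.0) / ppn


def put_interval_us(msg_bytes: int, budget_mbs: float) -> float:
    """Minimum spacing between consecutive puts, in microseconds.

    With 1 MB/s = 10**6 bytes/s, bytes divided by MB/s yields microseconds.
    """
    if budget_mbs <= 0:
        raise ValueError("budget must be positive")
    return msg_bytes / budget_mbs


if __name__ == "__main__":
    # Example inputs (made up): 25000 MB/s link, use 50% of it, 8 processes per node.
    budget = per_process_budget_mbs(25000.0, 50.0, 8)
    print(budget, put_interval_us(65536, budget))
```

Under this model, raising PPN or lowering the percentage lengthens the gap between puts, which is the intended effect when the endpoint is the bottleneck.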

@swx-jenkins3

Can one of the admins verify this patch?

@wfaderhold21
Collaborator Author

From WG comments:

  • Merge this code with PR #1096 (TL/UCP: congestion avoidant onesided a2a): add it to the original onesided alltoall algorithm instead of introducing a new algorithm
  • Remove the ONESIDED_CA_RATE environment variable; obtain the value through UCP
  • Remove the ONESIDED_CA_PPN environment variable; obtain the value from the topology

@wfaderhold21
Collaborator Author

Combined with #1096.
