Meeting 22.11
roy-sc edited this page Nov 22, 2021 · 12 revisions
- allreduce
- allreduce_rabenseifner
- allreduce_native_ring
- allreduce_native_basic_linear
- allreduce_native_rabenseifner
- allreduce_native_nonoverlapping
- allreduce_native_recursive_doubling
- allreduce_native_segmented_ring
- allgather
- allgather-async
- to be added: native variations
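For reference, the ring variant in the list above follows the usual two-phase pattern: a reduce-scatter followed by an allgather. The sketch below simulates that pattern in plain Python with no MPI involved. The function name `ring_allreduce` and the chunking scheme are illustrative assumptions, not taken from the project's code.

```python
def ring_allreduce(vectors):
    """Simulate a sum-allreduce over a ring of p ranks (hypothetical helper).

    vectors[r] is the input of rank r; the return value holds each rank's result.
    """
    p = len(vectors)
    n = len(vectors[0])
    # Split the index range into p contiguous chunks (sizes differ by at most one).
    bounds = [(r * n) // p for r in range(p + 1)]
    data = [list(v) for v in vectors]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod p
    # to rank r + 1, which adds it to its own copy of that chunk.
    for s in range(p - 1):
        sent = [[data[r][i] for i in chunk((r - s) % p)] for r in range(p)]
        for r in range(p):
            dst = (r + 1) % p
            for k, i in enumerate(chunk((r - s) % p)):
                data[dst][i] += sent[r][k]

    # Rank r now holds the fully reduced chunk (r + 1) mod p.
    # Phase 2: allgather. In step s, rank r forwards chunk (r + 1 - s) mod p.
    for s in range(p - 1):
        sent = [[data[r][i] for i in chunk((r + 1 - s) % p)] for r in range(p)]
        for r in range(p):
            dst = (r + 1) % p
            for k, i in enumerate(chunk((r + 1 - s) % p)):
                data[dst][i] = sent[r][k]
    return data
```

Each rank sends and receives roughly n/p elements per step over 2(p - 1) steps, which is why the ring variant tends to be attractive for large vectors and less so for small ones.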
- Know your topology and optimize based on that
- How do other HPC applications handle that?
- Refine cost model
- Implement asynchronous versions of the promising variants
- Generalized Rabenseifner
- Hierarchical computing?
- Optimize outer product (Eigen? BLAS? Manual implementation [vectorization, loop unrolling, cache considerations])
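As a possible starting point for the cost-model item above, here is a minimal sketch of the widely used latency/bandwidth/compute (alpha-beta-gamma) estimates for three allreduce algorithms. It assumes a power-of-two rank count and a flat network with no topology effects. The function name `allreduce_costs` and all parameter values are hypothetical, and the formulas are the textbook ones rather than the project's current model.

```python
import math


def allreduce_costs(p, n, alpha, beta, gamma):
    """Estimated allreduce time per algorithm (hypothetical helper).

    p     : number of ranks (assumed to be a power of two)
    n     : message size (elements or bytes, matching beta and gamma)
    alpha : per-message latency
    beta  : transfer time per unit of data
    gamma : reduction time per unit of data
    """
    log_p = math.log2(p)
    frac = (p - 1) / p
    return {
        # log2(p) rounds, each exchanging and reducing the full vector.
        "recursive_doubling": log_p * (alpha + n * beta + n * gamma),
        # Reduce-scatter (recursive halving) plus allgather (recursive doubling).
        "rabenseifner": 2 * log_p * alpha + 2 * frac * n * beta + frac * n * gamma,
        # 2(p - 1) steps, each moving a chunk of size n / p.
        "ring": 2 * (p - 1) * alpha + 2 * frac * n * beta + frac * n * gamma,
    }
```

Under this model, ring and Rabenseifner share the same bandwidth term and differ only in latency, so measured gaps between them would point to effects the model does not capture, such as topology or overlap.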
- Does `MPI_ALLREDUCE` compute while receiving? (`allreduce-ring` is close to `allreduce`)
- Adjust vector sizes (1000-8000? 500 increments?)
- What are expected {vector sizes, node number}?
- Only powers of 2 okay?
- What did Saleh mean by asynchronous computation? Sending chunks and operating on them (pipelined)?
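One possible reading of the pipelining question above is that the reduction starts on each chunk as soon as it arrives, instead of waiting for the whole vector. The sketch below illustrates that idea with a thread and a queue standing in for the network. It is an assumption about what was meant, not a confirmed interpretation, and `pipelined_sum` is a hypothetical name.

```python
import queue
import threading


def pipelined_sum(local, incoming, chunk_size):
    """Add `incoming` to `local` chunk by chunk while later chunks are still in flight."""
    result = list(local)
    chunks = queue.Queue()

    def receiver():
        # Stand-in for the network: delivers one chunk at a time.
        for start in range(0, len(incoming), chunk_size):
            chunks.put((start, incoming[start:start + chunk_size]))
        chunks.put(None)  # sentinel: no more chunks

    worker = threading.Thread(target=receiver)
    worker.start()
    while True:
        item = chunks.get()
        if item is None:
            break
        start, chunk = item
        # Reduce this chunk while the receiver keeps delivering the next ones.
        for offset, value in enumerate(chunk):
            result[start + offset] += value
    worker.join()
    return result
```

In an MPI setting the same structure would typically map onto nonblocking receives per chunk, with the reduction of chunk k overlapping the transfer of chunk k + 1.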