
Meeting 22.11

roy-sc edited this page Nov 22, 2021 · 12 revisions

Algorithm Selection

Allreduce flavors

  • allreduce
  • allreduce_rabenseifner
  • allreduce_native_ring
  • allreduce_native_basic_linear
  • allreduce_native_rabenseifner
  • allreduce_native_nonoverlapping
  • allreduce_native_recursive_doubling
  • allreduce_native_segmented_ring
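To check results and reason about message counts, here is a minimal sequential simulation of the textbook ring allreduce (reduce-scatter followed by allgather). The names `ring_allreduce` and `buffers` are hypothetical. The flavor names above look like Open MPI's tuned allreduce algorithms, but whether `allreduce_native_ring` follows exactly this schedule is an assumption.

```python
def ring_allreduce(buffers):
    """Sequentially simulate a ring allreduce (sum) over p ranks.

    buffers: list of p equal-length lists; buffers[r] is rank r's input.
    Returns a list of p lists, each holding the elementwise sum.
    """
    p = len(buffers)
    n = len(buffers[0])
    data = [list(b) for b in buffers]  # copy so the inputs stay untouched
    # Split the vector into p nearly equal segments [bounds[i], bounds[i+1]).
    bounds = [i * n // p for i in range(p + 1)]

    def seg(i):
        i %= p
        return range(bounds[i], bounds[i + 1])

    # Phase 1: reduce-scatter. In step s, rank r sends segment (r - s) to
    # rank r + 1, which adds it into its own copy. After p - 1 steps,
    # rank r holds the fully reduced segment (r + 1) mod p.
    for s in range(p - 1):
        # Snapshot all outgoing messages first so every rank "sends" at once.
        msgs = [(r, r - s, [data[r][k] for k in seg(r - s)]) for r in range(p)]
        for r, i, payload in msgs:
            dst = (r + 1) % p
            for k, v in zip(seg(i), payload):
                data[dst][k] += v

    # Phase 2: allgather. In step s, rank r forwards its complete segment
    # (r + 1 - s) to rank r + 1, which overwrites its stale copy.
    for s in range(p - 1):
        msgs = [(r, r + 1 - s, [data[r][k] for k in seg(r + 1 - s)]) for r in range(p)]
        for r, i, payload in msgs:
            dst = (r + 1) % p
            for k, v in zip(seg(i), payload):
                data[dst][k] = v
    return data
```

Each rank sends 2(p - 1) messages of roughly n/p elements, which is why the ring is attractive for large vectors even though it needs many more steps than recursive doubling.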

Allgather flavors

  • allgather
  • allgather-async
  • to be added: native variations
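As a reference for validating the native allgather variants once they are added, a minimal simulation of a ring allgather (one common schedule, picked here as an assumption) could look like this. `ring_allgather` is a hypothetical helper, not an existing API.

```python
def ring_allgather(blocks):
    """Sequentially simulate a ring allgather over p ranks.

    blocks: list of p lists; blocks[r] is rank r's contribution (sizes may differ).
    Returns, for every rank, the list of all p blocks in rank order.
    """
    p = len(blocks)
    # gathered[r][i] is rank r's copy of block i (None until it arrives).
    gathered = [[None] * p for _ in range(p)]
    for r in range(p):
        gathered[r][r] = list(blocks[r])
    # In step s, rank r forwards block (r - s) mod p to rank (r + 1) mod p;
    # that block arrived at rank r in the previous step (or is its own at s = 0).
    for s in range(p - 1):
        msgs = [(r, (r - s) % p, gathered[r][(r - s) % p]) for r in range(p)]
        for r, i, payload in msgs:
            gathered[(r + 1) % p][i] = list(payload)
    return gathered
```

After p - 1 steps every rank has received each foreign block exactly once.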

New Ideas

  • Know the topology and optimize for it
  • How do other HPC applications handle this?
  • Refine the cost model
  • Implement asynchronous versions of the most promising variants
  • Generalized Rabenseifner
  • Hierarchical computing?
  • Optimize the outer product (Eigen? BLAS? Manual implementation: vectorization, loop unrolling, cache behavior)
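A possible starting point for refining the cost model is the standard α-β-γ estimate for three of the allreduce flavors, where α is the latency per message, β the transfer time per element and γ the reduction time per element. These are the common textbook formulas, not measurements of our code; the function name is hypothetical, and the recursive doubling and Rabenseifner terms assume p is a power of two.

```python
import math


def allreduce_costs(p, n, alpha, beta, gamma):
    """Estimated allreduce time per algorithm under the alpha-beta-gamma model.

    p: number of processes (power of two for recursive doubling/Rabenseifner)
    n: vector length in elements
    alpha: latency per message [s]
    beta: transfer time per element [s]
    gamma: reduction time per element [s]
    """
    lg = math.log2(p)
    frac = (p - 1) / p
    return {
        # reduce-scatter + allgather around a ring: 2(p-1) steps of n/p elements
        "ring": 2 * (p - 1) * alpha + 2 * frac * n * beta + frac * n * gamma,
        # lg p steps, each exchanging and reducing the full vector
        "recursive_doubling": lg * alpha + lg * n * beta + lg * n * gamma,
        # recursive-halving reduce-scatter + recursive-doubling allgather
        "rabenseifner": 2 * lg * alpha + 2 * frac * n * beta + frac * n * gamma,
    }
```

Under this model recursive doubling wins for small vectors (fewest messages), while ring and Rabenseifner win for large vectors (each rank moves only about 2n(p - 1)/p elements); plugging in measured α, β, γ would locate the crossover for our cluster.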

Todo / Questions

  • Does MPI_ALLREDUCE compute while receiving? (allreduce-ring performs close to allreduce)
  • Adjust vector sizes (1000 to 8000, in increments of 500?)
  • What are the expected vector sizes and node counts?
  • Is it okay to support only powers of 2?
  • What did Saleh mean by asynchronous computation? Sending chunks and operating on them as they arrive (pipelined)?
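One reading of "asynchronous computation" is pipelining: split the incoming vector into chunks and reduce each chunk as soon as it arrives instead of waiting for the whole message. The sketch below imitates this in plain Python, with a thread and a bounded queue standing in for the network. It is an assumption about what was meant, `pipelined_sum` is hypothetical, and because of CPython's GIL it illustrates the control flow rather than real overlap. In MPI the same pattern would typically post one `MPI_Irecv` per chunk and wait on each request before reducing that chunk.

```python
import queue
import threading


def pipelined_sum(local, incoming, chunk):
    """Add `incoming` into a copy of `local`, one chunk at a time as chunks "arrive".

    A sender thread plays the network and enqueues chunks; the main thread
    reduces each chunk right after dequeuing it, so in a real system the
    reduction of chunk k could overlap with the transfer of chunk k + 1.
    """
    result = list(local)
    q = queue.Queue(maxsize=2)  # at most two chunks queued at any time

    def sender():
        for start in range(0, len(incoming), chunk):
            q.put((start, incoming[start:start + chunk]))
        q.put(None)  # end-of-stream marker

    t = threading.Thread(target=sender)
    t.start()
    while (item := q.get()) is not None:
        start, data = item
        for k, v in enumerate(data):
            result[start + k] += v  # reduce this chunk immediately
    t.join()
    return result
```

The chunk size is the tuning knob: smaller chunks give more overlap but pay the per-message latency more often.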