This repository implements eRAR, which uses eBPF to accelerate gradient aggregation over Ring-AllReduce (Ring-AR).
This repository mainly contains two parts:
- erar-kernel contains the eBPF code of eRAR. We use erar to attach the eBPF code to the kernel.
- mpich-erar is a modified version of MPICH into which we have integrated eRAR.
Run build.sh to compile the code in erar-kernel and mpich-erar.
Modify erar.conf to configure the cluster settings. The file uses the following format:
# rank id, ip addr, mac addr, num of nic, name of nic
0, 33.33.33.120, b4:05:5d:ac:85:f3, 1, ens27f3
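To check a configuration before attaching eRAR, the format above can be parsed with a small script. This is a minimal sketch, not part of the repository: the `Node` class and `parse_erar_conf` function are hypothetical names, and the handling of multiple NICs (any extra comma-separated fields are treated as additional NIC names) is an assumption, since the README only shows a single-NIC entry.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches a lowercase colon-separated MAC address such as b4:05:5d:ac:85:f3.
MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")


@dataclass
class Node:
    """One erar.conf entry (hypothetical helper type)."""
    rank: int
    ip: str
    mac: str
    num_nics: int
    nics: List[str]


def parse_erar_conf(text: str) -> List[Node]:
    """Parse erar.conf-style text, skipping blank lines and '#' comments."""
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 fields: {raw!r}")
        mac = fields[2].lower()
        if not MAC_RE.fullmatch(mac):
            raise ValueError(f"malformed MAC address: {fields[2]!r}")
        # Assumption: fields after the fourth are NIC names.
        nodes.append(Node(int(fields[0]), fields[1], mac,
                          int(fields[3]), fields[4:]))
    return nodes


if __name__ == "__main__":
    sample = (
        "# rank id, ip addr, mac addr, num of nic, name of nic\n"
        "0, 33.33.33.120, b4:05:5d:ac:85:f3, 1, ens27f3\n"
    )
    for node in parse_erar_conf(sample):
        print(node)
```

The script only validates the field count and the MAC address syntax; it does not check that the listed interface exists on the host.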
Then we can use ./erar-kernel/erar -r <rank-id> to attach eRAR to the kernel.
We have integrated eRAR into MPICH. By setting the MPI_EBPF_ALLREDUCE environment variable, we can run eRAR in the same way as an ordinary MPI job:
mpirun -env MPI_EBPF_ALLREDUCE 1 --hostfile <hostfile> -np <num-of-nodes> <execute-file>
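The launch command above can also be assembled by a script, which helps when the node list changes often. The following is a minimal sketch under stated assumptions: `hostfile_text` and `mpirun_command` are hypothetical helpers, the hostfile uses the one-host-per-line layout accepted by MPICH's Hydra launcher, and the second IP address and `./allreduce_app` are placeholders rather than values from this repository.

```python
import shlex
from typing import List


def hostfile_text(hosts: List[str]) -> str:
    """Render a hostfile body with one host per line."""
    return "".join(f"{host}\n" for host in hosts)


def mpirun_command(hostfile: str, num_nodes: int, executable: str) -> List[str]:
    """Build the argument list for the mpirun invocation shown in this README."""
    return [
        "mpirun",
        "-env", "MPI_EBPF_ALLREDUCE", "1",  # variable named in this README
        "--hostfile", hostfile,
        "-np", str(num_nodes),
        executable,
    ]


if __name__ == "__main__":
    # 33.33.33.121 is a hypothetical second node, used only for illustration.
    hosts = ["33.33.33.120", "33.33.33.121"]
    print(hostfile_text(hosts), end="")
    # ./allreduce_app is a placeholder for <execute-file>.
    print(shlex.join(mpirun_command("hostfile", len(hosts), "./allreduce_app")))
```

Returning the command as a list rather than a string means it can be passed directly to `subprocess.run` without shell quoting issues.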