This repository provides multiple parallel n-body simulation algorithms, implemented in portable ISO C++ that runs on multi-core CPUs and GPUs:
- All-Pairs; $O(N^2)$ time complexity:
  - Classic `all-pairs`, parallelized over bodies (see the sketch after this list).
  - `all-pairs-collapsed`, parallelized over force pairs.
- Barnes-Hut; $O(N \log N)$ time complexity:
  - Starvation-free `octree` algorithm: requires parallel forward progress.
  - Hilbert-sorted Bounding Volume Hierarchy (`bvh`) algorithm: requires weakly parallel forward progress.
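For orientation, the sketch below shows the general shape of the classic all-pairs approach, parallelized over bodies with C++ standard parallelism. It is a minimal illustration with made-up names (`Body`, `accelerate`) and 2D double precision, not the repository's actual kernel; the `all-pairs-collapsed` variant instead distributes the $N(N-1)/2$ force pairs across the parallel loop.

```cpp
// Minimal sketch, illustrative only: softened gravity in G = 1 units.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <execution>
#include <vector>

struct Body { double x, y, m, ax, ay; };

void accelerate(std::vector<Body>& bodies, double eps2 /* softening, > 0 */) {
  const Body* bs = bodies.data();   // capture a raw pointer by value for offload
  const std::size_t n = bodies.size();
  // One parallel work item per body; each scans all N bodies -> O(N^2) total.
  std::for_each(std::execution::par_unseq, bodies.begin(), bodies.end(),
                [bs, n, eps2](Body& bi) {
                  double ax = 0.0, ay = 0.0;
                  for (std::size_t j = 0; j != n; ++j) {
                    const double dx = bs[j].x - bi.x;
                    const double dy = bs[j].y - bi.y;
                    const double r2 = dx * dx + dy * dy + eps2;  // softening also zeroes the self term
                    const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
                    ax += bs[j].m * dx * inv_r3;
                    ay += bs[j].m * dy * inv_r3;
                  }
                  bi.ax = ax;
                  bi.ay = ay;
                });
}
```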
Prerequisites: Docker and HPCCM:

```shell
$ pip install hpccm
```

Run samples as follows:
```shell
# Options:
# ./ci/run_docker <toolchain> <algorithm> <workload case> <dim> <precision> <bodies> <steps>
# Example: nvc++ GPU compiler, octree algorithm, galaxy simulation, 3D, double precision:
$ ./ci/run_docker nvgpu octree galaxy 3 double
# Build only, do not run:
$ BUILD_ONLY=1 ./ci/run_docker nvgpu octree galaxy 3 double
# Run only, assuming the binary is already built:
$ RUN_ONLY=1 ./ci/run_docker nvgpu octree galaxy 3 double
```

To reproduce without a container, a properly set up environment is required, in which case the `./ci/run` script can be used instead.
The following options are available:

- Toolchain:
  - Open-source vendor-neutral: `acpp` (AdaptiveCpp), `gcc` (Intel TBB), `clang` (Intel TBB).
  - Vendor-specific:
    - AMD ROCm stdpar: `amdclang`.
    - NVIDIA HPC SDK: `nvgpu` (`nvc++ -stdpar=gpu`), `nvcpu` (`nvc++ -stdpar=cpu`).
    - Intel oneAPI: `dpc++`.
- Algorithm: `all-pairs`, `all-pairs-collapsed`, `octree`, `bvh` (the `all-pairs-collapsed` pair indexing is sketched after this list).
- Dimensions: `2` (2D), `3` (3D).
- Precision: `float`, `double`.
- Workloads:
  - `galaxy`
  - `nasa`: loads a data-set from file; requires running `./ci/run_docker thuering fetch` for set up.
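The `all-pairs-collapsed` option parallelizes over force pairs rather than over bodies. The sketch below shows one way to do that (illustrative only; not necessarily the mapping this repository uses): flatten the $N(N-1)/2$ unordered pairs into a single index and recover the body pair $(i, j)$ inside the parallel loop.

```cpp
// Minimal sketch, illustrative only: map a flat pair index k back to (i, j)
// with 0 <= j < i < N, so a parallel loop over k distributes work over pairs.
#include <cmath>
#include <cstdint>
#include <utility>

std::pair<std::int64_t, std::int64_t> pair_from_index(std::int64_t k) {
  // Largest i with i*(i-1)/2 <= k, from solving the quadratic i*(i-1)/2 = k.
  auto i = static_cast<std::int64_t>(
      (1.0 + std::sqrt(1.0 + 8.0 * static_cast<double>(k))) / 2.0);
  // Guard against floating-point rounding at large k.
  while (i * (i - 1) / 2 > k) --i;
  while ((i + 1) * i / 2 <= k) ++i;
  const std::int64_t j = k - i * (i - 1) / 2;
  return {i, j};
}
```

Because each pair contributes equal and opposite updates to two different bodies, a collapsed kernel typically needs atomic accumulation (or some other reduction scheme), whereas the per-body variant does not.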
To run all benchmarks on a given system, you can use `./ci/run_docker bench`.
MIT License, see LICENSE.
Thomas Lane Cassell, Tom Deakin, Aksel Alpay, Vincent Heuveline, and Gonzalo Brito Gadeschi. "Efficient Tree-Based Parallel Algorithms for N-Body Simulations Using C++ Standard Parallelism." In Workshop on Irregular Applications: Architectures and Algorithms Held in Conjunction with Supercomputing (P3HPC). IEEE, 2024.
When contributing code, you may format your contributions with the following command, but doing so is not required:

```shell
$ ./ci/run_docker fmt
```
The environment is made portable through mamba/conda.
This must be installed as a prerequisite, e.g., run the Miniforge installer from https://github.com/conda-forge/miniforge .
Then create the stdpar-nbody environment:

```shell
$ mamba env create -f environment.yaml
```

Other things you might want:
- NVIDIA HPC SDK
Use make to build the program.
This must be done within the mamba environment:
```shell
$ mamba activate stdpar-bh
```

The number of dimensions can be specified with the `D=<dim>` parameter to `make`; by default, `D=2` is used.
These are the available targets:
- CPU:
  - `make gcc`
  - `make clang`
  - `make nvcpp`
- GPU:
  - `make gpu` to build for NVIDIA GPUs using `nvc++`

The output will be `./nbody_d<dim>_<target>`.
When running the nvcpp version, it is recommended to use the following environment variables:

```shell
OMP_PLACES=cores OMP_PROC_BIND=close ./nbody_d2_nvcpp -s 5 -n 1000000
```

If you get an error about missing libraries, then try running with the following environment variable:

```shell
LD_LIBRARY_PATH=${CONDA_PREFIX}/lib ./nbody_d2_clang -s 5 -n 1000000
```

Run Barnes-Hut with:

```shell
$ ./nbody_d2_gpu -s 5 -n 10 --print-state --theta 0
$ ./nbody_d2_gpu -s 5 -n 10 --print-state --algorithm all-pairs
```
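For context on the `--theta` flag: Barnes-Hut style codes typically approximate the combined force of a tree cell of size $s$ at distance $d$ whenever the textbook opening criterion below holds, with $\theta$ trading accuracy for speed; with `--theta 0` the criterion never holds, so every individual interaction is evaluated and the output can be compared against the `all-pairs` run above. (The precise test used in this code may differ.)

$$\frac{s}{d} < \theta$$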
Run a large Barnes-Hut simulation with 1,000,000 bodies:

```shell
$ ./nbody_d2_gpu -s 5 -n 1000000
```

Generate an image similar to the above GIF:

```shell
$ ./nbody_d2_gpu -s 1000 -n 10000 --save pos --workload galaxy
$ python3 scripts/plotter.py pos --galaxy --gif
```

To find other program arguments:

```shell
$ ./nbody_d2_gpu --help
```