Lace is a C framework for fine-grained fork-join parallelism on multi-core computers.
TASK_1(int, fibonacci, int, n) // macro to create Lace functions (can be in header file)
int fibonacci_CALL(lace_worker* lw, int n) {
if(n < 2) return n;
fibonacci_SPAWN(lw, n-1); // SPAWN a task (fork)
int a = fibonacci_CALL(lw, n-2); // run another task in parallel
int b = fibonacci_SYNC(lw); // SYNC the spawned task (join)
return a + b;
}
int main(int argc, char** argv)
{
int n_workers = 4; // create 4 workers
// use 0 to automatically use all available cores
int dqsize = 0; // use default task deque size
int stacksize = 0; // use default program stack size
lace_start(n_workers, dqsize, stacksize);
int result = fibonacci(42);
printf("fibonacci(42) = %d\n", result);
lace_stop();
}For more examples, see DOCS.md and the contents of the benchmarks folder.
- Features
- Installation
- Building
- Configuration Options
- Usage
- Benchmarking
- Academic publications
- License
- ⚡ Low-overhead, lock-free work-stealing
- 💤 Suspend and resume workers to save CPU time
- 🕒 Exponential backoff to save CPU time
- 📌 Optional thread pinning with
hwloc - 📈 Low-overhead statistics per worker
- ⛔ Interrupt support (e.g. for garbage collection, or initialisation)
Lace uses a scalable double-ended queue for its implementation of work-stealing, which is wait-free for the thread spawning tasks and lock-free for the threads stealing tasks. The design of the datastructure minimizes interaction between CPUs. Lace can report the number of tasks, steals and queue splits per Lace worker. It can also report the amount of time spent in startup/shutdown, performing stolen work, overhead of stealing and of searching for work, per worker. Gathering these statistics is done with virtually no overhead.
Please let us know if you need features that are currently not implemented in Lace.
Lace requires a C11-compatible compiler (tested with GCC and Clang) and optionally hwloc (libhwloc-dev).
Lace works on:
- 🐧 Linux
- 🪟 Windows (via MSYS2)
- 🍎 macOS
You can install Lace via make install, or integrate it into your project via CMake:
Example for CMake with FetchContent
if(NOT TARGET lace)
find_package(lace QUIET)
if(NOT lace_FOUND)
include(FetchContent)
FetchContent_Declare(
lace
GIT_REPOSITORY https://github.com/trolando/lace.git
GIT_TAG v2.0.0
)
FetchContent_MakeAvailable(lace)
endif()
endif()This example first tests if Lace is already a target in the project, for example when included as a submodule. Otherwise it tries to find an installed version, or fetch it from GitHub.
Create a separate build directory:
cmake -B build
cmake --build buildLace can be configured with the following CMake options:
| Setting | Description |
|---|---|
LACE_BUILD_TESTS |
Build the testing programs (not when subproject) |
LACE_BUILD_BENCHMARKS |
Build the included set of benchmark programs (not when subproject) |
LACE_USE_MMAP |
Use mmap to allocate memory instead of aligned_alloc |
LACE_USE_HWLOC |
Use the hwloc library to pin threads to CPUs |
LACE_COUNT_TASKS |
Let Lace record the number of executed tasks |
LACE_COUNT_STEALS |
Let Lace count how often tasks were stolen |
LACE_COUNT_SPLITS |
Let Lace count how often the queue split point was moved |
LACE_PIE_TIMES |
Let Lace record precise overhead times |
LACE_BACKOFF |
Let Lace workers sleep when there is no work to steal |
Recommendations:
- Use
LACE_USE_MMAPto reduce physical memory usage. Memory is allocated lazily by the OS. - Use
LACE_USE_HWLOCto ensure threads are pinned to CPU cores appropriately. - Leave
LACE_BACKOFFon as benchmarks show that this does not affect performance.
See the documentation for more details.
Lace includes a set of benchmark programs to evaluate its performance. Many of these benchmarks are adapted from well-known frameworks such as Cilk, Wool, and Nowa.
To enable the benchmarks, build Lace with:
cmake -B build -DLACE_BUILD_BENCHMARKS=ON
cmake --build buildThe compiled benchmarks will be placed in build/benchmarks/, along with the
bench.py script for running them.
The following two academic publications are directly related to Lace.
T. van Dijk (2016) Sylvan: Multi-core Decision Diagrams. PhD Thesis.
T. van Dijk and J.C. van de Pol (2014) Lace: Non-blocking Split Deque for Work-Stealing. In: Euro-Par 2014: Parallel Processing Workshops. LNCS 8806, Springer.
Lace is licensed with the Apache 2.0 license.