Implement pthread parallelism #524
Open
This implements a simple version of oneapi::tbb::parallel_invoke using plain pthreads.
The pthread implementation is slightly slower, but it may be better suited to lightweight embedded systems than TBB and C++.
Before bifurcating execution, a thread is taken from a list of free threads, or a new one is created if the total number of threads is less than the number of cores on the system.
If no thread is available, the current thread processes both the left and right subtrees itself.
If a thread could be allocated, the thread is given an execution context for the left subtree and instructed to start execution via a barrier.
The current thread continues with the right subtree, then synchronises with the allocated thread again via the same barrier and returns it to the list of free threads.
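For reviewers, here is a minimal sketch of the scheme described above. It is not the code in this PR: it sums an array instead of hashing, and every name in it (`sum_tree`, `worker_acquire`, `worker_release`, `MAX_WORKERS`, `LEAF_LEN`) is hypothetical. It assumes a platform that provides POSIX barriers and that the worker limit is the number of online cores as reported by `sysconf`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define MAX_WORKERS 64   /* hypothetical hard cap on pool size */
#define LEAF_LEN    4096 /* hypothetical cutoff below which we stop splitting */

typedef struct { const uint32_t *data; size_t len; uint64_t result; } task_t;

typedef struct worker {
    pthread_t tid;
    pthread_barrier_t barrier;  /* two parties: the owner and this worker */
    task_t *task;               /* execution context for the left subtree */
    struct worker *next_free;
} worker_t;

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static worker_t workers[MAX_WORKERS];
static worker_t *free_list;
static long n_workers;

uint64_t sum_tree(const uint32_t *data, size_t len);

/* Workers loop forever: wait for a start signal, run the task, signal done. */
static void *worker_main(void *arg) {
    worker_t *w = arg;
    for (;;) {
        pthread_barrier_wait(&w->barrier);  /* start */
        w->task->result = sum_tree(w->task->data, w->task->len);
        pthread_barrier_wait(&w->barrier);  /* done */
    }
    return NULL;
}

/* Take a free worker, or create one while under the core count; else NULL. */
static worker_t *worker_acquire(void) {
    worker_t *w = NULL;
    long limit = sysconf(_SC_NPROCESSORS_ONLN);
    if (limit > MAX_WORKERS) limit = MAX_WORKERS;
    pthread_mutex_lock(&pool_lock);
    if (free_list) {
        w = free_list;
        free_list = w->next_free;
    } else if (n_workers < limit) {
        w = &workers[n_workers];
        if (pthread_barrier_init(&w->barrier, NULL, 2) != 0) {
            w = NULL;
        } else if (pthread_create(&w->tid, NULL, worker_main, w) != 0) {
            pthread_barrier_destroy(&w->barrier);
            w = NULL;
        } else {
            pthread_detach(w->tid);  /* sketch: workers are never joined */
            n_workers++;
        }
    }
    pthread_mutex_unlock(&pool_lock);
    return w;
}

static void worker_release(worker_t *w) {
    pthread_mutex_lock(&pool_lock);
    w->next_free = free_list;
    free_list = w;
    pthread_mutex_unlock(&pool_lock);
}

uint64_t sum_tree(const uint32_t *data, size_t len) {
    assert(data != NULL || len == 0);
    if (len <= LEAF_LEN) {
        uint64_t s = 0;
        for (size_t i = 0; i < len; i++) s += data[i];
        return s;
    }
    size_t half = len / 2;
    task_t left = { data, half, 0 };
    worker_t *w = worker_acquire();
    if (w == NULL)  /* no thread available: process both subtrees here */
        return sum_tree(data, half) + sum_tree(data + half, len - half);
    w->task = &left;
    pthread_barrier_wait(&w->barrier);  /* tell the worker to start */
    uint64_t right = sum_tree(data + half, len - half);
    pthread_barrier_wait(&w->barrier);  /* wait until the worker is done */
    worker_release(w);
    return left.result + right;
}
```

Calling `sum_tree(buf, n)` from any thread returns the sum of `buf[0..n-1]`. Using one two-party barrier per worker for both the start and the completion handshake keeps the hand-off free of condition variables. The sketch never shuts its workers down, which a real implementation would need to handle.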
Simple Benchmarks
I tested this with my own b3sum implementation, which I've included below, running it against a 1 GB file in tmpfs. By the way, would you be interested in a PR to include this program?
With TBB:
With pthread:
b3sum.c