Description
Whilst running a workload which uses boost::math::digamma, I discovered unexpectedly slow performance on aarch64 platforms.
With the distro packages on this platform (Linux aarch64; Ubuntu 22.04, Rocky 9), the default is to build with 128-bit long double support. Arithmetic in that format is emulated in software here, which causes a large slowdown.
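To check which long double format a given toolchain uses, a minimal sketch along these lines may help. It relies only on the standard library; the helper name `describe_long_double` is made up for illustration and is not part of Boost.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: summarise how this compiler/target represents long double.
std::string describe_long_double() {
    using ld = std::numeric_limits<long double>;
    using d = std::numeric_limits<double>;
    // The C++ standard guarantees long double is at least as precise as double.
    assert(ld::digits >= d::digits);
    std::ostringstream out;
    out << "sizeof(long double) = " << sizeof(long double) << " bytes, "
        << "mantissa bits = " << ld::digits
        << " (double: " << d::digits << ")";
    return out.str();
}
```

Printing the result with `std::cout << describe_long_double() << "\n";` should report 113 mantissa bits on Linux aarch64 with GCC (IEEE binary128) and 64 on Linux x86-64 (80-bit extended precision), which is consistent with the much larger slowdown seen on aarch64.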
This seems a poor default. You can work around it at application compile time by passing the compiler flag -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS, but it's unlikely people will know to do so.
On aarch64, adding the define gives a 100x speedup over the current default. On x86, it gives around a 6x speedup.
g++ workload-datasets/boost/btest.cpp -Ofast -mcpu=native
g++ workload-datasets/boost/btest.cpp -Ofast -mcpu=native -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS
#include <boost/math/special_functions/digamma.hpp>
#include <iostream>
#include <chrono>
#include <cstdlib>

#define N 1000 * 1000 * 100

void long_operation() {
    double d = 0;
    for (int i = 1; i < N; ++i)
        d += boost::math::digamma((double) i);
    std::cout << d << std::endl;
}

int main(int argc, char *argv[]) {
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;

    auto t1 = high_resolution_clock::now();

    int reps;
    if (argc > 1)
        reps = std::atoi(argv[1]);
    else
        reps = 1;

    for (int r = 0; r < reps; ++r)
        long_operation();

    auto t2 = high_resolution_clock::now();

    /* Getting number of milliseconds as an integer. */
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    std::cout << ms_int.count() << "ms\n";
    return 0;
}