Notes on SIMD programming
s-trinh edited this page May 21, 2025 · 3 revisions

To enable AVX2, pass the following option to CMake: -DCMAKE_CXX_FLAGS="-mavx -mavx2 -mfma"
To determine whether a shared library has been built with a given vector instruction set, disassemble it and look for the corresponding registers:
- objdump -d [...]/visp-build/install/lib/libvisp_core.so | grep %ymm0 (use zmm0 for AVX512 registers)
Current state of intrinsics code in ViSP:
- only x86 SSE (no AVX, AVX2, ARM NEON, ...)
- SSE headers must be included in the .cpp file, guarded so that the compiler's support for the corresponding intrinsics is detected at compilation time:
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define VISP_HAVE_SSE2 1
#if defined __SSE3__ || (defined _MSC_VER && _MSC_VER >= 1500)
#include <pmmintrin.h>
#define VISP_HAVE_SSE3 1
#endif
#if defined __SSSE3__ || (defined _MSC_VER && _MSC_VER >= 1500)
#include <tmmintrin.h>
#define VISP_HAVE_SSSE3 1
#endif
#endif
- use CMake options to enable SSE2 / SSE3 / SSSE3; this adds the necessary compiler flags (e.g. -msse2)
- use vpCPUFeatures::checkSSE2() to check at run time whether the CPU supports the SSE2 instruction set
- this run-time check is necessary to avoid issues when, for example, ViSP is built with SSSE3 support but is run on a computer that does not support SSSE3
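
The combination of a compile-time guard and a run-time check described above can be sketched as follows. This is only an illustration, not ViSP code: DEMO_HAVE_SSE2, demoCheckSSE2() and demoAdd() are hypothetical names, and demoCheckSSE2() is a stand-in for vpCPUFeatures::checkSSE2() that assumes the GCC/Clang builtin __builtin_cpu_supports() on x86.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compile-time detection, same pattern as the guard shown above.
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define DEMO_HAVE_SSE2 1 // hypothetical macro, mirrors VISP_HAVE_SSE2
#endif

// Hypothetical stand-in for vpCPUFeatures::checkSSE2() (assumption: GCC/Clang on x86).
bool demoCheckSSE2()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("sse2") != 0;
#elif defined(_M_X64)
  return true; // SSE2 is part of the x86-64 baseline
#else
  return false; // unknown platform: stay on the scalar path
#endif
}

// Element-wise sum of two arrays of doubles.
std::vector<double> demoAdd(const std::vector<double> &a, const std::vector<double> &b)
{
  assert(a.size() == b.size());
  std::vector<double> out(a.size());
  std::size_t i = 0;
#if defined(DEMO_HAVE_SSE2)
  // The SSE2 path is taken only if it was compiled in AND the CPU supports it.
  if (demoCheckSSE2()) {
    for (; i + 2 <= a.size(); i += 2) {
      const __m128d va = _mm_loadu_pd(&a[i]);
      const __m128d vb = _mm_loadu_pd(&b[i]);
      _mm_storeu_pd(&out[i], _mm_add_pd(va, vb));
    }
  }
#endif
  // Scalar fallback, also handles the remaining tail elements.
  for (; i < a.size(); ++i) {
    out[i] = a[i] + b[i];
  }
  return out;
}
```

Note that the run-time check only protects code paths that are explicitly guarded this way; if the whole file is compiled with e.g. -mssse3, the compiler may still emit SSSE3 instructions elsewhere.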
 
AVX2 has been available since the Haswell architecture (2013). The correct way to support AVX2, AVX512, ... would be:
- SSE and AVX2 code must be split into separate compilation units
- source files that contain SSE code are compiled with SSE flags only (e.g. -msse2), and source files that contain AVX2 code with the AVX2 flag (e.g. -mavx2, or /arch:AVX2 for MSVC), see CPU dispatcher topics
- when packaging ViSP for Linux distributions, the best is to have (see also):
  - one option to enable baseline intrinsics (e.g. SSE2 or SSE3): regular source files and files that contain SSE code get the SSE flags
  - one option to add dispatched intrinsics (e.g. AVX2, AVX512, ...): source files that contain AVX2 code get the -mavx2 flag
  - this way, we assume a minimum target of SSE2 or SSE3 CPUs: source files with no intrinsics code are also compiled with the -msse2 or -msse3 flags (so the compiler may generate SSE code even where no SSE intrinsics are written, see for instance this example with the -O3 or -march=native compiler flags)
  - users with recent CPUs are able to benefit from code written with AVX2 intrinsics
 
 - some warnings with SSE-AVX transition penalty
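
A CPU dispatcher along these lines could look like the sketch below. It is a single-file approximation, not how ViSP is organised: the per-function attribute target("avx2") (GCC/Clang, x86 only) stands in for a separate compilation unit built with -mavx2, and all demo* names are hypothetical. The explicit _mm256_zeroupper() call relates to the SSE-AVX transition penalty; compilers usually insert the equivalent instruction themselves, so it is shown here only to make the point visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define DEMO_X86_GNU 1 // hypothetical macro: GCC/Clang on x86
#endif

// Baseline kernel: built with the default (e.g. SSE2) flags of the project.
static void demoScaleBaseline(const float *in, float *out, std::size_t n, float k)
{
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i] * k;
  }
}

#if defined(DEMO_X86_GNU)
// Dispatched kernel: the attribute plays the role of a file compiled with -mavx2.
__attribute__((target("avx2")))
static void demoScaleAVX2(const float *in, float *out, std::size_t n, float k)
{
  const __m256 vk = _mm256_set1_ps(k);
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    _mm256_storeu_ps(out + i, _mm256_mul_ps(_mm256_loadu_ps(in + i), vk));
  }
  for (; i < n; ++i) {
    out[i] = in[i] * k; // tail elements
  }
  _mm256_zeroupper(); // clear upper YMM halves before returning to SSE code
}
#endif

// Run-time CPU check (assumption: GCC/Clang builtin, false elsewhere).
bool demoHasAVX2()
{
#if defined(DEMO_X86_GNU)
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

// Dispatcher: picks the AVX2 kernel only when the CPU supports it.
std::vector<float> demoScale(const std::vector<float> &in, float k)
{
  std::vector<float> out(in.size());
#if defined(DEMO_X86_GNU)
  if (demoHasAVX2()) {
    demoScaleAVX2(in.data(), out.data(), in.size(), k);
    return out;
  }
#endif
  demoScaleBaseline(in.data(), out.data(), in.size(), k);
  return out;
}
```

With real separate compilation units, only the dispatcher and the baseline kernel live in files built with the baseline flags, so no AVX2 instruction can leak into code that runs on older CPUs.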
 
Some additional references: