C++ Cross-Platform SSE / AVX Intrinsic‑Accelerated, Multi‑threaded & Inlined Memory Operations, Hashing, and Encryption
Header-only library providing highly optimized memory primitives (memset, memcpy), data-hashing algorithms, and encryption utilities accelerated with x86 SIMD (SSE/AVX/AVX2). It supports multi-threading via OpenMP and performs runtime CPU feature detection.
The example.cxx file demonstrates usage of the library.
The library queries the host hardware for supported CPU instruction sets, specifically SSE2 through SSE4.2, AVX, AVX2, and AES-NI. These are used when available; otherwise the algorithms fall back to another highly optimized implementation (except AES-128, which requires AES-NI for now).
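The query described above can be sketched roughly as follows. This is a minimal illustration and not the library's actual code: the names `CpuFeatures`, `detect_cpu_features`, `sketch_cpuid`, and `sketch_xgetbv0` are hypothetical, and the sketch assumes the CPUID instruction itself is present (always true on x86-64). AVX is only reported when the OS has enabled YMM state in XCR0.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
  #include <intrin.h>      // __cpuidex
  #include <immintrin.h>   // _xgetbv
  #define SKETCH_X86 1
  static inline void sketch_cpuid(unsigned leaf, unsigned subleaf, unsigned r[4]) {
      int regs[4];
      __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
      for (int i = 0; i < 4; ++i) r[i] = static_cast<unsigned>(regs[i]);
  }
  static inline std::uint64_t sketch_xgetbv0() { return _xgetbv(0); }
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  #include <cpuid.h>       // __cpuid_count
  #define SKETCH_X86 1
  static inline void sketch_cpuid(unsigned leaf, unsigned subleaf, unsigned r[4]) {
      __cpuid_count(leaf, subleaf, r[0], r[1], r[2], r[3]);
  }
  static inline std::uint64_t sketch_xgetbv0() {
      unsigned lo, hi;
      // Read XCR0 with inline assembly so no -mxsave flag is needed.
      __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
      return (static_cast<std::uint64_t>(hi) << 32) | lo;
  }
#endif

// Hypothetical feature set; all flags stay false on non-x86 targets.
struct CpuFeatures {
    bool sse2 = false, sse42 = false, aesni = false, avx = false, avx2 = false;
};

inline CpuFeatures detect_cpu_features() {
    CpuFeatures f;
#ifdef SKETCH_X86
    unsigned r[4] = {0, 0, 0, 0};
    sketch_cpuid(0, 0, r);                 // EAX = highest supported leaf
    const unsigned max_leaf = r[0];
    if (max_leaf < 1) return f;

    sketch_cpuid(1, 0, r);
    const unsigned ecx = r[2], edx = r[3];
    f.sse2  = ((edx >> 26) & 1u) != 0;
    f.sse42 = ((ecx >> 20) & 1u) != 0;
    f.aesni = ((ecx >> 25) & 1u) != 0;

    // AVX needs CPU support (bit 28), OSXSAVE (bit 27), and the OS must
    // have enabled XMM and YMM state: XCR0 bits 1 and 2.
    const bool osxsave = ((ecx >> 27) & 1u) != 0;
    const bool avx_cpu = ((ecx >> 28) & 1u) != 0;
    const bool ymm_os  = osxsave && ((sketch_xgetbv0() & 0x6u) == 0x6u);
    f.avx = avx_cpu && ymm_os;

    if (f.avx && max_leaf >= 7) {
        sketch_cpuid(7, 0, r);
        f.avx2 = ((r[1] >> 5) & 1u) != 0;  // leaf 7, EBX bit 5
    }
#endif
    assert(!f.avx2 || f.avx);              // AVX2 is never reported without usable AVX
    return f;
}
```

A caller would typically run the detection once at start-up and cache the result, then choose an implementation based on the flags.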
✅ SIMD-Accelerated Memory Operations
- Block-level memory functions using SSE2, AVX, and AVX2 intrinsics.
- Runtime dispatch based on CPU capabilities and OS feature detection (CPUID + XGETBV).
- Clean fallback to scalar paths when SIMD is unavailable.
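As an illustration of the points above, here is a small sketch of a block-level fill with a scalar fallback, selected through a function pointer. The names `fill_scalar`, `fill_sse2`, and `select_fill` are hypothetical and the library's real routines may be structured differently; the caller is assumed to supply the result of a runtime CPU check.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) || defined(_M_X64) || defined(__SSE2__) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #define SKETCH_HAVE_SSE2 1
  #include <emmintrin.h>   // SSE2: __m128i, _mm_set1_epi8, _mm_storeu_si128
#endif

// Scalar path: always available, also used for the tail of the SIMD path.
inline void fill_scalar(unsigned char* dst, unsigned char value, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = value;
}

#ifdef SKETCH_HAVE_SSE2
// SSE2 path: writes 16-byte blocks with unaligned stores, then a scalar tail.
inline void fill_sse2(unsigned char* dst, unsigned char value, std::size_t n) {
    const __m128i v = _mm_set1_epi8(static_cast<char>(value));
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16)
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);
    fill_scalar(dst + i, value, n - i);
}
#endif

using fill_fn = void (*)(unsigned char*, unsigned char, std::size_t);

// Picks an implementation once; cpu_has_sse2 would come from a runtime check.
inline fill_fn select_fill(bool cpu_has_sse2) {
#ifdef SKETCH_HAVE_SSE2
    if (cpu_has_sse2) return fill_sse2;
#endif
    (void)cpu_has_sse2;
    return fill_scalar;
}

inline void fill(unsigned char* dst, unsigned char value, std::size_t n, bool cpu_has_sse2) {
    assert(dst != nullptr || n == 0);
    select_fill(cpu_has_sse2)(dst, value, n);
}
```

Resolving the function pointer once keeps the per-call cost to a single indirect call, which is one common way to structure runtime dispatch.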
🔐 Data-Hashing
- Implements a high-performance CRC32C that uses hardware intrinsics for acceleration and safely falls back to a software variant if the intrinsics are not detected on the host CPU.
- Optimized for throughput in large data blocks.
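The hardware path with software fallback could look roughly like this. It is a simplified sketch with hypothetical names (`crc32c`, `crc32c_hw`, `crc32c_sw`, `cpu_has_sse42`): it processes one byte at a time for clarity, whereas a throughput-oriented implementation would feed wider words to the CRC32 instruction. The GCC/Clang branch relies on `__builtin_cpu_supports`, the MSVC branch on `__cpuid`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  #define SKETCH_X86 1
  #include <nmmintrin.h>   // _mm_crc32_u8 (SSE4.2)
  #if defined(_MSC_VER)
    #include <intrin.h>    // __cpuid
  #endif
#endif

#if defined(__GNUC__)
  // Lets this one function use SSE4.2 without compiling everything with -msse4.2.
  #define SKETCH_TARGET_SSE42 __attribute__((target("sse4.2")))
#else
  #define SKETCH_TARGET_SSE42
#endif

// Software fallback: bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
inline std::uint32_t crc32c_sw(const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#ifdef SKETCH_X86
// Hardware path: the SSE4.2 CRC32 instruction computes the same polynomial.
SKETCH_TARGET_SSE42 inline std::uint32_t crc32c_hw(const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    while (len--) crc = _mm_crc32_u8(crc, *p++);
    return ~crc;
}

inline bool cpu_has_sse42() {
#if defined(_MSC_VER)
    int r[4];
    __cpuid(r, 1);
    return ((r[2] >> 20) & 1) != 0;   // CPUID leaf 1, ECX bit 20
#else
    return __builtin_cpu_supports("sse4.2") != 0;
#endif
}
#endif

// Dispatches once to the hardware path when available, else to software.
inline std::uint32_t crc32c(const void* data, std::size_t len) {
    assert(data != nullptr || len == 0);
#ifdef SKETCH_X86
    static const bool use_hw = cpu_has_sse42();
    if (use_hw) return crc32c_hw(data, len);
#endif
    return crc32c_sw(data, len);
}
```

Both paths should agree on the standard CRC32C check value: the ASCII string "123456789" hashes to 0xE3069283.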
🔏 AES-128 & HC-128 Encryption
- Intrinsic-accelerated routines for symmetric encryption primitives: an AES-NI accelerated AES-128 CTR-mode cipher and an SSE2 (128-bit register) optimized HC-128 implementation.
- Thread-safe, inlined for minimal call overhead.
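For orientation, an AES-NI based AES-128 CTR routine might be sketched as below. Everything here is an assumption made for illustration: the function names (`aes128_ctr_xor`, `cpu_has_aesni`), the parameter layout, and the choice to increment the whole 16-byte counter block as a big-endian integer are not taken from the library. The sketch returns false when AES-NI is missing and does no work in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  #define SKETCH_X86 1
  #include <emmintrin.h>   // SSE2
  #include <wmmintrin.h>   // AES-NI
  #if defined(_MSC_VER)
    #include <intrin.h>    // __cpuid
  #else
    #include <cpuid.h>     // __get_cpuid
  #endif
#endif

#if defined(__GNUC__)
  #define SKETCH_TARGET_AES __attribute__((target("aes,sse2")))
#else
  #define SKETCH_TARGET_AES
#endif

inline bool cpu_has_aesni() {
#if !defined(SKETCH_X86)
    return false;
#elif defined(_MSC_VER)
    int r[4];
    __cpuid(r, 1);
    return ((r[2] >> 25) & 1) != 0;            // CPUID leaf 1, ECX bit 25
#else
    unsigned a = 0, b = 0, c = 0, d = 0;
    return __get_cpuid(1, &a, &b, &c, &d) && ((c >> 25) & 1u) != 0;
#endif
}

#ifdef SKETCH_X86
// One round of the AES-128 key schedule.
SKETCH_TARGET_AES static inline __m128i expand_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, 0xFF);  // broadcast the top word
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}
// The round constant must be a compile-time immediate, hence a macro.
#define SKETCH_EXPAND(i, rcon) \
    rk[i] = expand_step(rk[(i) - 1], _mm_aeskeygenassist_si128(rk[(i) - 1], rcon))

SKETCH_TARGET_AES static void aes128_ctr_impl(const std::uint8_t* key, const std::uint8_t* counter0,
                                              const std::uint8_t* in, std::uint8_t* out,
                                              std::size_t len) {
    __m128i rk[11];
    rk[0] = _mm_loadu_si128(reinterpret_cast<const __m128i*>(key));
    SKETCH_EXPAND(1, 0x01); SKETCH_EXPAND(2, 0x02); SKETCH_EXPAND(3, 0x04);
    SKETCH_EXPAND(4, 0x08); SKETCH_EXPAND(5, 0x10); SKETCH_EXPAND(6, 0x20);
    SKETCH_EXPAND(7, 0x40); SKETCH_EXPAND(8, 0x80); SKETCH_EXPAND(9, 0x1B);
    SKETCH_EXPAND(10, 0x36);

    std::uint8_t ctr[16], ks[16];
    std::memcpy(ctr, counter0, 16);
    for (std::size_t off = 0; off < len; off += 16) {
        // Encrypt the counter block to produce 16 bytes of keystream.
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctr));
        b = _mm_xor_si128(b, rk[0]);
        for (int r = 1; r < 10; ++r) b = _mm_aesenc_si128(b, rk[r]);
        b = _mm_aesenclast_si128(b, rk[10]);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(ks), b);

        const std::size_t n = (len - off < 16) ? (len - off) : 16;
        for (std::size_t i = 0; i < n; ++i) out[off + i] = in[off + i] ^ ks[i];

        // Big-endian increment of the whole counter block (an assumption).
        for (int i = 15; i >= 0; --i) if (++ctr[i] != 0) break;
    }
}
#endif

// Encrypts or decrypts (CTR is symmetric). Returns false without AES-NI.
inline bool aes128_ctr_xor(const std::uint8_t* key, const std::uint8_t* counter0,
                           const std::uint8_t* in, std::uint8_t* out, std::size_t len) {
    assert(key != nullptr && counter0 != nullptr);
#ifdef SKETCH_X86
    if (cpu_has_aesni()) { aes128_ctr_impl(key, counter0, in, out, len); return true; }
#endif
    (void)in; (void)out; (void)len;
    return false;
}
```

Because CTR mode only XORs a keystream into the data, calling the same function again with the same key and initial counter block restores the plaintext. Callers should check the return value before using the output buffer.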
⚙️ OpenMP Multi-threaded Engines
- Parallel processing using OpenMP.
- User-defined chunking enables efficient utilization of multi-core systems.
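A chunked parallel copy along these lines might look like the sketch below. The function name `chunked_copy` and its `chunk` parameter are hypothetical, and the source and destination ranges are assumed not to overlap. When the file is compiled without OpenMP enabled, the pragma is ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies n bytes in independent chunks so each OpenMP thread handles whole chunks.
inline void* chunked_copy(void* dst, const void* src, std::size_t n, std::size_t chunk) {
    assert((dst != nullptr && src != nullptr) || n == 0);
    if (n == 0) return dst;
    if (chunk == 0) chunk = n;                       // treat 0 as "one single chunk"

    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);

    // Signed loop counter for compatibility with older OpenMP versions.
    const long long chunks = static_cast<long long>(n / chunk + (n % chunk != 0 ? 1 : 0));

    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < chunks; ++i) {
        const std::size_t off = static_cast<std::size_t>(i) * chunk;
        std::memcpy(d + off, s + off, std::min(chunk, n - off));
    }
    return dst;
}
```

Larger chunks reduce scheduling overhead, while smaller chunks balance work more evenly across cores; the best value depends on the machine's memory bandwidth and core count.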
🧩 Utility & Compatibility Layer
- Cross-platform abstractions for compiler intrinsics (__m128i, __m256i, _xgetbv, etc.).
- Because of how HC-128 works, you must pass the same key and IV every time you call the cipher.
- AES-128 CTR currently has no fallback when the AES-NI instructions are unavailable; in that case the functions simply return false.
- CRC32C does not support multi-threading because of the design of the cyclic redundancy check algorithm, and neither does the HC-128 cipher (AES and the general memory operations do support multi-threading).
- A memmove implementation was excluded from this project because multithreading is incompatible with the overlapping-copy mechanism it must employ.
- By default, accelmem only begins using OpenMP for multithreading when handling data allocations of roughly 100 MB. This threshold can be changed by redefining this macro:
#define OMP_MEM_THR_THRESHHOLD <threshold_in_bytes>
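One way such an override can work is sketched below. Only the macro name comes from this README; the `#ifndef` default, the exact default value (written here as 100 MiB), the inclusive comparison, and the helper `should_use_openmp` are assumptions made for illustration.

```cpp
#include <cstddef>

// User side: define the threshold before the library header is included.
// Here OpenMP would kick in at 16 MiB instead of the default.
#define OMP_MEM_THR_THRESHHOLD (16u * 1024u * 1024u)

// Library side (assumed pattern): only supply a default if the user did not.
#ifndef OMP_MEM_THR_THRESHHOLD
#define OMP_MEM_THR_THRESHHOLD (100u * 1024u * 1024u)
#endif

// Hypothetical helper: decides whether a buffer is large enough to thread.
constexpr bool should_use_openmp(std::size_t bytes) {
    return bytes >= static_cast<std::size_t>(OMP_MEM_THR_THRESHHOLD);
}
```

With the override above, a 1 KiB buffer stays on the single-threaded path and a 32 MiB buffer takes the multi-threaded one. Lowering the threshold too far can hurt performance, since thread start-up costs may outweigh the gain on small buffers.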