The repository contains our implementations of four approximate filters: the Bloom filter [1], the Cuckoo filter [2], the Morton filter [3], and the Xor filter [4]. We used the code in our paper A four-dimensional Analysis of Partitioned Filters.
In addition to our optimized filter implementations, the repository also contains the code of state-of-the-art competitors we compare to and extensive test cases. We generate the benchmarks using python scripts and included our results on an Intel i9-9900x (Skylake-X) with 64 GiB memory.
- C++20 (we used GCC 10.2)
 - CMake version 3.10 & make
 - Linux (we used Ubuntu 20.10 with kernel version 5.8)
 - Python3 + virtualenv (python is only needed for generating the benchmarks and the plots)
 
make test executes all available test (~400 test cases that test the correctness of our implementation (no false-negatives) and different aspects like multi-threading, partitioning, vectorization, etc)
make benchmark generates the benchmark and the user can select the benchmark to execute. The results are written to csv-files in the benchmark/paper folder. All measurements are repeated five times to get stable results.
Executing all benchmarks takes roughly 1 week and requires 64 GiB memory. Some of the benchmarks do measure only the false-positive rate and the failed builds and, thus, should be executed with all available threads.
The csv includes the following fields:
| Field | Unit | Description | 
|---|---|---|
| name | Text | Configuration: <BenchmarkName>_<k>/  <Fixture>/  <s> / <n_threads> / <n_partitions> / <elements_build> / <elements_lookup> / <shared_elements> / _ / _ | 
| real_time | milliseconds | execution time per iteration | 
| DTLB-misses | float | data translation lookaside buffer misses per iteration | 
| ITLB-misses | float | instruction translation lookaside buffer misses per iteration | 
| L1D-misses | float | level 1 data cache misses per iteration | 
| L1I-misses | float | level 1 instruction cache misses per iteration | 
| LLC-misses | float | last-level (L3) cache misses per iteration | 
| branch-misses | float | branch misses per iteration | 
| cycles | float | cycles per iteration | 
| instruction | float | executed instructions per iteration | 
| FPR | float | false-positive rate of the filter (only available when lookup is benchmarked) | 
| failures | integer | number of failed builds | 
| retries | integer | number of retries needed to build the filter | 
| bits | float | number of bits per key allocated to the filter | 
| size | integer | size of the filter in bytes | 
make plot generates all plots used in the paper. The benchmark data is taken from benchmark/paper.
benchmark: code for benchmarking the filter and the definition files with our results on the Skylake-X machine.cmake: optional packages.lib: external dependencies and existing filter implementations. The code in this folder is not licensed under the MIT License (see Dependencies).python: scripts for generating, executing and plotting benchmarks.src: filter implementations.test: extensive test cases for our filter implementations and the integration the competitors.vendor: external packages.
lib/amd_mortonfilter: original Morton Filter implementation used in [3], licensed under the MIT License.lib/bsd: (register-)blocked and (cache-)sectorized Bloom Filter implementations with SIMD support and external competitors used in [1], licensed under the Apache License (Version 2.0), the 2-clause BSD License, and the 3-clause BSD License.lib/cityhash: Google's CityHash implementation, licensed under the MIT License.lib/efficient_cuckoofilter: original Cuckoo Filter implementation used in [2], licensed under the Apache License (Version 2.0).lib/fastfilter: original Xor Filter implementation used in [4] licensed under the Apache License (Version 2.0).lib/impala: original sectorized Bloom Filter used in the Impala, licensed under the Apache License (Version 2.0).lib/libdivide: the LibDivide library computes magic numbers for optimizing integer divisions, licensed under the zlib License.lib/perfevent: library for reading perf counters in C(++), licensed under the MIT License.lib/vacuumfilter: Vacuum Filter implementation.
[1] Performance-Optimal Filtering: Bloom Overtakes Cuckoo at High Throughput
[2] Cuckoo Filter: Practically Better Than Bloom
[4] Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters