A fast and memory-efficient implementation of the HyperLogLog algorithm in .NET for approximate cardinality estimation of large data sets.
- ⚡ Efficient cardinality estimation with HyperLogLog algorithm
- 🧠 Built-in support for multiple types: string, int, Guid, and more via custom hashers
- 🔧 Easy to extend with your own IHasher implementations for any data type
- 🧪 Optional multiple-run estimation for improved accuracy and reduced variance
- 🧱 Configurable precision (4–16) with built-in validation
- 💼 Fully compatible with .NET Standard for broad platform support
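The feature list mentions extending the library with your own `IHasher` implementations. The exact `IHasher` contract lives in the HLL.NET source and may differ from what is sketched here; the interface shape, the `UserId` type, and `UserIdHasher` below are all illustrative assumptions, not the package's actual API:

```csharp
using System;

// ASSUMPTION: HLL.NET's real IHasher contract may differ; check the package source.
public interface IHasher<T>
{
    ulong Hash(T value);
}

// A user-defined type we want to count cardinality over.
public readonly record struct UserId(long Value);

// A custom hasher using the SplitMix64 finalizer as a 64-bit mixer.
public sealed class UserIdHasher : IHasher<UserId>
{
    public ulong Hash(UserId id)
    {
        ulong x = (ulong)id.Value;
        x ^= x >> 30; x *= 0xBF58476D1CE4E5B9UL;
        x ^= x >> 27; x *= 0x94D049BB133111EBUL;
        x ^= x >> 31;
        return x;
    }
}
```

Whatever the real interface looks like, the important property is the same: the hasher must map equal values to equal 64-bit hashes and spread distinct values uniformly, since HyperLogLog's accuracy depends on well-distributed hash bits.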
- Precision 10: < 3% average error up to 100,000 unique items
- Precision 12: < 1.5% average error up to 100,000 unique items
- Precision 14: < 0.7% average error at 100,000 unique items
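These figures line up with HyperLogLog theory: the standard error is approximately `1.04 / sqrt(m)`, where `m = 2^precision` is the number of registers, which works out to roughly 3.25%, 1.63%, and 0.81% for precisions 10, 12, and 14. A quick sanity check:

```csharp
using System;

// HyperLogLog's theoretical standard error is 1.04 / sqrt(m),
// where m = 2^precision is the number of registers.
for (int p = 10; p <= 14; p += 2)
{
    int m = 1 << p;
    double stdError = 1.04 / Math.Sqrt(m);
    Console.WriteLine($"precision {p}: {m} registers, ~{stdError * 100:F2}% standard error");
}
```

Doubling the precision parameter by 2 quadruples the register count and halves the error, at the cost of proportionally more memory.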
Duplicate values are handled correctly: only distinct values contribute to the estimate. Example:
- Input: 10,000 unique values, each inserted multiple times
- Estimated: ~10,000 (error: < 1%)
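You can verify this yourself with the generic API shown later in this README (a sketch, assuming HLL.NET's `HyperLogLog<string>` type and `Add`/`Estimate` methods behave as documented):

```csharp
using System;
using HLL.NET;

var hll = new HyperLogLog<string>(precision: 14);

// Insert 10,000 values drawn from only 1,000 distinct keys.
for (int i = 0; i < 10_000; i++)
    hll.Add($"user_{i % 1_000}");

// The estimate should track the 1,000 distinct keys,
// not the 10,000 insertions.
Console.WriteLine($"Estimated: {hll.Estimate():F2}");
```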
```shell
dotnet add package HLL.NET
```
```csharp
using HLL.NET;

var items = new List<string>();
for (int i = 0; i < 10_000; i++)
    items.Add($"user_{i}");

double estimate = HyperLogLog.EstimateWithMultipleRuns(items, runs: 5, precision: 14);
Console.WriteLine($"Estimated unique count: {estimate:F2}");
```
For more fine-grained control:
```csharp
var hll = new HyperLogLog<string>(precision: 14);

hll.Add("apple");
hll.Add("banana");
hll.Add("apple"); // duplicate, counted once

double estimate = hll.Estimate();
Console.WriteLine($"Estimated: {estimate:F2}");
```
```shell
dotnet test
```
Tests are located in the tests/HLL.NET.Tests/ project and cover various edge cases and expected behaviors.
PRs and suggestions welcome! Please:
- Fork the repo
- Create a feature branch
- Add tests if needed
- Submit a PR 🚀
MIT © 2025 — MCUnderground