Branch | Tests | Coverage |
---|---|---|
master |
GroveDB: Hierarchical Authenticated Data Structure Database
GroveDB is a high-performance, cryptographically verifiable database system that implements a hierarchical authenticated data structure - organizing data as a "grove" where each tree in the forest is a Merkle AVL tree (Merk). This revolutionary approach solves the fundamental limitations of flat authenticated data structures by enabling efficient queries on any indexed field while maintaining cryptographic proofs throughout the hierarchy.
Built on cutting-edge research in hierarchical authenticated data structures, GroveDB provides the foundational storage layer for Dash Platform while being flexible enough for any application requiring trustless data verification.
- Key Features
- Architecture Overview
- Core Concepts
- Getting Started
- Usage Examples
- Query System
- Performance
- Documentation
- Contributing
- Organize data naturally in nested hierarchies
- Each subtree is a fully authenticated Merkle AVL tree
- Efficient navigation and organization of complex data
- Pre-computed secondary indices stored as subtrees
- O(log n) query performance on any indexed field
- No need to scan entire dataset for non-primary key queries
- Generate proofs for any query result
- Supports membership, non-membership, and range proofs
- Minimal proof sizes through optimized algorithms
- Layer-by-layer verification from root to leaves
- Built on RocksDB for reliable persistent storage
- Batch operations for atomic updates across multiple trees
- Intelligent caching system (MerkCache) for frequently accessed data
- Cost tracking for all operations
- 7 types of references for complex data relationships
- Automatic reference following (configurable hop limits)
- Cycle detection prevents infinite loops
- Cross-tree data linking without duplication
- Sum trees for automatic value totaling
- Count trees for element counting
- Combined count+sum trees
- Big sum trees for 256-bit integers
- Native Rust implementation
- Runs on x86, ARM (including Raspberry Pi), and WebAssembly
Traditional authenticated data structures face a fundamental limitation: they can only efficiently prove queries on a single index (typically the primary key). Secondary index queries require traversing the entire structure, resulting in large proofs and poor performance.
GroveDB's breakthrough is using a hierarchical authenticated data structure - a forest where each tree is a Merk (Merkle AVL tree). This architecture enables:
- Grove: The entire database - a forest of interconnected trees
- Trees: Individual Merk trees, each serving as either:
- Data Trees: Storing actual key-value pairs
- Index Trees: Storing references for secondary indices
- Aggregate Trees: Maintaining sums, counts, or other computations
- Root Hash: A single cryptographic commitment to the entire forest state
Each Merk tree maintains its own root hash, and parent trees store these hashes as values. This creates a hierarchy where:
- The topmost tree's root hash authenticates the entire database
- Each subtree can be independently verified
- Proofs can be generated for any path through the hierarchy
- Updates propagate upward, maintaining consistency
By pre-computing and storing secondary indices as separate trees:
- Query any index with O(log n) complexity
- Generate minimal proofs (only the path taken)
- Update indices atomically with data
- Maintain multiple views of the same data
GroveDB combines several innovative components:
┌────────────────────────────────────────────────────────┐
│ GroveDB Core │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Element │ │ Query │ │ Proof │ │
│ │ System │ │ Engine │ │ Generator │ │
│ └─────────────┘ └──────────────┘ └───────────────┘ │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Batch │ │ Reference │ │ Version │ │
│ │ Operations │ │ Resolver │ │ Management │ │
│ └─────────────┘ └──────────────┘ └───────────────┘ │
└────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│ Merk Layer │
│ (Merkle AVL Tree Implementation) │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ AVL Tree │ │ Proof │ │ Cost │ │
│ │ Balancing │ │ System │ │ Tracking │ │
│ └─────────────┘ └──────────────┘ └───────────────┘ │
└────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│ Storage Layer │
│ (RocksDB Abstraction) │
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Prefixed │ │ Transaction │ │ Batch │ │
│ │ Storage │ │ Support │ │ Processing │ │
│ └─────────────┘ └──────────────┘ └───────────────┘ │
└────────────────────────────────────────────────────────┘
- GroveDB Core: Orchestrates multiple Merk trees into a unified hierarchical database
- Merk: High-performance Merkle AVL tree implementation with proof generation
- Storage: Abstract storage layer with RocksDB backend, supporting transactions and batching
- Costs: Comprehensive resource tracking for all operations
- Version Management: Protocol versioning for smooth upgrades
At the heart of GroveDB's forest are Merk trees - highly optimized Merkle AVL trees that serve as the building blocks of the hierarchical structure:
- Self-Balancing: AVL algorithm ensures O(log n) operations
- Authenticated: Every node contains cryptographic hashes
- Efficient Proofs: Generate compact proofs for any query
- Rich Features: Built-in support for sums, counts, and aggregations
Each Merk tree in the grove can reference other Merk trees, creating a powerful hierarchical system where authentication flows from leaves to root.
GroveDB supports 8 element types:
// Basic storage
Element::Item(value, flags) // Arbitrary bytes
Element::Reference(path, max_hops) // Link to another element
Element::Tree(root_hash) // Subtree container
// Aggregation types
Element::SumItem(value) // Contributes to sum
Element::SumTree(root_hash, sum) // Maintains sum of descendants
Element::BigSumTree(root_hash, sum) // 256-bit sums
Element::CountTree(root_hash, count) // Element counting
Element::CountSumTree(root_hash, count, sum) // Combined
Data is organized using paths:
// Path: ["users", "alice", "documents"]
db.insert(
&["users", "alice"],
b"balance",
Element::new_item(b"100")
)?;
Seven reference types enable complex relationships:
AbsolutePathReference
: Direct path from rootUpstreamRootHeightReference
: Go up N levels, then follow pathUpstreamFromElementHeightReference
: Relative to current elementCousinReference
: Same level, different branchSiblingReference
: Same parent treeUtilityReference
: Special system references
- Rust 1.74+ (nightly)
- RocksDB dependencies
Add to your Cargo.toml
:
[dependencies]
grovedb = "3.0"
use grovedb::{GroveDb, Element};
use grovedb_version::version::GroveVersion;
// Open database
let db = GroveDb::open("./my_db")?;
let grove_version = GroveVersion::latest();
// Create a tree structure
db.insert(&[], b"users", Element::new_tree(None), None, None, grove_version)?;
db.insert(&[b"users"], b"alice", Element::new_tree(None), None, None, grove_version)?;
// Insert data
db.insert(
&[b"users", b"alice"],
b"age",
Element::new_item(b"30"),
None,
None,
grove_version
)?;
// Query data
let age = db.get(&[b"users", b"alice"], b"age", None, grove_version)?;
The following examples demonstrate how individual Merk trees combine to form a powerful hierarchical database.
🌲 Grove Root (Single Merk Tree)
├── 📂 users (Merk Tree)
│ ├── 👤 alice (Merk Tree)
│ │ ├── name: "Alice"
│ │ ├── age: 30
│ │ └── city: "Boston"
│ └── 👤 bob (Merk Tree)
│ ├── name: "Bob"
│ └── age: 25
├── 📊 indexes (Merk Tree)
│ ├── by_age (Merk Tree)
│ │ ├── 25 → Reference(/users/bob)
│ │ └── 30 → Reference(/users/alice)
│ └── by_city (Merk Tree)
│ └── Boston → Reference(/users/alice)
└── 💰 accounts (Sum Tree - Special Merk)
├── alice: 100 (contributes to sum)
└── bob: 200 (contributes to sum)
└── [Automatic sum: 300]
Each node marked as "Merk Tree" is an independent authenticated data structure with its own root hash, all linked together in the hierarchy.
// Create user data
db.insert(&[b"users"], b"user1", Element::new_tree(None), None, None, grove_version)?;
db.insert(&[b"users", b"user1"], b"age", Element::new_item(b"25"), None, None, grove_version)?;
db.insert(&[b"users", b"user1"], b"city", Element::new_item(b"Boston"), None, None, grove_version)?;
// Create indexes
db.insert(&[], b"indexes", Element::new_tree(None), None, None, grove_version)?;
db.insert(&[b"indexes"], b"by_age", Element::new_tree(None), None, None, grove_version)?;
db.insert(&[b"indexes"], b"by_city", Element::new_tree(None), None, None, grove_version)?;
// Add references in indexes
db.insert(
&[b"indexes", b"by_age"],
b"25_user1",
Element::new_reference(ReferencePathType::absolute_path(vec![
b"users".to_vec(),
b"user1".to_vec()
])),
None,
None,
grove_version
)?;
// Create account structure with balances
db.insert(&[], b"accounts", Element::new_sum_tree(None, 0), None, None, grove_version)?;
// Add accounts with balances
db.insert(&[b"accounts"], b"alice", Element::new_sum_item(100), None, None, grove_version)?;
db.insert(&[b"accounts"], b"bob", Element::new_sum_item(200), None, None, grove_version)?;
db.insert(&[b"accounts"], b"charlie", Element::new_sum_item(150), None, None, grove_version)?;
// Get total sum (automatically maintained)
let sum_tree = db.get(&[], b"accounts", None, grove_version)?;
// sum_tree now contains Element::SumTree with sum = 450
use grovedb::batch::GroveDbOp;
let ops = vec![
GroveDbOp::insert_op(vec![b"users"], b"alice", Element::new_tree(None)),
GroveDbOp::insert_op(vec![b"users", b"alice"], b"name", Element::new_item(b"Alice")),
GroveDbOp::insert_op(vec![b"users", b"alice"], b"age", Element::new_item(b"30")),
];
// Apply atomically
db.apply_batch(ops, None, None, grove_version)?;
use grovedb::query::PathQuery;
use grovedb_merk::proofs::Query;
// Create a path query
let path_query = PathQuery::new_unsized(
vec![b"users".to_vec()],
Query::new_range_full(),
);
// Generate proof
let proof = db.prove_query(&path_query, None, None, grove_version)?;
// Verify proof independently
let (root_hash, results) = GroveDb::verify_query(proof.as_slice(), &path_query, grove_version)?;
// Get all items in a subtree
let query = Query::new_range_full();
let path_query = PathQuery::new_unsized(vec![b"users".to_vec()], query);
let results = db.query(&path_query, false, false, None, grove_version)?;
// Get users with names from "A" to "M"
let mut query = Query::new();
query.insert_range(b"A".to_vec()..b"N".to_vec());
let path_query = PathQuery::new_unsized(vec![b"users".to_vec()], query);
let results = db.query(&path_query, false, false, None, grove_version)?;
// Get all users and their documents
let mut query = Query::new_with_subquery_key(b"documents".to_vec());
let path_query = PathQuery::new_unsized(vec![b"users".to_vec()], query);
let results = db.query(&path_query, false, false, None, grove_version)?;
GroveDB supports 10 query item types:
Key(key)
- Exact key matchRange(start..end)
- Exclusive rangeRangeInclusive(start..=end)
- Inclusive rangeRangeFull(..)
- All keysRangeFrom(start..)
- From key onwardsRangeTo(..end)
- Up to keyRangeToInclusive(..=end)
- Up to and including keyRangeAfter(prev..)
- After specific keyRangeAfterTo(prev..end)
- After key up to endRangeAfterToInclusive(prev..=end)
- After key up to and including end
Parent Tree Inclusion: When performing subqueries, you can include the parent tree element itself in the results:
let mut query = Query::new();
query.insert_key(b"users".to_vec());
query.set_subquery(Query::new_range_full());
query.add_parent_tree_on_subquery = true; // Include parent tree
let path_query = PathQuery::new_unsized(vec![], query);
let results = db.query(&path_query, false, false, None, grove_version)?;
// Results include both the "users" tree element AND its contents
This is particularly useful for count trees and sum trees where you want both the aggregate value and the individual elements.
The forest architecture delivers exceptional performance by leveraging the hierarchical nature of Merk trees:
- Primary Index: O(log n) - Direct path through single Merk tree
- Secondary Index: O(log n) - Pre-computed index trees eliminate full scans
- Proof Generation: O(log n) - Only nodes on the query path
- Proof Size: Minimal - Proportional to tree depth, not data size
Compare this to flat structures where secondary index queries require O(n) scans and generate O(n) sized proofs!
Performance on different hardware:
Hardware | Full Test Suite |
---|---|
Raspberry Pi 4 | 2m 58s |
AMD Ryzen 5 1600AF | 34s |
AMD Ryzen 5 3600 | 26s |
Apple M1 Pro | 19s |
- MerkCache: Keeps frequently accessed Merk trees in memory
- Batch Operations: Update multiple trees atomically in single transaction
- Cost Tracking: Fine-grained resource monitoring per tree operation
- Lazy Loading: Load only required nodes from Merk trees
- Prefix Iteration: Efficient traversal within subtrees
- Root Hash Propagation: Optimized upward hash updates through tree hierarchy
- Merk - Merkle AVL Tree
- Merk Deep Dive - Nodes, Proofs, and State
- Storage Abstraction Layer
- GroveDB Core
- Cost Tracking System
- Auxiliary Crates
See the examples directory for:
- Basic CRUD operations
- Secondary indexing patterns
- Reference usage
- Batch operations
- Proof generation and verification
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone repository
git clone https://github.com/dashevo/grovedb.git
cd grovedb
# Build
cargo build --release
# Run tests
cargo test
# Run benchmarks
cargo bench
GroveDB includes a web-based visualizer for debugging:
let db = Arc::new(GroveDb::open("db")?);
db.start_visualizer(10000); // Port 10000
// Visit http://localhost:10000 in your browser
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch
- Write tests for new functionality
- Ensure all tests pass
- Submit a pull request
# Run all tests
cargo test
# Run specific test
cargo test test_name
# Run with verbose output
cargo test -- --nocapture
GroveDB is licensed under the MIT License. See LICENSE for details.
GroveDB implements groundbreaking concepts from cryptographic database research:
- Database Outsourcing with Hierarchical Authenticated Data Structures - The seminal work by Etemad & Küpçü that introduced hierarchical authenticated data structures for efficient multi-index queries
- Merkle Trees - Ralph Merkle's foundational work on cryptographic hash trees
- AVL Trees - Adelson-Velsky and Landis's self-balancing binary search tree algorithm
GroveDB realizes the vision of hierarchical authenticated data structures by implementing a forest of Merkle AVL trees (Merk), where each tree can contain other trees. This solves the fundamental limitation of flat authenticated structures - enabling efficient queries on any index while maintaining cryptographic proofs throughout the hierarchy.
Special thanks to the Dash Core Group and all contributors who have helped make this theoretical concept a production reality.