SST Partitioner
SstPartitioner is an interface in RocksDB that lets users control how SST files are partitioned during compactions. Partitioning lets a compaction split its output files at meaningful boundaries (such as key prefixes). This can reduce write amplification by keeping each SST file from spanning unrelated parts of the key space.
To use SstPartitioner, applications need to implement the `SstPartitioner` and `SstPartitionerFactory` interfaces found in `rocksdb/sst_partitioner.h`, and set `sst_partitioner_factory` in `ColumnFamilyOptions`. Similar to `CompactionFilterFactory`, `SstPartitionerFactory` creates an `SstPartitioner` for each compaction by calling `SstPartitionerFactory::CreatePartitioner`. The `ShouldPartition()` method is then called for every key during the compaction.
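To make this call sequence concrete, here is a minimal sketch that simulates one compaction outside RocksDB. The types at the top are simplified stand-ins written for this example, not the real header: `Slice` is `std::string_view`, and `CreatePartitioner` takes no argument, whereas the real one receives an `SstPartitioner::Context` describing the compaction. `TenantPartitioner`, `TenantPartitionerFactory` and `SimulateCompaction` are hypothetical names, and the partitioner assumes keys of the form `<tenant>:<rest>`. A real compaction can also close output files for other reasons, such as reaching the target file size, which this sketch ignores.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Simplified stand-ins for the declarations in rocksdb/sst_partitioner.h.
// Assumption: Slice is modeled as std::string_view, and CreatePartitioner
// takes no compaction context.
using Slice = std::string_view;

enum PartitionerResult : char { kNotRequired = 0x0, kRequired = 0x1 };

struct PartitionerRequest {
  PartitionerRequest(const Slice& prev_user_key_, const Slice& current_user_key_,
                     uint64_t current_output_file_size_)
      : prev_user_key(&prev_user_key_),
        current_user_key(&current_user_key_),
        current_output_file_size(current_output_file_size_) {}
  const Slice* prev_user_key;
  const Slice* current_user_key;
  uint64_t current_output_file_size;
};

class SstPartitioner {
 public:
  virtual ~SstPartitioner() = default;
  virtual PartitionerResult ShouldPartition(const PartitionerRequest& request) = 0;
};

class SstPartitionerFactory {
 public:
  virtual ~SstPartitionerFactory() = default;
  virtual std::unique_ptr<SstPartitioner> CreatePartitioner() const = 0;
};

// Hypothetical partitioner: starts a new file whenever the tenant part of the
// key (everything before the first ':') changes.
class TenantPartitioner : public SstPartitioner {
 public:
  PartitionerResult ShouldPartition(const PartitionerRequest& request) override {
    return Tenant(*request.prev_user_key) == Tenant(*request.current_user_key)
               ? kNotRequired
               : kRequired;
  }

 private:
  // Keys without ':' are treated as one tenant spanning the whole key.
  static Slice Tenant(Slice key) { return key.substr(0, key.find(':')); }
};

// Hypothetical factory: hands out a fresh partitioner per compaction.
class TenantPartitionerFactory : public SstPartitionerFactory {
 public:
  std::unique_ptr<SstPartitioner> CreatePartitioner() const override {
    return std::make_unique<TenantPartitioner>();
  }
};

// Hypothetical driver mimicking one compaction over sorted keys: a partitioner
// is created once, then consulted for every key after the first.
// Returns the number of keys written to each output file.
std::vector<std::size_t> SimulateCompaction(const SstPartitionerFactory& factory,
                                            const std::vector<std::string>& sorted_keys) {
  std::unique_ptr<SstPartitioner> partitioner = factory.CreatePartitioner();
  std::vector<std::size_t> keys_per_file;
  uint64_t output_file_size = 0;  // bytes of user keys in the open output file
  for (std::size_t i = 0; i < sorted_keys.size(); ++i) {
    bool new_file = keys_per_file.empty();  // the first key always opens a file
    if (!new_file) {
      Slice prev(sorted_keys[i - 1]);
      Slice cur(sorted_keys[i]);
      new_file = partitioner->ShouldPartition(
                     PartitionerRequest(prev, cur, output_file_size)) == kRequired;
    }
    if (new_file) {
      keys_per_file.push_back(0);
      output_file_size = 0;
    }
    ++keys_per_file.back();
    output_file_size += sorted_keys[i].size();
  }
  return keys_per_file;
}
```

For example, with the sorted keys `a:1, a:2, b:1, c:1, c:2`, `SimulateCompaction(TenantPartitionerFactory{}, keys)` yields one file per tenant, holding 2, 1 and 2 keys respectively.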
```cpp
struct PartitionerRequest {
  PartitionerRequest(const Slice& prev_user_key_,
                     const Slice& current_user_key_,
                     uint64_t current_output_file_size_)
      : prev_user_key(&prev_user_key_),
        current_user_key(&current_user_key_),
        current_output_file_size(current_output_file_size_) {}
  const Slice* prev_user_key;
  const Slice* current_user_key;
  uint64_t current_output_file_size;
};

enum PartitionerResult : char {
  // The partitioner does not require a new file to be created
  kNotRequired = 0x0,
  // The partitioner forces a new file to be created
  kRequired = 0x1
  // Additional constants can be added
};

class SstPartitioner {
 public:
  // (other members omitted)

  // Called for every key in the compaction. To start a new SST file, the
  // partitioner returns kRequired: the compaction job then finishes the
  // current SST file, whose last key is "prev_user_key", and starts a new SST
  // file whose first key is "current_user_key". The return value tells the
  // compaction whether a partition boundary was detected and a new file must
  // be created.
  virtual PartitionerResult ShouldPartition(const PartitionerRequest& request) = 0;

  // Called with the smallest and largest user keys of an SST file when the
  // compaction tries to do a trivial move. Returns true if the partitioner
  // allows the trivial move.
  virtual bool CanDoTrivialMove(const Slice& smallest_user_key,
                                const Slice& largest_user_key) = 0;
};
```
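RocksDB itself ships a fixed-prefix partitioner, created with `NewSstPartitionerFixedPrefixFactory(prefix_len)` from `rocksdb/sst_partitioner.h`. The sketch below is not that code; it is a hypothetical `FixedPrefixPartitioner` written against simplified stand-ins (`Slice` as `std::string_view`, other members of the real interface omitted) to show how the two methods above can fit together for prefix-based partitioning. `ShouldPartition` compares the first `prefix_len` bytes of the previous and current keys, and `CanDoTrivialMove` allows a move only when no boundary would fall between a file's smallest and largest keys.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Simplified stand-ins for the declarations shown above (assumption:
// Slice is modeled as std::string_view).
using Slice = std::string_view;

enum PartitionerResult : char { kNotRequired = 0x0, kRequired = 0x1 };

struct PartitionerRequest {
  PartitionerRequest(const Slice& prev_user_key_, const Slice& current_user_key_,
                     uint64_t current_output_file_size_)
      : prev_user_key(&prev_user_key_),
        current_user_key(&current_user_key_),
        current_output_file_size(current_output_file_size_) {}
  const Slice* prev_user_key;
  const Slice* current_user_key;
  uint64_t current_output_file_size;
};

class SstPartitioner {
 public:
  virtual ~SstPartitioner() = default;
  virtual PartitionerResult ShouldPartition(const PartitionerRequest& request) = 0;
  virtual bool CanDoTrivialMove(const Slice& smallest_user_key,
                                const Slice& largest_user_key) = 0;
};

// Hypothetical fixed-prefix partitioner. Keys shorter than prefix_len are
// compared in full.
class FixedPrefixPartitioner : public SstPartitioner {
 public:
  explicit FixedPrefixPartitioner(std::size_t prefix_len) : prefix_len_(prefix_len) {}

  // Cut a new file whenever the prefix of the current key differs from the
  // prefix of the previous key. The output file size is not consulted.
  PartitionerResult ShouldPartition(const PartitionerRequest& request) override {
    Slice prev = request.prev_user_key->substr(0, prefix_len_);
    Slice cur = request.current_user_key->substr(0, prefix_len_);
    return prev == cur ? kNotRequired : kRequired;
  }

  // Under bytewise key ordering, every key between two keys that share a
  // prefix also shares it, so checking the file's endpoints is enough to know
  // the file lies within a single partition.
  bool CanDoTrivialMove(const Slice& smallest_user_key,
                        const Slice& largest_user_key) override {
    return ShouldPartition(PartitionerRequest(smallest_user_key, largest_user_key,
                                              /*current_output_file_size=*/0)) ==
           kNotRequired;
  }

 private:
  std::size_t prefix_len_;
};
```

With `prefix_len = 5`, a file spanning `user1:a` to `user1:z` could be trivially moved, while one spanning `user1:a` to `user2:a` could not, because a partition boundary falls inside it.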
This feature has been available since 6.12.0 (https://github.com/facebook/rocksdb/pull/6957).