djafs (DeeJay-fs) is a high-performance FUSE-based filesystem that provides compressed, content-addressable storage for JSON files with time-travel capabilities.
- Overview
- The Problem
- Solution Architecture
- FUSE Technology
- System Design
- File Formats
- Usage Examples
- Implementation Status
- Development Roadmap
- Technical References
djafs solves the problem of efficiently storing and accessing large volumes of compressible JSON data while maintaining filesystem semantics and providing advanced features like point-in-time snapshots.
- Transparent Compression: JSON files are automatically compressed without changing application interfaces
- Content-Addressable Storage: Eliminates data duplication using SHA-256 hashing
- Time-Travel Snapshots: View filesystem state at any point in time
- High Performance: Optimized for both read and write operations
- Backup-Friendly: Non-opaque storage format allows manual recovery
- FUSE-Based: Standard filesystem interface compatible with all applications
- Time-Series Data: IoT sensor readings, metrics, logs
- Event Sourcing: Application events and state changes
- Archive Storage: Long-term retention of structured data
- Data Lakes: Structured data storage with efficient compression
Traditional approaches to storing JSON time-series data face several challenges:
Current Structure:
archive/
├── 2024/
│ ├── 01/
│ │ ├── 01/
│ │ │ ├── sensor_001_1704067200.json (12KB)
│ │ │ ├── sensor_001_1704067260.json (12KB)
│ │ │ └── sensor_001_1704067320.json (12KB)
│ │ └── 02/
│ └── 02/
└── 2023/
Problems:
- Storage Inefficiency: JSON files are highly compressible but stored uncompressed
- Inode Exhaustion: Millions of small files can exhaust filesystem inodes
- Backup Overhead: Many small files slow down backup operations
- No Deduplication: Identical or similar content is stored multiple times
- Limited Snapshots: No easy way to view historical filesystem states
djafs transforms the storage model while maintaining the same access patterns:
FUSE Interface (what applications see):
/mnt/djafs/
├── live/ <- Current active data
│ ├── 2024/01/01/
│ │ ├── sensor_001_1704067200.json
│ │ ├── sensor_001_1704067260.json
│ │ └── sensor_001_1704067320.json
│ └── 2024/01/02/
└── snapshots/ <- Time-travel interface
├── latest/
├── 2024/
│ ├── 01/
│ │ ├── 01/
│ │ └── 02/
│ └── 02/
└── 2025/
Backend Storage (actual disk layout):
/data/djafs/
├── hot_cache/ <- Write buffer
├── archive_2024_01.djfz <- Compressed archives
├── archive_2024_02.djfz
└── workdir/ <- Content-addressable storage
├── a1/
│ └── a1b2c3...def.json <- Hashed files
└── b2/
└── b2c3d4...abc.json
FUSE (Filesystem in Userspace) is a software interface that allows non-privileged users to create their own file systems without editing kernel code. It works by:
- Kernel Module: A thin kernel module that receives filesystem calls
- User Space Daemon: Your custom filesystem implementation
- Protocol Bridge: Communication between kernel and userspace via /dev/fuse
When a user runs cat /mnt/djafs/live/2024/01/01/sensor_001.json:
- Kernel receives read() syscall
- FUSE kernel module forwards to djafs daemon
- djafs daemon:
  a. Looks up the file in the lookup table
  b. Finds its hash: a1b2c3...def
  c. Decompresses the archive containing the file
  d. Returns the content to the kernel
- Kernel returns data to application
djafs uses the bazil.org/fuse library, a pure Go implementation of the FUSE protocol that doesn't rely on the C FUSE library.
Key Components:
- fs.FS: Root filesystem interface
- fs.Node: Represents files and directories
- fs.Handle: Represents opened files
- Lookup/Read/Write: Core filesystem operations
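A minimal, illustrative sketch of how a djafs-style filesystem could be mounted and served with bazil.org/fuse is shown below; the FS and Dir types are simplified placeholders, not the actual djafs implementation.

```go
package main

import (
	"context"
	"log"
	"os"

	"bazil.org/fuse"
	"bazil.org/fuse/fs"
)

// FS is the root of the filesystem and satisfies fs.FS.
type FS struct{}

func (FS) Root() (fs.Node, error) { return Dir{}, nil }

// Dir is a directory node; Attr and Lookup satisfy fs.Node and fs.NodeStringLookuper.
type Dir struct{}

func (Dir) Attr(ctx context.Context, a *fuse.Attr) error {
	a.Mode = os.ModeDir | 0o555
	return nil
}

func (Dir) Lookup(ctx context.Context, name string) (fs.Node, error) {
	// A real implementation would consult the lookup tables here.
	return nil, fuse.ENOENT
}

func main() {
	conn, err := fuse.Mount("/tmp/djafs-mount", fuse.FSName("djafs"))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Serve blocks, dispatching kernel requests to the handlers above.
	if err := fs.Serve(conn, FS{}); err != nil {
		log.Fatal(err)
	}
}
```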
djafs architecture consists of four interconnected data storage systems:
The user-facing filesystem that maintains familiar directory structures:
- /live/: Current active data with standard hierarchy
- /snapshots/: Time-based views generated on-demand
- Virtual Directories: Dynamically created based on lookup tables
- Standard Operations: Full support for read, write, stat, readdir
Files are stored by their SHA-256 hash to eliminate duplication:
workdir/
├── a1/
│ ├── a1b2c3d4e5f6789abcdef012345.json <- Original: sensor_001_1704067200.json
│ └── a1f7e8d9c2b3a4f5e6d7c8b9a0f.json <- Original: sensor_002_1704067200.json
└── b2/
└── b2c3d4e5f6789abcdef012345a1b.json <- Original: sensor_001_1704067260.json
Benefits:
- Automatic Deduplication: Identical files stored only once
- Integrity Checking: Hash verification prevents corruption
- Efficient Storage: Only unique content consumes space
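As a sketch of how this layout can be derived, the storage path of a blob follows directly from its SHA-256 digest, with the first two hex characters serving as the prefix directory; the helper name and signature below are illustrative, not necessarily those in the util/ package.

```go
package util

import (
	"crypto/sha256"
	"encoding/hex"
	"path/filepath"
)

// ContentPath returns the content-addressable location of a blob, e.g.
// workdir/a1/a1b2c3...def.json for a digest beginning with "a1".
func ContentPath(workdir string, data []byte) string {
	sum := sha256.Sum256(data)
	hash := hex.EncodeToString(sum[:])
	return filepath.Join(workdir, hash[:2], hash+".json")
}
```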
Related files are grouped into compressed archives for optimal storage efficiency:
archive_2024_01_week_1.djfz <- ZIP archive containing:
├── lookups.djfl <- JSON lookup table
├── metadata.djfm <- Archive metadata
├── a1b2c3d4e5f6789abcdef012345.json
├── a1f7e8d9c2b3a4f5e6d7c8b9a0f.json
└── b2c3d4e5f6789abcdef012345a1b.json
Compression Strategy:
- Time-Based Grouping: Files from similar time periods compress better
- Configurable Periods: Weekly, monthly, or custom grouping
- Standard ZIP Format: No proprietary formats for maximum recoverability
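Because .djfz files are plain ZIP archives, building one requires only the Go standard library. The hypothetical helper below bundles a lookup table, metadata, and a set of hashed blobs into a single archive:

```go
package util

import (
	"archive/zip"
	"os"
)

// WriteArchive bundles lookups.djfl, metadata.djfm, and the hashed .json blobs
// (passed as name -> contents) into a single .djfz archive. The archive is a
// standard ZIP file, so it stays recoverable with ordinary tools.
func WriteArchive(archivePath string, members map[string][]byte) error {
	out, err := os.Create(archivePath)
	if err != nil {
		return err
	}
	defer out.Close()

	zw := zip.NewWriter(out)
	for name, data := range members {
		w, err := zw.Create(name) // DEFLATE compression by default
		if err != nil {
			return err
		}
		if _, err := w.Write(data); err != nil {
			return err
		}
	}
	return zw.Close()
}
```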
A write-through cache that optimizes write performance:
hot_cache/
├── incoming/ <- New files land here first
│ ├── sensor_001_1704067380.json
│ └── sensor_002_1704067380.json
└── staging/ <- Files being processed by GC
Write Flow (see the sketch after this list):
- New file is written to hot_cache/incoming/
- Write completes immediately (fast response)
- Background garbage collector:
  - Computes SHA-256 hash
  - Moves to content-addressable storage
  - Updates lookup tables
  - Adds to compressed archive
  - Removes from hot cache
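A minimal sketch of that background pass, assuming the directory layout above; lookup-table updates and archive packing are left as comments:

```go
package djafs

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// processIncoming drains hot_cache/incoming/: each file is hashed, copied into
// the content-addressable workdir, and then removed from the hot cache.
func processIncoming(cacheDir, workDir string) error {
	incoming := filepath.Join(cacheDir, "incoming")
	entries, err := os.ReadDir(incoming)
	if err != nil {
		return err
	}
	for _, e := range entries {
		src := filepath.Join(incoming, e.Name())
		data, err := os.ReadFile(src)
		if err != nil {
			return err
		}
		sum := sha256.Sum256(data)
		hash := hex.EncodeToString(sum[:])
		dst := filepath.Join(workDir, hash[:2], hash+".json")
		if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
			return err
		}
		if err := os.WriteFile(dst, data, 0o644); err != nil {
			return err
		}
		// ...append a lookup-table entry and add the blob to its archive here...
		if err := os.Remove(src); err != nil { // drop from hot cache
			return err
		}
	}
	return nil
}
```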
Lookup tables map human-readable filenames to content-addressable hashes:
{
"entries": [
{
"name": "2024/01/01/sensor_001_1704067200.json",
"target": "a1b2c3d4e5f6789abcdef012345.json",
"size": 12484,
"modified": "2024-01-01T12:00:00Z",
"inode": 100001
},
{
"name": "2024/01/01/sensor_001_1704067260.json",
"target": "b2c3d4e5f6789abcdef012345a1b.json",
"size": 12490,
"modified": "2024-01-01T12:01:00Z",
"inode": 100002
}
],
"sorted": true
}
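In Go, the structures behind this format might be declared roughly as follows; the field types are assumed from the JSON above and are not necessarily the exact types in the util/ package:

```go
package util

import "time"

// LookupEntry maps one human-readable path to its content-addressable target.
type LookupEntry struct {
	Name     string    `json:"name"`     // e.g. "2024/01/01/sensor_001_1704067200.json"
	Target   string    `json:"target"`   // hashed filename inside the archive; empty = deleted
	Size     int64     `json:"size"`     // uncompressed size in bytes
	Modified time.Time `json:"modified"` // original modification time
	Inode    uint64    `json:"inode"`    // stable inode number exposed through FUSE
}

// LookupTable is the append-only log stored in lookups.djfl.
type LookupTable struct {
	Entries []LookupEntry `json:"entries"`
	Sorted  bool          `json:"sorted"`
}
```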
Snapshot Functionality:
- Lookup tables are append-only logs
- To view snapshots, read entries up to specific timestamp
- Deleted files have an empty target field
- Modified files create new entries without deleting old content
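Because the table is append-only, a point-in-time view can be produced by replaying entries whose timestamps fall at or before the snapshot cutoff. A rough sketch, reusing the LookupTable type from the previous snippet:

```go
// SnapshotView replays the append-only log up to a cutoff time and returns the
// surviving name -> target mapping. Later entries win; an empty target deletes.
func SnapshotView(t LookupTable, cutoff time.Time) map[string]string {
	view := make(map[string]string)
	for _, e := range t.Entries {
		if e.Modified.After(cutoff) {
			continue // newer than the requested snapshot
		}
		if e.Target == "" {
			delete(view, e.Name) // deletion marker
		} else {
			view[e.Name] = e.Target
		}
	}
	return view
}
```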
One of the most elegant aspects of djafs is how it resolves which zip archive contains a specific file without requiring a master index. The backing filesystem directory structure itself serves as the index.
When looking for a file like /sensors/location1/device5/reading.json:
- Walk down the backing filesystem: .data/sensors/location1/device5/
- Hit a "dead end": the directory doesn't exist (because it was a zip boundary)
- Back up one level: .data/sensors/location1/ exists
- Check the sibling lookup table: .data/sensors/location1/lookups.djfl
- Find the file entry: the lookup table contains device5/reading.json
Original files:
/sensors/location1/device5/reading.json
/sensors/location1/device5/config.json
/sensors/location1/device6/reading.json
/sensors/location1/summary.json
After zip boundary determination:
.data/
└── sensors/location1/
├── lookups.djfl <- Contains: device5/reading.json, device5/config.json,
│ device6/reading.json, summary.json
└── files.djfz <- Compressed archive
File lookup for /sensors/location1/device5/reading.json:
- Try to access .data/sensors/location1/device5/ → dead end!
- Back up to .data/sensors/location1/ → exists!
- Open .data/sensors/location1/lookups.djfl
- Search for the entry with name: "device5/reading.json"
- Extract from files.djfz using the target hash
- Self-Indexing: The filesystem structure eliminates the need for separate index files
- O(path-depth) Lookup: Maximum directory traversals equal to path depth
- No Master Index: Each boundary is self-contained with its own lookup table
- Intuitive: The "dead end" naturally points to the exact lookup table containing your file
This approach scales efficiently even with thousands of zip boundaries across a deep directory tree.
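A sketch of that resolution walk (a hypothetical helper, not the exact dead-end detection code in the util/ package):

```go
package util

import (
	"errors"
	"os"
	"path/filepath"
	"strings"
)

// ResolveBoundary walks a virtual path down the backing store until a path
// component no longer exists (the "dead end"), then returns the last existing
// directory (which holds lookups.djfl and files.djfz) plus the remainder of
// the path to search for inside that lookup table.
func ResolveBoundary(backingRoot, virtualPath string) (boundary, remainder string, err error) {
	parts := strings.Split(strings.Trim(virtualPath, "/"), "/")
	boundary = backingRoot
	for i, p := range parts {
		next := filepath.Join(boundary, p)
		if info, statErr := os.Stat(next); statErr != nil || !info.IsDir() {
			// Dead end: the current boundary's lookup table covers the rest.
			return boundary, filepath.Join(parts[i:]...), nil
		}
		boundary = next
	}
	return "", "", errors.New("path resolves to a directory, not a file")
}
```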
Each archive includes metadata for performance optimization:
{
"djafs_version": "1.0.0",
"compressed_size": 2457600,
"uncompressed_size": 8392704,
"total_file_count": 1440,
"target_file_count": 1200,
"oldest_file_ts": "2024-01-01T00:00:00Z",
"newest_file_ts": "2024-01-07T23:59:59Z"
}
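A Go struct mirroring these fields might look like this (an assumed shape for illustration):

```go
package util

import "time"

// Metadata summarizes one .djfz archive and is stored as metadata.djfm.
type Metadata struct {
	DjafsVersion     string    `json:"djafs_version"`
	CompressedSize   int64     `json:"compressed_size"`
	UncompressedSize int64     `json:"uncompressed_size"`
	TotalFileCount   int       `json:"total_file_count"`
	TargetFileCount  int       `json:"target_file_count"`
	OldestFileTS     time.Time `json:"oldest_file_ts"`
	NewestFileTS     time.Time `json:"newest_file_ts"`
}
```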
- .djfz: Compressed archive files (ZIP format)
- .djfl: JSON lookup table files
- .djfm: JSON metadata files
Each .djfz file contains:
archive_2024_01_week_1.djfz
├── lookups.djfl <- Lookup table for this archive
├── metadata.djfm <- Archive metadata
├── <hash1>.json <- Content-addressable files
├── <hash2>.json
└── <hashN>.json
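Since the archive is standard ZIP, any member can be extracted with archive/zip alone; the helper below is an illustrative sketch:

```go
package util

import (
	"archive/zip"
	"fmt"
	"io"
)

// ReadFromArchive extracts a single member (e.g. "lookups.djfl" or
// "<hash>.json") from a .djfz archive.
func ReadFromArchive(archivePath, member string) ([]byte, error) {
	r, err := zip.OpenReader(archivePath)
	if err != nil {
		return nil, err
	}
	defer r.Close()

	for _, f := range r.File {
		if f.Name != member {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		defer rc.Close()
		return io.ReadAll(rc)
	}
	return nil, fmt.Errorf("%s not found in %s", member, archivePath)
}
```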
- Go 1.24.4 or later
- FUSE support on your system:
  - Linux: sudo apt-get install fuse or sudo yum install fuse
  - macOS: Install FUSE for macOS
  - FreeBSD: FUSE is included in the base system
# Clone the repository
git clone https://github.com/your-org/dendra-fuse-djafs
cd dendra-fuse-djafs
# Build the filesystem
go build -o djafs .
# Create a mount point
mkdir /tmp/djafs-mount
# Mount the filesystem
./djafs /tmp/djafs-mount
# In another terminal, use the filesystem
echo '{"temperature": 23.5, "timestamp": "2024-01-01T12:00:00Z"}' > /tmp/djafs-mount/live/2024/01/01/sensor.json
cat /tmp/djafs-mount/live/2024/01/01/sensor.json
# Unmount when done
fusermount -u /tmp/djafs-mount # Linux
umount /tmp/djafs-mount # macOS/FreeBSD
# Mount the filesystem
./djafs /mnt/djafs
# Write a file (goes to hot cache)
echo '{"sensor_id": "001", "value": 23.5}' > /mnt/djafs/live/2024/01/01/reading.json
# Read the file (transparent decompression)
cat /mnt/djafs/live/2024/01/01/reading.json
# List current files
ls -la /mnt/djafs/live/2024/01/01/
# View snapshots
ls /mnt/djafs/snapshots/ # Shows years: 2024, 2025, latest
ls /mnt/djafs/snapshots/2024/ # Shows months: 01, 02, 03, ...
ls /mnt/djafs/snapshots/2024/01/ # Shows days: 01, 02, 03, ...
ls /mnt/djafs/snapshots/2024/01/01/2024/01/01/ # Shows files from that day
# Browse snapshots hierarchically
ls /mnt/djafs/snapshots/ # Shows: latest, 2024, 2025, ...
ls /mnt/djafs/snapshots/2024/ # Shows: 01, 02, 03, ... (months)
ls /mnt/djafs/snapshots/2024/01/ # Shows: 01, 02, 03, ... (days)
# View filesystem as it was on Jan 1st, 2024
cd /mnt/djafs/snapshots/2024/01/01/
ls 2024/01/01/ # Only files that existed at that time
# Compare different points in time
diff /mnt/djafs/snapshots/2024/01/01/2024/01/01/data.json \
/mnt/djafs/snapshots/2024/01/02/2024/01/01/data.json
# Pause garbage collection for consistent backup
killall -USR1 djafs
# Backup the actual storage (much smaller than original)
rsync -av /data/djafs/ backup_location/
# Resume garbage collection
killall -USR2 djafs
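The pause/resume behavior maps naturally onto Unix signal handling. The following is an illustrative sketch of how the daemon could listen for SIGUSR1/SIGUSR2, not the exact djafs code:

```go
package djafs

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// WatchGCSignals pauses the garbage collector on SIGUSR1 and resumes it on
// SIGUSR2 so that external backup tools see a consistent on-disk state.
func WatchGCSignals(pause, resume func()) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1, syscall.SIGUSR2)
	go func() {
		for sig := range ch {
			switch sig {
			case syscall.SIGUSR1:
				log.Println("garbage collection paused for backup")
				pause()
			case syscall.SIGUSR2:
				log.Println("garbage collection resumed")
				resume()
			}
		}
	}()
}
```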
- Utility Functions (util/ package):
  - SHA-256 hashing with content-addressable storage
  - ZIP compression/decompression
  - Lookup table management
  - Metadata generation
  - File counting and validation
  - Content-addressable file copying
  - "Dead end" detection algorithm
  - Lookup table collapse functionality
- Core Data Structures:
  - LookupEntry and LookupTable types
  - Metadata structure with JSON serialization
  - DJFZ archive handling
  - Hot cache management
  - Archive caching with LRU
- FUSE Filesystem Interface (djafs/fs.go):
  - Complete FUSE mounting infrastructure
  - Full directory and file operations
  - Read and write capabilities
  - Snapshot system implementation
  - Background garbage collection
- Conversion Tools:
  - Archive creation tool (cmd/converter/)
  - Archive validation tool (cmd/validator/)
  - Comprehensive error handling and reporting
- Complete FUSE Operations:
  - Directory listing (ReadDirAll)
  - File lookup (Lookup) with "dead end" detection
  - File reading (Read, ReadAll) from archives
  - File writing (Write, Create) with hot cache
  - File metadata (Attr, Setattr)
  - Directory creation (Mkdir)
- Snapshot System:
  - Virtual snapshot directory generation
  - Time-based file filtering
  - Snapshot browsing interface
  - Multiple timestamp format support
  - Historical file access
- Hot Cache Management:
  - Background garbage collection
  - Write-through caching
  - Archive generation and compression
  - Automatic file processing pipeline
- Production Features:
  - Graceful shutdown handling
  - Comprehensive error recovery
  - Performance optimization
  - Memory management
  - Concurrent operation support
- Utility functions and data structures
- SHA-256 hashing and content addressing
- ZIP compression/decompression
- Lookup table management
- Basic FUSE mounting
- Implement FUSE Lookup operation
- Implement FUSE Read and Open operations
- Implement FUSE ReadDir for directory listing
- Implement FUSE Attr for file metadata
- Basic file reading from archives
- Implement hot cache system
- Implement FUSE Write and Create operations
- Background garbage collection process
- Archive generation and compression
- Lookup table updates
- Virtual snapshot directory generation
- Time-based file filtering
- Historical lookup table parsing
- Snapshot browsing interface
- Backup pause/resume signals
- Performance monitoring and metrics
- Error recovery and fault tolerance
- Configuration management
- Comprehensive testing suite
- Read caching and LRU eviction
- Compression ratio optimization
- Memory usage optimization
- Concurrent operation support
- FUSE Tutorial by Joseph Pfeiffer - Comprehensive FUSE development guide
- bazil.org/fuse Documentation - Go FUSE library documentation
- bazil.org/fuse Examples - Example FUSE implementations
- hellofs - Simple FUSE filesystem example
- zipfs - FUSE filesystem serving ZIP archives
- Writing Filesystems in Go with FUSE - Detailed tutorial
- InfluxDB: Write-through caching and garbage collection patterns
- IPFS: Content-addressable storage design
- Git: Object storage and content hashing
- ZFS: Snapshot and deduplication concepts
- FUSE Debug Mode: Enable with -o debug for operation tracing
- Go Race Detector: Essential for concurrent FUSE operations
- Bazil Project: Distributed filesystem using similar technologies
- Hot Cache: New writes complete immediately to local cache
- Batched Compression: Files are compressed in groups for better ratios
- Background Processing: Garbage collection runs asynchronously
- Decompression Caching: Recently accessed archives stay decompressed in memory
- Lookup Table Optimization: Sorted lookup tables enable binary search (see the sketch below)
- Content Addressing: Duplicate content is stored only once
- Compression Ratios: JSON typically compresses 5-10x with gzip
- Deduplication: Identical files consume zero additional storage
- Time-Based Grouping: Similar files compress better when archived together
- Memory Usage: Proportional to number of open archives and cache size
- File Count: Lookup tables support millions of entries efficiently
- Archive Size: Individual archives should stay under 1GB for optimal performance
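As noted under Lookup Table Optimization above, keeping tables sorted by name allows binary search instead of a linear scan. A sketch using the standard sort package and the LookupEntry type from the earlier snippet (it returns the first matching entry):

```go
// FindEntry binary-searches a lookup table whose entries are sorted by Name
// and returns the first matching entry, if any.
func FindEntry(entries []LookupEntry, name string) (LookupEntry, bool) {
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].Name >= name
	})
	if i < len(entries) && entries[i].Name == name {
		return entries[i], true
	}
	return LookupEntry{}, false
}
```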
djafs - Efficient, compressed, time-travel enabled storage for JSON archives.