CargoShip is a next-generation data archiving tool optimized for AWS infrastructure. Built on the foundation of Duke University's SuitcaseCTL, CargoShip adds native AWS integration, intelligent cost optimization, and enterprise-grade observability with advanced network optimization algorithms.
Advanced data archiving for any environment (v0.4.2):
- Intelligent cost optimization - save up to 90% with storage-class recommendations
- Advanced network algorithms - BBR and CUBIC congestion control for maximum throughput
- Smart compression - ZSTD with adaptive chunking and staging
- AI-powered S3 optimization - predictive prefetching with pattern analysis (NEW in v0.4.2)
- Intelligent caching - multi-policy cache with network-aware optimization (NEW in v0.4.2)
- Advanced monitoring - real-time analytics with predictive alerting (NEW in v0.4.2)
- Advanced budget controls - cost and volume limits with grant period management
- Security first - KMS encryption and compliance-ready audit trails
# Install CargoShip
go install github.com/scttfrdmn/cargoship/cmd/cargoship@latest
# Or download pre-built binary
curl -sSL https://get.cargoship.dev/install.sh | sh

# 1. Survey your data and estimate costs with advanced algorithms
cargoship survey /data/project-2024
cargoship estimate /data/completed-analysis --storage-class deep-archive
# 2. Archive with AI-powered S3 optimization (v0.4.2)
cargoship ship /data/completed-analysis \
--destination s3://my-bucket/project-2024 \
--storage-class intelligent-tiering \
--enable-bbr-congestion-control \
--enable-predictive-prefetching \
--cache-policy adaptive \
--max-cost-per-month 200 \
--max-volume 500GB
# 3. Deploy with advanced monitoring and ML predictions
docker run -d --name cargoship-agent \
-v /mnt/data:/data:ro \
-v ~/.aws:/root/.aws:ro \
scttfrdmn/cargoship:v0.4.2 \
--watch /data --enable-s3-optimization --enable-predictive-analytics

CargoShip provides enterprise-grade cost optimization with proven algorithms:
$ cargoship estimate ./genomics-analysis --show-breakdown
Archive Cost Estimate (1.2TB genomics data)
┌──────────────────┬──────────────┬─────────────┐
│ Storage Class    │ Monthly Cost │ Annual Cost │
├──────────────────┼──────────────┼─────────────┤
│ Standard         │ $276.48      │ $3,317.76   │
│ Glacier          │ $61.44       │ $737.28     │
│ Deep Archive     │ $12.29       │ $147.48     │
└──────────────────┴──────────────┴─────────────┘
Optimization Recommendations (v0.4.2):
• Archive raw data → Deep Archive (90% savings)
• Analysis results → Glacier with BBR congestion control (75% savings)
• Enable lifecycle policies → additional 15% savings
• Advanced flow control algorithms → 4.6x faster uploads
• AI-powered predictive prefetching → 40% reduction in access latency
• Intelligent caching → 65% fewer redundant S3 requests
Total annual savings: $3,170 (Standard → Deep Archive), with 4.6x faster uploads and 40% lower access latency
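The annual figures in the table above are simply monthly cost × 12, and the savings percentages compare each class against Standard. A minimal sketch of that arithmetic, where the per-GB monthly rates are hypothetical placeholders for illustration, not the rates behind the table above and not CargoShip's pricing logic:

```go
package main

import "fmt"

// Illustrative per-GB monthly rates (hypothetical placeholders, not
// current AWS pricing and not CargoShip's internal rate table).
var ratesPerGBMonth = map[string]float64{
	"standard":     0.023,
	"glacier":      0.0036,
	"deep-archive": 0.00099,
}

// monthlyCost estimates the monthly storage cost for sizeGB gigabytes.
func monthlyCost(class string, sizeGB float64) float64 {
	return ratesPerGBMonth[class] * sizeGB
}

// savingsPercent compares a candidate class against Standard storage.
func savingsPercent(class string, sizeGB float64) float64 {
	std := monthlyCost("standard", sizeGB)
	return (std - monthlyCost(class, sizeGB)) / std * 100
}

func main() {
	const sizeGB = 1229 // ~1.2 TB
	for _, class := range []string{"standard", "glacier", "deep-archive"} {
		m := monthlyCost(class, sizeGB)
		fmt.Printf("%-12s  monthly $%8.2f  annual $%9.2f  savings %5.1f%%\n",
			class, m, m*12, savingsPercent(class, sizeGB))
	}
}
```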
Coming in v0.5.0 (Dec 2025):
• Volume-based budget controls (--max-volume 100GB)
• Grant period management (1-3 year budgets with rollover)
• Real-time burn rate monitoring with optimization suggestions

CargoShip v0.4.2 introduces intelligent S3 optimization with predictive prefetching:
Network Optimization (v0.4.0+):
- BBR Congestion Control: Google's production-tested algorithm for optimal bandwidth utilization
- CUBIC TCP Algorithm: Linux kernel's proven congestion window management
- RTT Estimation: Kalman filtering and statistical smoothing for accurate round-trip-time tracking
- Loss Detection: Multi-method packet loss detection with deterministic recovery
- Bandwidth-Delay Product: Dynamic buffer sizing with network-aware optimization
Intelligent Prefetching (v0.4.2):
- Predictive Prefetcher: AI-powered prefetching based on access patterns
- Pattern Analysis: Detects sequential, temporal, cyclic, and burst access patterns
- ML Predictions: Ensemble learning with online adaptation for request prediction
- Adaptive Caching: LRU/LFU/Priority-based cache with intelligent eviction
- Network-Aware Scheduling: Priority-based job scheduling optimized for network conditions
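Of the eviction policies listed above, LRU is the simplest to illustrate: on every access the entry moves to the front of a recency list, and when the cache is over capacity the entry at the back is evicted. A minimal sketch (not CargoShip's actual cache implementation):

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a minimal least-recently-used cache: the simplest of the
// eviction policies mentioned above (LRU/LFU/priority).
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in the recency list
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Touch records an access to key, evicting the least recently used entry
// if the cache exceeds capacity. It returns the evicted key, if any.
func (c *lruCache) Touch(key string) (evicted string, ok bool) {
	if el, hit := c.items[key]; hit {
		c.order.MoveToFront(el)
		return "", false
	}
	c.items[key] = c.order.PushFront(key)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		k := oldest.Value.(string)
		delete(c.items, k)
		return k, true
	}
	return "", false
}

func main() {
	c := newLRUCache(2)
	c.Touch("a")
	c.Touch("b")
	c.Touch("a")               // "a" is now most recently used
	evicted, _ := c.Touch("c") // over capacity: "b" is evicted
	fmt.Println("evicted:", evicted)
}
```

An LFU or priority policy swaps only the eviction rule: pick the entry with the lowest access count or lowest priority instead of the least recently used one.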
Deploy CargoShip with enterprise-grade features:
# docker-compose.yml for enterprise deployment
version: '3.8'
services:
  cargoship-enterprise:
    image: scttfrdmn/cargoship:v0.4.2
    volumes:
      - /mnt/enterprise-storage:/data:ro
      - ./config:/config
    environment:
      - CARGOSHIP_WATCH_PATHS=/data/completed,/data/analysis-output
      - CARGOSHIP_DESTINATION=s3://enterprise-archive
      - CARGOSHIP_STORAGE_CLASS=intelligent-tiering
      - CARGOSHIP_ENABLE_BBR=true
      - CARGOSHIP_ENABLE_CUBIC=true
      - CARGOSHIP_ENABLE_PREDICTIVE_PREFETCHING=true
      - CARGOSHIP_ADVANCED_MONITORING=true
      - CARGOSHIP_MAX_MONTHLY_COST=5000

CargoShip automatically detects datasets ready for archival:
# Configure advanced archival rules with S3 optimization (v0.4.2)
cargoship config set rules.auto-archive true
cargoship config set rules.detect-patterns "*.bam,*.fastq.gz,analysis_complete.txt"
cargoship config set rules.min-age-days 7
cargoship config set rules.storage-class intelligent-tiering
cargoship config set flow-control.algorithm bbr
cargoship config set s3-optimization.enable-predictive-prefetching true
cargoship config set s3-optimization.cache-policy lru
cargoship config set monitoring.enable-advanced-metrics true

┌──────────────────────┐    ┌────────────────────┐    ┌───────────────────┐
│  Enterprise Data     │    │  CargoShip v0.4.2  │    │      AWS S3       │
│                      │    │                    │    │                   │
│ • Data Lakes         │───▶│ • BBR/CUBIC        │───▶│ • All Storage     │
│ • Analytics Output   │    │ • RTT Estimation   │    │   Classes         │
│ • ML Training Data   │    │ • Loss Recovery    │    │ • Intelligent     │
│ • Archive Systems    │    │ • BDP Optimization │    │   Tiering         │
└──────────────────────┘    │ • Predictive AI    │    │ • Cost Optimize   │
                            │ • Smart Prefetch   │    │ • Pattern Cache   │
                            └────────────────────┘    │ • Auto-Optimize   │
                                                      └───────────────────┘
- Installation Guide - Get CargoShip running in your environment
- User Guide - Complete feature walkthrough
- Quick Start Wizard - Interactive setup guide
- Complete Documentation - Full documentation site
- Advanced Flow Control - v0.4.0 network optimization algorithms
- AWS Integration Guide - Complete AWS setup and configuration
- Cost Management - Intelligent cost optimization features
- Architecture Overview - System design and components
- Deployment Guide - Production deployment strategies
- Launch Agent Setup - Enterprise agent deployment
- Ghost Ship Deployment - Distributed agent setup
CargoShip welcomes contributions from developers and researchers! Built on Duke University's SuitcaseCTL foundation with enterprise enhancements.
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
CargoShip is licensed under the MIT License. See LICENSE for details.
Built on SuitcaseCTL: CargoShip extends SuitcaseCTL by Duke University. We gratefully acknowledge their innovative foundation for research data management.
- Documentation: cargoship.app
- Issues: GitHub Issues
- Community: GitHub Discussions
Ship your data with confidence. Ship it with CargoShip.
