mongo2dynamo is a high-performance command-line tool, built with Go, for migrating data from MongoDB to DynamoDB.
It is designed for efficient and reliable data migration, incorporating several key features for performance and stability:
- High-Performance Transformation: Utilizes a dynamic worker pool that scales with CPU cores (from 2 to 2x `runtime.NumCPU()`) and monitors workload in real time. Workers auto-scale every 500ms based on pending jobs, maximizing parallel processing efficiency (see the sketch after this list).
- Optimized Memory Management: Implements strategic memory allocation: the extractor uses a `ChunkPool` for efficient slice reuse during document streaming, while the transformer uses direct allocation with pre-calculated capacity, a choice based on benchmarking.
- Robust Loading Mechanism: Implements a reliable data loading strategy for DynamoDB using the `BatchWriteItem` API with a concurrent worker pool. Features an exponential backoff with jitter algorithm that automatically handles DynamoDB throttling exceptions, ensuring a smooth migration.
- Memory-Efficient Extraction: Employs a streaming approach to extract data from MongoDB in configurable chunks (default: 2000 documents), minimizing memory footprint even with large datasets. Supports MongoDB query filters for selective migration.
- Intelligent Field Processing: Automatically converts MongoDB-specific fields (`_id` → `id`) and removes framework metadata (`__v`, `_class`). Pre-calculates output document capacity to minimize memory allocations during transformation.
- Fine-Grained Error Handling: Defines domain-specific custom error types for each stage of the ETL process (Extract, Transform, Load). This enables precise error identification and facilitates targeted recovery logic.
- Comprehensive CLI: Built with `Cobra`, providing a user-friendly command-line interface with `plan` (dry-run) and `apply` commands, flexible configuration options (flags, environment variables, config file), and an `--auto-approve` flag for non-interactive execution.
- Automatic Table Management: Automatically creates DynamoDB tables if they don't exist, with user confirmation prompts (unless auto-approved). Waits for table activation before proceeding with migration.
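The dynamic worker pool from the first feature above can be pictured roughly as follows. This is a minimal, ticker-driven autoscaler sketch, not mongo2dynamo's actual implementation; all identifiers are illustrative, and scale-down logic is omitted for brevity.

```go
// A rough sketch of a ticker-driven autoscaling worker pool; illustrative only.
package workerpool

import (
	"runtime"
	"sync"
	"time"
)

// Pool runs submitted jobs on a worker pool that grows under load.
type Pool struct {
	jobs    chan func()
	wg      sync.WaitGroup
	mu      sync.Mutex
	workers int
	max     int
}

func New(queueSize int) *Pool {
	p := &Pool{
		jobs: make(chan func(), queueSize),
		max:  2 * runtime.NumCPU(), // upper bound from the feature list above
	}
	for i := 0; i < runtime.NumCPU(); i++ {
		p.spawn()
	}
	go p.autoscale()
	return p
}

func (p *Pool) spawn() {
	p.mu.Lock()
	p.workers++
	p.mu.Unlock()
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		for job := range p.jobs {
			job()
		}
	}()
}

// autoscale checks every 500ms and adds a worker while pending jobs
// exceed twice the current worker count, up to the maximum.
func (p *Pool) autoscale() {
	for range time.Tick(500 * time.Millisecond) {
		p.mu.Lock()
		workers := p.workers
		p.mu.Unlock()
		if len(p.jobs) > 2*workers && workers < p.max {
			p.spawn()
		}
	}
}

func (p *Pool) Submit(job func()) { p.jobs <- job }

// Close stops accepting work and waits for in-flight jobs to finish.
func (p *Pool) Close() { close(p.jobs); p.wg.Wait() }
```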
Install with Homebrew:

```bash
brew tap dutymate/tap
brew install mongo2dynamo
```
Download the latest release from the releases page.
Or build from source:

```bash
git clone https://github.com/dutymate/mongo2dynamo.git
cd mongo2dynamo
make build
```
```bash
# Preview migration
mongo2dynamo plan --mongo-db mydb --mongo-collection users

# Execute migration
mongo2dynamo apply --mongo-db mydb --mongo-collection users --dynamo-endpoint http://localhost:8000

# With filter and auto-approve
mongo2dynamo apply --mongo-db mydb --mongo-collection users \
  --dynamo-endpoint http://localhost:8000 \
  --mongo-filter '{"status": "active"}' \
  --auto-approve
```
mongo2dynamo can also be configured through environment variables:

```bash
export MONGO2DYNAMO_MONGO_HOST=localhost
export MONGO2DYNAMO_MONGO_PORT=27017
export MONGO2DYNAMO_MONGO_USER=your_username
export MONGO2DYNAMO_MONGO_PASSWORD=your_password
export MONGO2DYNAMO_MONGO_DB=your_database
export MONGO2DYNAMO_MONGO_COLLECTION=your_collection
export MONGO2DYNAMO_MONGO_FILTER='{"status": "active"}'
export MONGO2DYNAMO_DYNAMO_TABLE=your_table
export MONGO2DYNAMO_DYNAMO_ENDPOINT=http://localhost:8000
export MONGO2DYNAMO_AWS_REGION=us-east-1
export MONGO2DYNAMO_MAX_RETRIES=5
export MONGO2DYNAMO_AUTO_APPROVE=false
```
Or create `~/.mongo2dynamo/config.yaml`:
```yaml
mongo_host: localhost
mongo_port: 27017
mongo_user: your_username
mongo_password: your_password
mongo_db: your_database
mongo_collection: your_collection
mongo_filter: '{"status": "active"}'
dynamo_table: your_table
dynamo_endpoint: http://localhost:8000
aws_region: us-east-1
max_retries: 5
auto_approve: false
```
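Since the CLI is built with Cobra, one plausible way this layered configuration (config file plus `MONGO2DYNAMO_`-prefixed environment variables) could be wired up is with spf13/viper, a common companion to Cobra. This is a hypothetical sketch; the project's actual loader may differ.

```go
// A hypothetical configuration loader, assuming spf13/viper; illustrative only.
package config

import "github.com/spf13/viper"

func Load() error {
	viper.SetConfigName("config") // ~/.mongo2dynamo/config.yaml
	viper.SetConfigType("yaml")
	viper.AddConfigPath("$HOME/.mongo2dynamo")
	viper.SetEnvPrefix("MONGO2DYNAMO") // maps mongo_host -> MONGO2DYNAMO_MONGO_HOST
	viper.AutomaticEnv()               // environment variables override file values
	viper.SetDefault("max_retries", 5)
	if err := viper.ReadInConfig(); err != nil {
		// A missing config file is fine; env vars and flags may cover everything.
		if _, ok := err.(viper.ConfigFileNotFoundError); !ok {
			return err
		}
	}
	return nil
}
```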
The `plan` command performs a dry-run that previews the migration by executing the full ETL pipeline without loading anything into DynamoDB.
Features:
- Connects to MongoDB and validates configuration.
- Extracts documents from MongoDB (with filters if specified).
- Transforms documents to DynamoDB format using dynamic worker pools.
- Counts the total number of documents that would be migrated.
- No data is loaded to DynamoDB (dry-run mode).
Example Output:

```text
Starting migration plan analysis...
Found 1,234 documents to migrate.
```
The `apply` command executes the complete ETL pipeline to migrate data from MongoDB to DynamoDB.
Features:
- Full ETL pipeline execution (Extract → Transform → Load).
- Configuration validation and user confirmation prompts.
- Automatic DynamoDB table creation (with confirmation).
- Batch processing with optimized sizes: 1000 documents per MongoDB batch, 2000 documents per extraction chunk, and 25 documents per DynamoDB batch, written by concurrent loader workers.
- Dynamic worker pool scaling for optimal performance.
- Retry logic for failed operations (configurable via `--max-retries`).
Example Output:

```text
Creating DynamoDB table 'users'...
Waiting for table 'users' to become active...
Table 'users' is now active and ready for use.
Starting data migration from MongoDB to DynamoDB...
Successfully migrated 1,234 documents.
```
The `version` command displays version information, including the Git commit and build date.
mongo2dynamo follows a standard Extract, Transform, Load (ETL) architecture with parallel processing capabilities. Each stage is designed to perform its task efficiently and reliably.
- Parallel Processing: The ETL stages run concurrently using Go channels with a buffer size of 10, allowing extraction, transformation, and loading to happen simultaneously for maximum throughput (see the pipeline sketch after this list).
- Strategic Memory Optimization: Components use independent memory strategies optimized for their specific workloads: the extractor leverages a `ChunkPool` for slice reuse, while the transformer uses direct allocation for maximum speed.
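A minimal sketch of how the three stages can run concurrently over buffered channels, under the assumptions above (buffer size 10). The stage bodies here are illustrative stubs, not the project's code, and `bson.M` chunks stand in for the real item types.

```go
// A rough sketch of a buffered-channel ETL pipeline; stage stubs are illustrative.
package pipeline

import (
	"context"

	"go.mongodb.org/mongo-driver/bson"
	"golang.org/x/sync/errgroup"
)

const stageBuffer = 10 // channel buffer between stages, as noted above

func Run(ctx context.Context) error {
	raw := make(chan []bson.M, stageBuffer)   // extractor → transformer
	items := make(chan []bson.M, stageBuffer) // transformer → loader

	g, ctx := errgroup.WithContext(ctx)
	g.Go(func() error {
		defer close(raw)
		return extract(ctx, raw) // streams chunks out of MongoDB
	})
	g.Go(func() error {
		defer close(items)
		return transform(ctx, raw, items) // worker pool processes each chunk
	})
	g.Go(func() error {
		return load(ctx, items) // BatchWriteItem with retries
	})
	return g.Wait() // the first stage error cancels the rest via ctx
}

// Stage stubs; the real implementations live in their own components.
func extract(ctx context.Context, out chan<- []bson.M) error { return nil }

func transform(ctx context.Context, in <-chan []bson.M, out chan<- []bson.M) error {
	for chunk := range in {
		select {
		case out <- chunk: // a real transformer would rewrite fields here
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

func load(ctx context.Context, in <-chan []bson.M) error {
	for range in { // a real loader would batch and write to DynamoDB
	}
	return nil
}
```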
```mermaid
%%{init: { 'theme': 'neutral' } }%%
flowchart LR
    subgraph Input Source
        MongoDB[(fa:fa-database MongoDB)]
    end
    subgraph mongo2dynamo
        direction LR
        subgraph "Extract"
            Extractor("fa:fa-cloud-download Extractor<br/><br/>Streams documents<br/>from the source collection.")
        end
        subgraph "Transform"
            Transformer("fa:fa-cogs Transformer<br/><br/>Uses a dynamic worker pool to process documents in parallel.")
        end
        subgraph "Load"
            Loader("fa:fa-upload Loader<br/><br/>Writes data using BatchWriteItem API.<br/>Handles throttling with<br/>Exponential Backoff + Jitter.")
        end
    end
    subgraph Output Target
        DynamoDB[(fa:fa-database DynamoDB)]
    end

    MongoDB -- Documents --> Extractor
    Extractor -- Raw Documents --> Transformer
    Transformer -- Transformed Items --> Loader
    Loader -- Batched Items --> DynamoDB

    style Extractor fill:#e6f3ff,stroke:#333
    style Transformer fill:#fff2e6,stroke:#333
    style Loader fill:#e6ffed,stroke:#333
```
- Connects to MongoDB using optimized connection settings with configurable batch sizes (default: 1000 documents per batch).
- Uses a streaming approach with `ChunkPool` memory reuse to handle large datasets efficiently (see the sketch after this list).
- Processes documents in configurable chunks (default: 2000 documents) to maintain a low memory footprint.
- Applies user-defined filters (`--mongo-filter`) with JSON-to-BSON conversion for selective data migration.
- Implements robust error handling for connection, decode, and cursor operations.
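The chunked streaming approach might look roughly like the following sketch, which uses the official MongoDB Go driver and a `sync.Pool`-backed chunk buffer; `streamCollection` and `chunkSize` are illustrative names, not the tool's actual identifiers.

```go
// A rough sketch of chunked streaming extraction with slice reuse; illustrative only.
package extractor

import (
	"context"
	"fmt"
	"sync"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

const chunkSize = 2000 // documents per chunk, matching the default above

// chunkPool reuses chunk slices across iterations to reduce allocations.
var chunkPool = sync.Pool{
	New: func() any { return make([]bson.M, 0, chunkSize) },
}

func streamCollection(ctx context.Context, coll *mongo.Collection, filter bson.M, out chan<- []bson.M) error {
	// Default of 1000 documents per MongoDB batch, as described above.
	cursor, err := coll.Find(ctx, filter, options.Find().SetBatchSize(1000))
	if err != nil {
		return fmt.Errorf("find: %w", err)
	}
	defer cursor.Close(ctx)

	chunk := chunkPool.Get().([]bson.M)[:0]
	for cursor.Next(ctx) {
		var doc bson.M
		if err := cursor.Decode(&doc); err != nil {
			return fmt.Errorf("decode: %w", err)
		}
		chunk = append(chunk, doc)
		if len(chunk) == chunkSize {
			out <- chunk // the consumer returns the slice to chunkPool when done
			chunk = chunkPool.Get().([]bson.M)[:0]
		}
	}
	if len(chunk) > 0 {
		out <- chunk // flush the final partial chunk
	} else {
		chunkPool.Put(chunk)
	}
	return cursor.Err()
}
```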
- Utilizes a dynamic worker pool that starts at the CPU core count and scales up to 2x CPU cores based on workload.
- Real-time scaling: Workers auto-adjust every 500ms when pending jobs exceed 2x the current worker count.
- Memory optimization: Pre-calculates field counts to allocate maps with optimal capacity, reducing garbage collection overhead.
- Field processing: Converts `_id` to `id` with intelligent type handling (ObjectID → hex, primitive.M → JSON), and removes `__v` and `_class` fields (see the sketch after this list).
- Implements panic recovery and comprehensive error reporting for worker failures.
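The field conversion rules above can be expressed as a small transform function. This is a sketch assuming `bson.M` documents; `transformDoc` is an illustrative name, not the tool's actual function.

```go
// A rough sketch of the per-document field conversion; illustrative only.
package transformer

import (
	"encoding/json"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
)

// transformDoc renames _id to id, drops framework metadata, and normalizes
// MongoDB-specific value types so items can be marshaled for DynamoDB.
func transformDoc(doc bson.M) (bson.M, error) {
	out := make(bson.M, len(doc)) // pre-sized to avoid map growth
	for k, v := range doc {
		switch k {
		case "__v", "_class":
			continue // framework metadata is dropped
		case "_id":
			switch id := v.(type) {
			case primitive.ObjectID:
				out["id"] = id.Hex() // ObjectID → hex string
			case primitive.M:
				b, err := json.Marshal(id) // embedded document → JSON string
				if err != nil {
					return nil, err
				}
				out["id"] = string(b)
			default:
				out["id"] = v
			}
		default:
			out[k] = v
		}
	}
	return out, nil
}
```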
- Uses a concurrent worker pool to maximize DynamoDB throughput with parallel batch processing.
- Groups documents into optimal batches of 25 items per `BatchWriteItem` request (the DynamoDB limit).
- Advanced retry logic: Implements exponential backoff with jitter (100ms to 30s) for unprocessed items, with configurable max retries (default: 5); see the sketch after this list.
- Automatic table management: Creates tables with a hash key schema if they don't exist and waits for table activation.
- Handles context cancellation gracefully across all worker goroutines.
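The unprocessed-item retry loop could be implemented along these lines with the AWS SDK for Go v2; the constants mirror the defaults stated above, and `writeBatch` is an illustrative name.

```go
// A rough sketch of BatchWriteItem retries with exponential backoff and jitter.
package loader

import (
	"context"
	"fmt"
	"math/rand"
	"time"

	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

const (
	baseDelay  = 100 * time.Millisecond // backoff floor, as described above
	maxDelay   = 30 * time.Second       // backoff ceiling, as described above
	maxRetries = 5                      // default --max-retries
)

func writeBatch(ctx context.Context, client *dynamodb.Client, table string, reqs []types.WriteRequest) error {
	pending := map[string][]types.WriteRequest{table: reqs}
	for attempt := 0; len(pending[table]) > 0; attempt++ {
		out, err := client.BatchWriteItem(ctx, &dynamodb.BatchWriteItemInput{RequestItems: pending})
		if err != nil {
			return fmt.Errorf("batch write: %w", err)
		}
		if len(out.UnprocessedItems[table]) == 0 {
			return nil // all items written
		}
		if attempt >= maxRetries {
			return fmt.Errorf("%d items unprocessed after %d retries", len(out.UnprocessedItems[table]), maxRetries)
		}
		pending = out.UnprocessedItems // retry only what DynamoDB throttled

		// Exponential backoff with full jitter, capped at maxDelay.
		delay := baseDelay << attempt
		if delay > maxDelay {
			delay = maxDelay
		}
		select {
		case <-time.After(time.Duration(rand.Int63n(int64(delay)))):
		case <-ctx.Done():
			return ctx.Err() // cancellation is propagated, as noted above
		}
	}
	return nil
}
```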
Licensed under the MIT License.