An AWS Lambda custom runtime built with Ruchy (transpiled to Rust), with a measured cold start of 9.19ms average and 8.14ms best (v3.209.0 with release-ultra optimizations).
All data captured from AWS CloudWatch logs on deployed functions in us-east-1.
| Runtime | Init Duration | Binary Size | Runtime Loaded | Memory Used | Status |
|---|---|---|---|---|---|
| Ruchy v3.209.0 | 9.19ms (8.14ms best) | 352KB | 352KB | 14MB | ✅ Production |
| Rust (tokio) | 14.90ms | 596KB | 596KB | 12MB | Baseline |
| C++ (AWS SDK) | 28.96ms | 87KB | 87KB | 22MB | - |
| Go | 56.49ms | 4.2MB | 4.2MB | 19MB | - |
| Python 3.12 | 85.73ms | 445B | ~78MB* | 36MB | - |
Measurement methodology: AWS Lambda "Init Duration" metric from CloudWatch logs.
*Python paradox: You deploy only 445 bytes of code, but AWS loads a ~78MB Python interpreter. Custom runtimes (Ruchy, Rust, C++, Go) include everything in one small binary, achieving 10x faster cold starts.
| Runtime | Init | Execution | Total |
|---|---|---|---|
| Ruchy v3.209.0 | 9.19ms | 644.74ms | 653.93ms |
| Rust | 14.97ms | 551.33ms | 566.30ms |
| Go | 46.85ms | 689.22ms | 736.07ms |
| C++ | 99.38ms | 1136.72ms | 1236.10ms |
| Python | 92.74ms | 25,083.46ms | 25,176.20ms |
Measured with bashrs bench v6.31.1, fibonacci(35) benchmark:
| Runtime | Mean Time | vs Python | Binary Size |
|---|---|---|---|
| C (gcc -O3) | 11.86ms | 58.1x faster | 15KB |
| Ruchy compile (nasa) | 18.22ms | 37.8x faster | 321KB |
| Ruchy compile (aggressive) | 18.59ms | 37.1x faster | 319KB |
| Rust (opt-level=3) | 21.89ms | 31.5x faster | 312KB |
| Ruchy compile (balanced) | 23.52ms | 29.3x faster | 1.9MB |
| Go | 37.59ms | 18.3x faster | ~1.5MB |
| Julia (JIT) | 182.72ms | 3.8x faster | ~200MB |
| Python | 688.89ms | baseline | ~78MB |
Note: Local benchmarks measure pure execution (fibonacci only). AWS Lambda cold start includes additional overhead from HTTP client, event loop, JSON deserialization (~520-650ms), which dominates the total time.
Key Finding: Ruchy compile (nasa/aggressive) is 16.8% faster than plain Rust due to two-stage optimization (Ruchy AST → rustc).
Runtime size matters for Lambda: smaller binaries load faster. Python loads a 78MB interpreter (85.73ms init), and Julia carries a 200MB+ runtime, making it impractical for serverless.
Ruchy Source (.ruchy) → ruchy transpile → Rust Code → rustc → bootstrap binary → AWS Lambda
Components:
- Ruchy handlers (~178 lines): Business logic transpiled to Rust
- Runtime infrastructure (~600 lines hand-written Rust): HTTP client, Lambda API, logging
- Composition: ~30% Ruchy, ~70% Rust
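The hand-written runtime follows the standard custom-runtime shape: poll the Lambda Runtime API for the next invocation, call the transpiled handler, post the response. A simplified Rust sketch of that loop, with the HTTP transport mocked out so it runs standalone (function names here are illustrative, not the actual runtime API):

```rust
// Hedged sketch of the event-loop shape in crates/runtime/src/lib.rs.
// The real runtime speaks the Lambda Runtime API over HTTP:
//   GET  http://$AWS_LAMBDA_RUNTIME_API/2018-06-01/runtime/invocation/next
//   POST http://$AWS_LAMBDA_RUNTIME_API/2018-06-01/runtime/invocation/{id}/response
// Here the transport is mocked so the structure runs standalone.

/// Mocked "next invocation"; the real version blocks on the HTTP GET above.
fn next_invocation() -> Option<(String, String)> {
    Some(("req-1".to_string(), "{}".to_string()))
}

/// Mocked "post response"; the real version POSTs back to the Runtime API.
fn post_response(request_id: &str, response: &str) {
    println!("{} -> {}", request_id, response);
}

/// Same signature as the transpiled Ruchy handlers: (request_id, body) -> String.
fn lambda_handler(_request_id: &str, _body: &str) -> String {
    "{\"statusCode\":200,\"body\":\"ok\"}".to_string()
}

fn main() {
    // The real loop runs until the sandbox is frozen; one iteration shown.
    if let Some((request_id, body)) = next_invocation() {
        let response = lambda_handler(&request_id, &body);
        post_response(&request_id, &response);
    }
}
```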
# Build Lambda bootstrap (production-optimized, 352KB)
cargo build --profile release-ultra -p ruchy-lambda-bootstrap
# Or use the build script (includes transpilation + packaging)
./scripts/build-lambda-package.sh minimal # Uses release-ultra profile
# Or compile Ruchy directly (for standalone programs)
ruchy compile your-handler.ruchy --optimize aggressive # 312KB binary
# Run tests
cargo test --workspace
# Local benchmark
make bench-local
# Deploy to AWS Lambda
./scripts/build-lambda-package.sh minimal
./scripts/deploy-to-aws.sh my-function-name
Minimal (handler_minimal.ruchy):
pub fun lambda_handler(request_id: &str, body: &str) -> String {
    "{\"statusCode\":200,\"body\":\"ok\"}"
}
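The minimal handler returns a hard-coded JSON literal. If a handler ever interpolated untrusted text into the body, that text would need JSON escaping first; a std-only Rust sketch of such a helper (hypothetical, not part of the shipped runtime):

```rust
/// Escape a string for safe embedding inside a JSON string literal.
/// Hypothetical helper; the shipped handlers return fixed bodies.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use \u escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

fn main() {
    // Body text containing quotes survives as valid JSON.
    println!(
        "{{\"statusCode\":200,\"body\":\"{}\"}}",
        json_escape("he said \"ok\"")
    );
}
```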
Fibonacci (handler_fibonacci.ruchy):
pub fun fibonacci(n: i32) -> i32 {
    if n <= 1 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}
pub fun lambda_handler(request_id: &str, body: &str) -> String {
    let n = 35;
    let result = fibonacci(n);
    let result_str = result.to_string();
    String::from("{\"statusCode\":200,\"body\":\"fibonacci(35)=") + &result_str + "\"}"
}
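After transpilation, the fibonacci handler plausibly becomes Rust along these lines (a hand-written approximation for illustration, not the actual handler_fibonacci_generated.rs output):

```rust
// Approximation of what `ruchy transpile` might emit for the fibonacci
// handler; the real generated code lives in handler_*_generated.rs.
pub fn fibonacci(n: i32) -> i32 {
    if n <= 1 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}

pub fn lambda_handler(_request_id: &str, _body: &str) -> String {
    let n = 35;
    let result = fibonacci(n);
    let result_str = result.to_string();
    String::from("{\"statusCode\":200,\"body\":\"fibonacci(35)=") + &result_str + "\"}"
}

fn main() {
    println!("{}", lambda_handler("req-1", "{}"));
}
```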
Ruchy Lambda:
- ruchy-lambda-minimal - Source | Generated Rust
- ruchy-lambda-fibonacci - Source | Generated Rust
Baselines (from lambda-perf, MIT licensed):
- baseline-cpp/baseline-cpp-fibonacci - Source
- baseline-rust/baseline-rust-fibonacci - Source
- baseline-go/baseline-go-fibonacci - Source
- baseline-python/baseline-python-fibonacci - Source
# Verify deployment
aws lambda list-functions \
--query "Functions[?starts_with(FunctionName, 'ruchy-lambda')]"
# Invoke
aws lambda invoke \
--function-name ruchy-lambda-minimal \
--payload '{}' \
response.json
- Tests: 100+ across all crates
- Line Coverage: 91.48% (161/176 lines)
- Mutation Score: 86.67% (65/75 mutants caught)
- AWS Validation: 11/11 tests passing
Run tests:
cargo test --workspace
cargo test --test aws_validation_tests -- --ignored # Requires AWS credentials
The ruchy compile command supports multiple optimization levels for different use cases:
# Development/debugging - fastest compile, largest binary
ruchy compile file.ruchy --optimize none # 3.8MB, fastest compile
# Production default - balanced size/compile time
ruchy compile file.ruchy --optimize balanced # 1.9MB (51% reduction)
# Lambda/Docker - aggressive optimization
ruchy compile file.ruchy --optimize aggressive # 312KB (91.8% reduction)
# Maximum optimization - absolute smallest
ruchy compile file.ruchy --optimize nasa # 315KB (91.8% reduction)
# CI/CD integration
ruchy compile file.ruchy --optimize nasa --json report.json
ruchy compile file.ruchy --optimize nasa --verbose # Show all flags
Binary size comparison:
| Optimization | Binary Size | Reduction | Compile Time | Use Case |
|---|---|---|---|---|
| none | 3.8MB | 0% | Fastest | Development/debugging |
| balanced | 1.9MB | 51% | Fast | Production default |
| aggressive | 312KB | 91.8% | Moderate | Lambda/Docker ✅ |
| nasa | 315KB | 91.8% | Slower | Maximum optimization |
Recommendation: Use --optimize aggressive for Lambda deployments (91.8% size reduction).
Recommended for Lambda deployments (used by ./scripts/build-lambda-package.sh):
[profile.release-ultra]
opt-level = 'z' # Optimize for size (reduces cold start)
lto = "fat" # Fat link-time optimization
codegen-units = 1 # Maximum optimization, single compilation unit
panic = 'abort' # No unwinding overhead
strip = true # Remove debug symbols
Build Profile Comparison (measured with bashrs bench v6.31.1):
| Profile | Build Time | Binary Size | Cold Start | Use Case |
|---|---|---|---|---|
| --release | 3.39s | 409KB | 9.96ms | Development |
| --profile release-ultra | 3.42s (+1%) | 352KB | 9.19ms | Production ✅ |
Tradeoff: 1% longer compile time for 14% smaller binaries and 7.7% faster cold starts.
Target: x86_64-unknown-linux-musl (AWS Lambda provided.al2023)
Ruchy provides a comprehensive NASA-grade toolchain for profiling and optimization (v3.209.0+):
NEW in v3.209.0: Preset optimization levels for different use cases.
# NASA-grade optimization presets
ruchy compile file.ruchy --optimize none # Debug (0%, 3.8MB)
ruchy compile file.ruchy --optimize balanced # Production (51% reduction, 1.9MB)
ruchy compile file.ruchy --optimize aggressive # Max perf (91.8% reduction, 312KB)
ruchy compile file.ruchy --optimize nasa # Absolute max (91.8% reduction, 315KB)
# CI/CD integration
ruchy compile file.ruchy --optimize nasa --json metrics.json
ruchy compile file.ruchy --optimize nasa --verbose # Show all flags
Performance Advantage (measured with bashrs bench v6.31.1, fibonacci(35) benchmark):
| Toolchain | Time (ms) | Binary | Advantage |
|---|---|---|---|
| Ruchy compile (nasa) | 18.22ms | 321KB | 16.8% faster than Rust ✅ |
| Ruchy compile (aggressive) | 18.59ms | 319KB | 15.1% faster than Rust |
| Plain Rust (opt-level=3) | 21.89ms | 312KB | Baseline |
| C (gcc -O3) | 11.86ms | 15KB | 53.7% faster than Ruchy |
Key Finding: Ruchy's two-stage optimization (Ruchy AST → rustc) outperforms single-stage rustc compilation by 16.8%, proving transpilation can beat direct compilation through domain-specific optimizations.
NEW in v3.209.0: Profile transpiled binaries for accurate performance data.
# Profile transpiled binary (fast, accurate)
ruchy runtime --profile --binary fibonacci.ruchy
# Run multiple iterations for benchmarking
ruchy runtime --profile --binary --iterations 100 benchmark.ruchy
# Export profiling data
ruchy runtime --profile --binary --output profile.json fibonacci.ruchy
# BigO algorithmic complexity analysis
ruchy runtime --bigo algorithm.ruchy
# Benchmark with statistical analysis
ruchy runtime --bench performance_test.ruchy
# Memory usage and allocation tracking
ruchy runtime --memory heap_test.ruchy
# Compare two implementations
ruchy runtime --compare old.ruchy new.ruchy
# Detect optimization opportunities
ruchy optimize hotpath.ruchy
# Hardware-specific analysis
ruchy optimize --hardware intel hotpath.ruchy
ruchy optimize --hardware amd hotpath.ruchy
ruchy optimize --hardware arm hotpath.ruchy
# Specific analyses
ruchy optimize --cache hotpath.ruchy # Cache behavior
ruchy optimize --branches hotpath.ruchy # Branch prediction
ruchy optimize --vectorization hotpath.ruchy # SIMD opportunities
ruchy optimize --abstractions hotpath.ruchy # Zero-cost abstractions
# Export recommendations
ruchy optimize --format json --output report.json hotpath.ruchy
# 1. Analyze for optimization opportunities
ruchy optimize myapp.ruchy --cache --vectorization
# 2. Profile interpreter execution
ruchy runtime --profile --bigo myapp.ruchy
# 3. Compile with NASA-grade optimization
ruchy compile myapp.ruchy --optimize nasa --json build_metrics.json -o myapp
# 4. Profile the optimized binary
ruchy runtime --profile --binary --iterations 100 myapp.ruchy
# 5. Compare performance
ruchy runtime --compare myapp_old.ruchy myapp.ruchy

| Tool | Purpose | Output Formats | Use Case |
|---|---|---|---|
| compile --optimize | NASA-grade presets | Binary + JSON metrics | Lambda/Docker deployment |
| runtime --profile --binary | Binary profiling | Text + JSON | Accurate performance data |
| runtime --bigo | Complexity analysis | Text | Algorithm validation |
| runtime --bench | Benchmarking | Statistical | Performance regression |
| runtime --memory | Memory tracking | Text | Leak detection |
| optimize | Hardware analysis | Text/JSON/HTML | Performance tuning |
What's NEW in v3.209.0:
- ✅ --optimize flag: 4 presets (none/balanced/aggressive/nasa)
- ✅ --binary flag: Profile transpiled binaries
- ✅ --json flag: CI/CD metrics export
- ✅ --verbose flag: Show optimization flags
- ✅ 12.4x binary size reduction capability
ruchy-lambda/
├── crates/
│ ├── bootstrap/ # Lambda entry point
│ │ ├── src/
│ │ │ ├── main.rs
│ │ │ ├── handler_*.ruchy (Ruchy source)
│ │ │ └── handler_*_generated.rs (Transpiled Rust)
│ │ └── build.rs # Auto-transpilation
│ └── runtime/ # Lambda Runtime API
│ └── src/
│ ├── lib.rs # HTTP client, event loop
│ └── logger.rs # CloudWatch logging
├── baselines/ # Comparison implementations
├── benchmarks/
│ └── local-fibonacci/ # Local benchmarking
└── scripts/
├── build-lambda-package.sh
└── deploy-to-aws.sh
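Since Lambda forwards anything written to stdout to CloudWatch Logs, the logger.rs component can stay dependency-free. A minimal structured-logging sketch in that spirit (function names hypothetical, not the actual logger.rs API):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical minimal structured logger: Lambda ships stdout to CloudWatch,
// so "logging" can be a formatted println with zero dependencies.
fn format_log(level: &str, request_id: &str, msg: &str, ts_millis: u128) -> String {
    format!(
        "{{\"ts\":{},\"level\":\"{}\",\"requestId\":\"{}\",\"msg\":\"{}\"}}",
        ts_millis, level, request_id, msg
    )
}

fn log(level: &str, request_id: &str, msg: &str) {
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0);
    println!("{}", format_log(level, request_id, msg, ts));
}

fn main() {
    log("INFO", "req-1", "handler invoked");
}
```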
From PMAT analysis:
- TDG Grade: A+ (98.1/100)
- Cyclomatic Complexity: Max 5 (target: ≤15)
- Cognitive Complexity: Max 4 (target: ≤20)
- SATD Violations: 0
- ARCHITECTURE.md - Technical design
- BENCHMARKS.md - Performance analysis
- VERIFICATION_REPORT.md - Ruchy vs Rust composition analysis
- baselines/README.md - Baseline implementation details
- benchmarks/local-fibonacci/README.md - Local benchmark guide
- Docker runtime - Another repo dedicated to showing Docker runtime sizes
Runtime (production):
- serde = "1.0"
- serde_json = "1.0"
Development:
- Requires ruchy compiler in PATH for transpilation
- AWS CLI for deployment
MIT OR Apache-2.0
- Baseline implementations from lambda-perf (MIT License, Maxime David)
- Benchmarking framework adapted from ruchy-book Chapter 21
- Uses bashrs bench v6.25.0 for performance measurement