Skip to content

paiml/ruchy-lambda

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ruchy Lambda

AWS Lambda custom runtime using Ruchy (transpiled to Rust) with measured cold start performance of 9.19ms average, 8.14ms best (v3.209.0 with release-ultra optimizations).

Performance (Measured on AWS Lambda)

All data captured from AWS CloudWatch logs on deployed functions in us-east-1.

Cold Start Comparison

Runtime Init Duration Binary Size Runtime Loaded Memory Used Status
Ruchy v3.209.0 9.19ms (8.14ms best) 352KB 352KB 14MB Production
Rust (tokio) 14.90ms 596KB 596KB 12MB Baseline
C++ (AWS SDK) 28.96ms 87KB 87KB 22MB -
Go 56.49ms 4.2MB 4.2MB 19MB -
Python 3.12 85.73ms 445B ~78MB* 36MB -

Measurement methodology: AWS Lambda "Init Duration" metric from CloudWatch logs.

*Python paradox: You deploy only 445 bytes of code, but AWS loads a ~78MB Python interpreter. Custom runtimes (Ruchy, Rust, C++, Go) include everything in one small binary, achieving 10x faster cold starts.

Fibonacci(35) Execution (59M recursive calls)

Runtime Init Execution Total
Ruchy v3.209.0 9.19ms 644.74ms 653.93ms
Rust 14.97ms 551.33ms 566.30ms
Go 46.85ms 689.22ms 736.07ms
C++ 99.38ms 1136.72ms 1236.10ms
Python 92.74ms 25,083.46ms 25,176.20ms

Local Benchmark (Pure Execution)

Measured with bashrs bench v6.31.1, fibonacci(35) benchmark:

Runtime Mean Time vs Python Binary Size
C (gcc -O3) 11.86ms 58.1x faster 15KB
Ruchy compile (nasa) 18.22ms 37.8x faster 321KB
Ruchy compile (aggressive) 18.59ms 37.1x faster 319KB
Rust (opt-level=3) 21.89ms 31.5x faster 312KB
Ruchy compile (balanced) 23.52ms 29.3x faster 1.9MB
Go 37.59ms 18.3x faster ~1.5MB
Julia (JIT) 182.72ms 3.8x faster ~200MB
Python 688.89ms baseline ~78MB

Note: Local benchmarks measure pure execution (fibonacci only). AWS Lambda cold start includes additional overhead from HTTP client, event loop, JSON deserialization (~520-650ms), which dominates the total time.

Key Finding: Ruchy compile (nasa/aggressive) is 16.8% faster than plain Rust due to two-stage optimization (Ruchy AST → rustc).

Runtime size matters for Lambda: Smaller binaries load faster. Python loads a 78MB interpreter (85.73ms init), Julia has 200MB+ runtime making it impractical for serverless.

Architecture

Ruchy Source (.ruchy) → ruchy transpile → Rust Code → rustc → bootstrap binary → AWS Lambda

Components:

  • Ruchy handlers (~178 lines): Business logic transpiled to Rust
  • Runtime infrastructure (~600 lines hand-written Rust): HTTP client, Lambda API, logging
  • Composition: ~30% Ruchy, ~70% Rust

Quick Start

# Build Lambda bootstrap (production-optimized, 352KB)
cargo build --profile release-ultra -p ruchy-lambda-bootstrap

# Or use the build script (includes transpilation + packaging)
./scripts/build-lambda-package.sh minimal  # Uses release-ultra profile

# Or compile Ruchy directly (for standalone programs)
ruchy compile your-handler.ruchy --optimize aggressive  # 312KB binary

# Run tests
cargo test --workspace

# Local benchmark
make bench-local

# Deploy to AWS Lambda
./scripts/build-lambda-package.sh minimal
./scripts/deploy-to-aws.sh my-function-name

Handler Example

Minimal (handler_minimal.ruchy):

pub fun lambda_handler(request_id: &str, body: &str) -> String {
    "{\"statusCode\":200,\"body\":\"ok\"}"
}

Fibonacci (handler_fibonacci.ruchy):

pub fun fibonacci(n: i32) -> i32 {
    if n <= 1 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}

pub fun lambda_handler(request_id: &str, body: &str) -> String {
    let n = 35;
    let result = fibonacci(n);
    let result_str = result.to_string();
    String::from("{\"statusCode\":200,\"body\":\"fibonacci(35)=") + &result_str + "\"}"
}

Deployed Functions (Verifiable)

Ruchy Lambda:

Baselines (from lambda-perf, MIT licensed):

  • baseline-cpp / baseline-cpp-fibonacci - Source
  • baseline-rust / baseline-rust-fibonacci - Source
  • baseline-go / baseline-go-fibonacci - Source
  • baseline-python / baseline-python-fibonacci - Source
# Verify deployment
aws lambda list-functions \
  --query "Functions[?starts_with(FunctionName, 'ruchy-lambda')]"

# Invoke
aws lambda invoke \
  --function-name ruchy-lambda-minimal \
  --payload '{}' \
  response.json

Test Coverage

  • Tests: 100+ across all crates
  • Line Coverage: 91.48% (161/176 lines)
  • Mutation Score: 86.67% (65/75 mutants caught)
  • AWS Validation: 11/11 tests passing

Run tests:

cargo test --workspace
cargo test --test aws_validation_tests -- --ignored  # Requires AWS credentials

Build Configuration

Ruchy Compiler Optimization Levels

The ruchy compile command supports multiple optimization levels for different use cases:

# Development/debugging - fastest compile, largest binary
ruchy compile file.ruchy --optimize none       # 3.8MB, fastest compile

# Production default - balanced size/compile time
ruchy compile file.ruchy --optimize balanced   # 1.9MB (51% reduction)

# Lambda/Docker - aggressive optimization
ruchy compile file.ruchy --optimize aggressive # 312KB (91.8% reduction)

# Maximum optimization - absolute smallest
ruchy compile file.ruchy --optimize nasa       # 315KB (91.8% reduction)

# CI/CD integration
ruchy compile file.ruchy --optimize nasa --json report.json
ruchy compile file.ruchy --optimize nasa --verbose  # Show all flags

Binary size comparison:

Optimization Binary Size Reduction Compile Time Use Case
none 3.8MB 0% Fastest Development/debugging
balanced 1.9MB 51% Fast Production default
aggressive 312KB 91.8% Moderate Lambda/Docker
nasa 315KB 91.8% Slower Maximum optimization

Recommendation: Use --optimize aggressive for Lambda deployments (91.8% size reduction).

Cargo Release Profiles

Production Profile: release-ultra

Recommended for Lambda deployments (used by ./scripts/build-lambda-package.sh):

[profile.release-ultra]
opt-level = 'z'           # Optimize for size (reduces cold start)
lto = "fat"               # Fat link-time optimization
codegen-units = 1         # Maximum optimization, single compilation unit
panic = 'abort'           # No unwinding overhead
strip = true              # Remove debug symbols

Build Profile Comparison (measured with bashrs bench v6.31.1):

Profile Build Time Binary Size Cold Start Use Case
--release 3.39s 409KB 9.96ms Development
--profile release-ultra 3.42s (+1%) 352KB 9.19ms Production

Tradeoff: 1% longer compile time for 14% smaller binaries and 7.7% faster cold starts.

Target: x86_64-unknown-linux-musl (AWS Lambda provided.al2023)

Compiler Profiling & Optimization Tools

Ruchy provides a comprehensive NASA-grade toolchain for profiling and optimization (v3.209.0+):

1. Compilation Optimization (ruchy compile)

NEW in v3.209.0: Preset optimization levels for different use cases.

# NASA-grade optimization presets
ruchy compile file.ruchy --optimize none        # Debug (0%, 3.8MB)
ruchy compile file.ruchy --optimize balanced    # Production (51% reduction, 1.9MB)
ruchy compile file.ruchy --optimize aggressive  # Max perf (91.8% reduction, 312KB)
ruchy compile file.ruchy --optimize nasa        # Absolute max (91.8% reduction, 315KB)

# CI/CD integration
ruchy compile file.ruchy --optimize nasa --json metrics.json
ruchy compile file.ruchy --optimize nasa --verbose  # Show all flags

Performance Advantage (measured with bashrs bench v6.31.1, fibonacci(35) benchmark):

Toolchain Time (ms) Binary Advantage
Ruchy compile (nasa) 18.22ms 321KB 16.8% faster than Rust
Ruchy compile (aggressive) 18.59ms 319KB 15.1% faster than Rust
Plain Rust (opt-level=3) 21.89ms 312KB Baseline
C (gcc -O3) 11.86ms 15KB 53.7% faster than Ruchy

Key Finding: Ruchy's two-stage optimization (Ruchy AST → rustc) outperforms single-stage rustc compilation by 16.8%, proving transpilation can beat direct compilation through domain-specific optimizations.

2. Binary Profiling (ruchy runtime --profile --binary)

NEW in v3.209.0: Profile transpiled binaries for accurate performance data.

# Profile transpiled binary (fast, accurate)
ruchy runtime --profile --binary fibonacci.ruchy

# Run multiple iterations for benchmarking
ruchy runtime --profile --binary --iterations 100 benchmark.ruchy

# Export profiling data
ruchy runtime --profile --binary --output profile.json fibonacci.ruchy

3. Performance Analysis Tools

# BigO algorithmic complexity analysis
ruchy runtime --bigo algorithm.ruchy

# Benchmark with statistical analysis
ruchy runtime --bench performance_test.ruchy

# Memory usage and allocation tracking
ruchy runtime --memory heap_test.ruchy

# Compare two implementations
ruchy runtime --compare old.ruchy new.ruchy

4. Hardware-Aware Optimization Analysis (ruchy optimize)

# Detect optimization opportunities
ruchy optimize hotpath.ruchy

# Hardware-specific analysis
ruchy optimize --hardware intel hotpath.ruchy
ruchy optimize --hardware amd hotpath.ruchy
ruchy optimize --hardware arm hotpath.ruchy

# Specific analyses
ruchy optimize --cache hotpath.ruchy              # Cache behavior
ruchy optimize --branches hotpath.ruchy           # Branch prediction
ruchy optimize --vectorization hotpath.ruchy      # SIMD opportunities
ruchy optimize --abstractions hotpath.ruchy       # Zero-cost abstractions

# Export recommendations
ruchy optimize --format json --output report.json hotpath.ruchy

Complete Optimization Workflow

# 1. Analyze for optimization opportunities
ruchy optimize myapp.ruchy --cache --vectorization

# 2. Profile interpreter execution
ruchy runtime --profile --bigo myapp.ruchy

# 3. Compile with NASA-grade optimization
ruchy compile myapp.ruchy --optimize nasa --json build_metrics.json -o myapp

# 4. Profile the optimized binary
ruchy runtime --profile --binary --iterations 100 myapp.ruchy

# 5. Compare performance
ruchy runtime --compare myapp_old.ruchy myapp.ruchy

Toolchain Summary

Tool Purpose Output Formats Use Case
compile --optimize NASA-grade presets Binary + JSON metrics Lambda/Docker deployment
runtime --profile --binary Binary profiling Text + JSON Accurate performance data
runtime --bigo Complexity analysis Text Algorithm validation
runtime --bench Benchmarking Statistical Performance regression
runtime --memory Memory tracking Text Leak detection
optimize Hardware analysis Text/JSON/HTML Performance tuning

What's NEW in v3.209.0:

  • --optimize flag: 4 presets (none/balanced/aggressive/nasa)
  • --binary flag: Profile transpiled binaries
  • --json flag: CI/CD metrics export
  • --verbose flag: Show optimization flags
  • ✅ 12.4x binary size reduction capability

Project Structure

ruchy-lambda/
├── crates/
│   ├── bootstrap/          # Lambda entry point
│   │   ├── src/
│   │   │   ├── main.rs
│   │   │   ├── handler_*.ruchy          (Ruchy source)
│   │   │   └── handler_*_generated.rs   (Transpiled Rust)
│   │   └── build.rs        # Auto-transpilation
│   └── runtime/            # Lambda Runtime API
│       └── src/
│           ├── lib.rs      # HTTP client, event loop
│           └── logger.rs   # CloudWatch logging
├── baselines/              # Comparison implementations
├── benchmarks/
│   └── local-fibonacci/    # Local benchmarking
└── scripts/
    ├── build-lambda-package.sh
    └── deploy-to-aws.sh

Quality Metrics

From PMAT analysis:

  • TDG Grade: A+ (98.1/100)
  • Cyclomatic Complexity: Max 5 (target: ≤15)
  • Cognitive Complexity: Max 4 (target: ≤20)
  • SATD Violations: 0

Documentation

Dependencies

Runtime (production):

  • serde = "1.0"
  • serde_json = "1.0"

Development:

  • Requires ruchy compiler in PATH for transpilation
  • AWS CLI for deployment

License

MIT OR Apache-2.0

Attribution

  • Baseline implementations from lambda-perf (MIT License, Maxime David)
  • Benchmarking framework adapted from ruchy-book Chapter 21
  • Uses bashrs bench v6.25.0 for performance measurement

About

A proof of concept that attempts to beat all languages on AWS

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •