Ultra-fast ONNX inference server built with Rust
A high-performance, lightweight HTTP inference server specialized for ONNX models with zero Python dependencies. Built with Burn's ONNX-to-Rust code generation for ResNet-18 image classification.
- π¦ Pure Rust: Maximum performance, minimal memory footprint (4.5MB binary)
- π₯ ONNX Support: Direct ONNX model loading with automatic shape detection
- β‘ Fast Inference: ~25s inference times for ResNet-18
- π‘οΈ Production Ready: Graceful shutdown, comprehensive error handling
- π HTTP API: RESTful endpoints with CORS support
- π¦ Single Binary: Zero external dependencies
- πΌοΈ Image Classification: Optimized for computer vision models
- Rust 1.75+ and Cargo
- curl for downloading models and testing
- ~50MB disk space for ResNet-18 model
git clone https://github.com/Gilfeather/furnace.git
cd furnace# Create models directory
mkdir -p models
# Download ResNet-18 ONNX model from official ONNX model zoo
curl -L "https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet18-v1-7.onnx" -o models/resnet18.onnx
# Verify download
ls -la models/
# Should show: resnet18.onnx (~45MB)# Build with burn-import feature to generate Rust code from ONNX
cargo build --features burn-import --releaseExpected output: Binary created at ./target/release/furnace (~4.5MB)
# Generate ResNet-18 test samples (creates JSON files locally)
cargo run --example resnet18_sample_dataThis creates the following test files:
resnet18_single_sample.json- Single image test dataresnet18_batch_sample.json- Batch of 3 images test dataresnet18_full_test.json- Full-size single image (150,528 values)
# Start server with built-in ResNet-18 model
./target/release/furnace --model-name resnet18 --host 127.0.0.1 --port 3000Expected output:
π§ Logging initialized log_level=INFO is_production=false
π₯ Starting furnace inference server session_id=...
π Server configuration model_name=resnet18 server_host=127.0.0.1 server_port=3000
π¦ Loading model model_name=resnet18
Loading built-in model: resnet18
Successfully loaded built-in model: resnet18 with backend: burn-resnet18
β
Model loaded successfully input_shape=[1, 3, 224, 224] output_shape=[1000]
π Starting HTTP server
π¦ Concurrency limit set to 100 requests
β
Server running on http://127.0.0.1:3000
Open a new terminal and test the endpoints:
# Health check
curl http://localhost:3000/health
# Expected: {"status":"healthy","model_loaded":true,...}
# Model info
curl http://localhost:3000/model/info
# Expected: {"model_info":{"name":"resnet18","input_spec":{"shape":[1,3,224,224]},...}}
# Single image prediction (~25s inference time for ResNet18)
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
--data-binary @resnet18_full_test.json
# Expected: {"output":[-1.596,-0.173,0.842,...],"status":"success","inference_time_ms":25477.0}
# Batch prediction
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
--data-binary @resnet18_batch_sample.json
# Expected: {"output":[[-1.596,...],[-0.173,...],[0.842,...]],"batch_size":3,"status":"success"}Furnace uses Burn's native ONNX import system to generate Rust code from ONNX models at build time. This provides maximum performance and eliminates runtime dependencies.
- Build-time Code Generation: ONNX models are converted to native Rust code during compilation
- Zero Runtime Dependencies: No ONNX runtime required - everything is compiled into the binary
- Native Performance: Generated code is optimized by the Rust compiler
- Type Safety: Full Rust type checking for model inputs and outputs
furnace/
βββ models/ # Place your ONNX files here
β βββ resnet18.onnx # ResNet-18 model (auto-detected)
β βββ your_model.onnx # Your custom ONNX models
βββ build.rs # Generates Rust code from ONNX files
βββ src/
β βββ onnx_models.rs # Generated model integration
β βββ ...
βββ target/debug/build/.../out/models/
βββ resnet18.rs # Generated Rust code for ResNet-18
βββ your_model.rs # Generated code for your models
Furnace automatically detects and integrates ONNX models! Just place them in the models/ directory and rebuild.
# Place your ONNX model in the models/ directory
cp your_model.onnx models/
# Verify file placement
ls -la models/
# Should show: resnet18.onnx, your_model.onnx, etc.# Build with burn-import feature for ONNX processing
cargo build --features burn-importWhat happens during build:
- π Auto-detects all
.onnxfiles inmodels/directory - π¦ Converts each ONNX model to native Rust code
- β Successfully generated models become available
- β Failed models are skipped (with helpful error messages)
Build Output Example:
Generating ONNX models following Burn documentation
Generating model: resnet18
β
Model 'resnet18' generated successfully
Generating model: your_model
β Failed to generate model 'your_model' - incompatible ONNX format
This model will be skipped. Consider simplifying the ONNX file.
For successfully generated models, add them to src/models/mod.rs:
3a. Add Module Declaration:
// Add your model module
#[cfg(all(feature = "burn-import", model_your_model))]
pub mod your_model {
include!(concat!(env!("OUT_DIR"), "/models/your_model.rs"));
}
// Re-export the model
#[cfg(all(feature = "burn-import", model_your_model))]
pub use your_model::Model as YourModel;3b. Add to BuiltInModel enum:
pub enum BuiltInModel {
ResNet18,
#[cfg(model_your_model)]
YourModel, // Add your model here
}3c. Add Model Loading Logic:
impl BuiltInModel {
pub fn from_name(name: &str) -> Result<Self> {
match name.to_lowercase().as_str() {
"resnet18" => Ok(Self::ResNet18),
#[cfg(model_your_model)]
"yourmodel" => Ok(Self::YourModel), // Add your model
// ...
}
}
pub fn create_model(&self) -> Result<Box<dyn BurnModel>> {
match self {
// ... existing models ...
#[cfg(model_your_model)]
Self::YourModel => {
let model = YourModel::<Backend>::default();
Ok(Box::new(SimpleYourModelWrapper {
model: Arc::new(Mutex::new(model)),
name: "yourmodel".to_string(),
input_shape: vec![1, 3, 224, 224], // Adjust for your model
output_shape: vec![1000], // Adjust for your model
}))
}
}
}
}3d. Create Model Wrapper:
// Add wrapper struct for your model
#[cfg(all(feature = "burn-import", model_your_model))]
#[derive(Debug)]
pub struct SimpleYourModelWrapper {
model: Arc<Mutex<YourModel<Backend>>>,
name: String,
input_shape: Vec<usize>,
output_shape: Vec<usize>,
}
// Implement BurnModel trait
#[cfg(all(feature = "burn-import", model_your_model))]
impl BurnModel for SimpleYourModelWrapper {
fn predict(&self, input: Tensor<Backend, 2>) -> Result<Tensor<Backend, 2>> {
// Validate input and reshape as needed for your model
let model = self.model.lock().unwrap();
let output = model.forward(input); // Adjust based on your model's requirements
Ok(output)
}
// Implement other required methods...
fn get_name(&self) -> &str { &self.name }
fn get_input_shape(&self) -> &[usize] { &self.input_shape }
fn get_output_shape(&self) -> &[usize] { &self.output_shape }
fn get_backend_info(&self) -> String { "burn-yourmodel".to_string() }
// ...
}# Rebuild with your new model integration
cargo build --features burn-import
# List available models
cargo run --bin furnace --features burn-import -- --help
# Start server with your model
cargo run --bin furnace --features burn-import -- --model-name yourmodel --port 3000
# Test inference
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"input": [/* your test data */]}'Step-by-step example of adding MobileNet v2:
# 1. Download MobileNet ONNX model
curl -L "https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-12.onnx" -o models/mobilenetv2.onnx
# 2. Build (automatic generation)
cargo build --features burn-import
# Expected output: β
Model 'mobilenetv2' generated successfully
# 3. Add cfg declaration to build.rs (if not already present)
# Add to build.rs: println!("cargo:rustc-check-cfg=cfg(model_mobilenetv2)");Add to src/models/mod.rs:
// Module declaration
#[cfg(all(feature = "burn-import", model_mobilenetv2))]
pub mod mobilenetv2 {
include!(concat!(env!("OUT_DIR"), "/models/mobilenetv2.rs"));
}
// Re-export
#[cfg(all(feature = "burn-import", model_mobilenetv2))]
pub use mobilenetv2::Model as MobileNetV2Model;
// Add to BuiltInModel enum
pub enum BuiltInModel {
ResNet18,
#[cfg(model_mobilenetv2)]
MobileNetV2,
}
// Wrapper struct
#[cfg(all(feature = "burn-import", model_mobilenetv2))]
#[derive(Debug)]
pub struct SimpleMobileNetV2ModelWrapper {
model: Arc<Mutex<MobileNetV2Model<Backend>>>,
name: String,
input_shape: Vec<usize>,
output_shape: Vec<usize>,
}
// Add to from_name, create_model, etc...Test the new model:
# Build and test
cargo build --features burn-import
cargo run --bin furnace --features burn-import -- --model-name mobilenetv2β Recommended Workflow:
- Test with smaller models first (SqueezeNet, MobileNet)
- Verify ONNX compatibility before integration
- Use descriptive model names (lowercase, no spaces)
- Add comprehensive error handling in your wrapper
- Test thoroughly with your specific input data
- Don't forget the
#[cfg(...)]attributes - Match model names exactly (case-sensitive in file paths)
- Ensure input/output shapes match your actual model
- Add cfg check declarations to build.rs for new models
- Test with both single and batch predictions
π Model Integration Checklist:
- ONNX file placed in
models/directory - Build succeeds with "β Model generated successfully"
- Added module declaration with proper cfg attributes
- Added to BuiltInModel enum
- Added to from_name() method
- Added to create_model() method
- Added wrapper struct and BurnModel implementation
- Added cfg check to build.rs
- Tested server startup
- Tested inference API
Check if model was generated:
# Look for generated Rust files
find target -name "*.rs" -path "*/out/models/*"
# Check build output for your model
cargo build --features burn-import 2>&1 | grep -i "your_model"Verify conditional compilation:
# Check which models are enabled
cargo build --features burn-import -v 2>&1 | grep "model_"Test model loading:
# Try to start server (will show available models if yours isn't found)
cargo run --bin furnace --features burn-import -- --model-name nonexistent
# Error message will list available modelsWhen you build with an ONNX model, Burn generates a complete Rust implementation:
// Example: Generated ResNet-18 code structure
#[derive(Module, Debug)]
pub struct Model<B: Backend> {
conv2d1: Conv2d<B>,
batchnormalization1: BatchNorm<B, 2>,
maxpool2d1: MaxPool2d,
// ... all layers defined
}
impl<B: Backend> Model<B> {
pub fn forward(&self, input: Tensor<B, 4>) -> Tensor<B, 2> {
// Complete forward pass implementation
let x = self.conv2d1.forward(input);
let x = self.batchnormalization1.forward(x);
// ... full computation graph
}
}Supported ONNX Features:
- β Opset 16+ (required)
- β Standard CNN operations (Conv2d, BatchNorm, ReLU, etc.)
- β Image classification models
- β Most PyTorch-exported models
Known Limitations:
- β Some complex models may have unsupported operations
- β Dynamic shapes require manual handling
- β Some models may need ONNX version upgrade
- β Models with broadcasting dimension conflicts
- β Extremely large models (>2GB) may cause memory issues
Troubleshooting Failed Model Generation:
1. ONNX Version Issues:
# Check ONNX opset version
import onnx
model = onnx.load('models/your_model.onnx')
print(f'ONNX opset version: {model.opset_import[0].version}')
# Upgrade to supported version (16+)
from onnx import version_converter
upgraded = version_converter.convert_version(model, 16)
onnx.save(upgraded, 'models/your_model_v16.onnx')2. Model Simplification:
# Install ONNX simplifier
pip install onnx-simplifier
# Simplify complex models
python -c "
import onnx
from onnxsim import simplify
model = onnx.load('models/complex_model.onnx')
simplified, check = simplify(model)
onnx.save(simplified, 'models/simple_model.onnx')
"3. Shape Broadcasting Issues:
# If you see "Invalid shape for broadcasting" errors:
# - Try model simplification first
# - Check if model has dynamic shapes
# - Consider using a different model architecture
# - Report issue with model details for potential fix4. Memory Issues:
# For very large models:
export RUST_MIN_STACK=8388608 # Increase stack size
cargo build --features burn-importQuick validation workflow:
# 1. Build with burn-import feature
cargo build --features burn-import
# 2. Check generated code
find target -name "*.rs" -path "*/out/models/*"
# Should show: resnet18.rs, your_model.rs, etc.
# 3. Test available models
cargo run --bin furnace --features burn-import -- --help
# Will show available model names in help text
# 4. Test server startup
cargo run --bin furnace --features burn-import -- --model-name yourmodel --port 3000
# 5. Test model info endpoint
curl http://localhost:3000/model/info
# 6. Test inference
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"input": [/* your test data matching model input shape */]}'Performance testing:
# Generate test data for your model
cargo run --example create_test_data -- --model yourmodel
# Run benchmarks
cargo bench --features burn-importFurnace supports ONNX models with automatic shape detection. Currently optimized for image classification models.
| Model | Input Shape | Output Shape | Size | Status |
|---|---|---|---|---|
| ResNet-18 | [3, 224, 224] |
[1000] |
45MB | β Supported |
| MobileNet v2 | [3, 224, 224] |
[1000] |
14MB | π§ͺ Compatible |
| SqueezeNet | [3, 224, 224] |
[1000] |
5MB | π§ͺ Compatible |
| GPT-NeoX | [1, 512] |
[50257] |
1.7MB | β Incompatible |
| Your Custom Model | [?, ?, ?] |
[?] |
?MB | π Add with guide above |
Furnace includes built-in models that are compiled during build time:
| Model | Status |
|---|---|
| ResNet-18 | β Available (45MB, built-in) |
| MobileNet v2 | π Add to models/ |
| SqueezeNet | π Add to models/ |
To add additional models, place ONNX files in the models/ directory and rebuild with --features burn-import.
To use your own ONNX models:
- Export your model to ONNX format
- Ensure input shape compatibility (currently optimized for image classification)
- Test with Furnace using the same API endpoints
# Example: Export PyTorch model to ONNX
import torch
import torchvision.models as models
model = models.resnet18(pretrained=True)
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "my_model.onnx")| Metric | Value |
|---|---|
| Binary Size | 4.5MB |
| Model Size | 45MB |
| Inference Time | ~25s |
| Memory Usage | <200MB |
| Startup Time | <2s |
| Input Size | 150,528 values |
| Output Size | 1,000 classes |
Prerequisites:
# Generate test data (benchmarks use built-in model)
cargo run --example resnet18_sample_dataRun benchmarks:
# Run all benchmarks
cargo bench
# Run specific benchmark
cargo bench single_inference
cargo bench batch_inference
cargo bench latency_measurement- Single Inference: ~0.2ms per image (ResNet-18)
- Batch Processing: Optimized for batches of 1-8 images
- Concurrent Requests: Handles multiple simultaneous requests
- Memory Efficiency: Minimal memory allocation per request
- Throughput: Scales with available CPU cores
Based on Criterion benchmarks on Intel MacBook Pro 2020:
| Benchmark | Time | Throughput |
|---|---|---|
| Single Inference | 152Β΅s | ~6,600 req/s |
| Batch Size 2 | 305Β΅s | ~6,600 req/s |
| Batch Size 4 | 664Β΅s | ~6,000 req/s |
| Batch Size 8 | 1.53ms | ~5,200 req/s |
| Concurrent (4 threads) | 372Β΅s | ~10,800 req/s |
| Concurrent (8 threads) | 561Β΅s | ~14,300 req/s |
Latency Percentiles:
- P50: ~149Β΅s
- P95: ~255Β΅s
- P99: ~340Β΅s
Performance Breakdown:
- ONNX Inference: ~14Β΅s (pure model execution)
- Server Overhead: ~157Β΅s (optimized data processing, validation, tensor conversion)
- Total Time: ~171Β΅s (end-to-end processing)
Optimization Details:
- 50% performance improvement while maintaining full security
- SIMD-optimized input validation for NaN/infinity detection
- Non-blocking statistics updates to prevent deadlocks
- Memory-efficient tensor operations
Test Environment:
- Hardware: Intel MacBook Pro 2020
- Compiler: Rust 1.75+ (release mode with full optimizations)
- Model: ResNet-18 ONNX (45MB, 150,528 input values β 1,000 output classes)
Health check endpoint
{
"status": "healthy",
"model_loaded": true,
"uptime_seconds": 3600,
"timestamp": "2024-01-01T12:00:00Z"
}Model metadata and statistics
{
"model_info": {
"name": "resnet18",
"input_spec": {"shape": [3, 224, 224], "dtype": "float32"},
"output_spec": {"shape": [1000], "dtype": "float32"},
"model_type": "burn",
"backend": "onnx"
},
"stats": {
"inference_count": 42,
"total_inference_time_ms": 168.0,
"average_inference_time_ms": 4.0
}
}Run inference on input data
Single Image:
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
--data-binary @resnet18_full_test.jsonBatch Images:
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
--data-binary @resnet18_batch_sample.jsonResponse:
{
"output": [0.1, 0.05, 0.02, ...], // 1000 ImageNet class probabilities
"status": "success",
"inference_time_ms": 0.2,
"timestamp": "2024-01-01T12:00:00Z",
"batch_size": 1
}ResNet-18 expects normalized RGB image data:
- Shape:
[3, 224, 224](150,528 values) - Format: Flattened array of float32 values
- Range: Typically 0.0 to 1.0 (normalized pixel values)
- Order: Channel-first (RGB channels, then height, then width)
- Rust 1.75+
- Cargo
cargo build --releasecargo testImplement the BurnModel trait in src/burn_model.rs to add support for your own model architectures.
βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
β CLI Layer βββββΆβ Model Layer βββββΆβ API Layer β
β β β β β β
β - Argument β β - Model Loading β β - HTTP Routes β
β Parsing β β - Inference β β - Request β
β - Validation β β - Metadata β β Handling β
β - Logging Setup β β - Error Handlingβ β - CORS β
βββββββββββββββββββ βββββββββββββββββββ βββββββββββββββββββ
Model Loading Fails:
# Check available built-in models
./target/release/furnace --help
# Verify model was built
find target -name "resnet18.rs" -path "*/out/models/*"Server Won't Start:
# Check if port is already in use
lsof -i :3000
# Try different port
./target/release/furnace --model-name resnet18 --port 3001Build Errors:
# Update Rust toolchain
rustup update
# Clean and rebuild
cargo clean
cargo build --releaseTest Data Generation Fails:
# Ensure you're in the project root
pwd # Should end with /furnace
# Run with verbose output
cargo run --example resnet18_sample_data --verboseONNX Model Integration Issues:
Generated Code Not Found:
# Check if ONNX models are in the right place
ls -la models/
# Verify code generation during build
cargo build --release 2>&1 | grep -i onnx
# Check generated files
find target -name "*.rs" -path "*/out/models/*"Model Loading Fails:
# Check ONNX model compatibility
# Ensure your model uses ONNX opset 16+
python3 -c "
import onnx
model = onnx.load('models/your_model.onnx')
print(f'ONNX version: {model.opset_import[0].version}')
"
# If version < 16, upgrade the model:
python3 -c "
import onnx
from onnx import version_converter
model = onnx.load('models/your_model.onnx')
upgraded = version_converter.convert_version(model, 16)
onnx.save(upgraded, 'models/your_model_v16.onnx')
"Build Fails with ONNX Errors:
# Some models may have compatibility issues
# Check build warnings for specific ONNX operations
cargo build 2>&1 | grep -A5 -B5 "ONNX\|onnx"
# Try building without problematic models
mv models/problematic_model.onnx models/problematic_model.onnx.bak
cargo build --releaseRuntime Errors with Generated Models:
# Check tensor shape mismatches
# Ensure your input data matches the expected format
curl -X POST http://localhost:3000/model/info # Check expected shapes
# Test with correct input format
# For ResNet-18: [batch_size, 3*224*224] = [1, 150528] valuesIf you're seeing slower inference times:
- Ensure you're using the release build (
cargo build --release) - Check system resources (CPU, memory)
- Try reducing batch size for concurrent requests
- Monitor with
cargo benchfor baseline performance
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.