ColonyOS is an open-source framework for running computational workloads seamlessly across heterogeneous platforms: cloud, edge, HPC, IoT devices, and beyond. It creates Computing Continuums by providing a unified orchestration layer that acts as a meta-orchestrator on top of existing infrastructure.
Traditional orchestration systems are tied to specific platforms (Kubernetes for cloud, Slurm for HPC, etc.). ColonyOS breaks down these silos through meta-process management: a broker-based architecture that separates computational intent from execution.
Example use cases:
- Scientific Computing: Process satellite imagery, analyze sensor data, run simulations across HPC clusters
- AI/ML Pipelines: Distribute training jobs, run inference on edge devices, orchestrate multi-agent LLM systems
- Serverless at Scale: Build FaaS platforms that span cloud, edge, and on-premise infrastructure
- Data Processing: ETL pipelines, batch processing, real-time stream processing with ColonyFS integration
- Industrial IoT: Coordinate computations across factory floor devices, edge gateways, and cloud
- Earth Observation: Automated satellite image processing and analysis workflows
- Infrastructure as Code: Declaratively manage infrastructure across computing continuums - define services spanning cloud, edge, HPC, and IoT with GitOps workflows, automatic drift detection, and self-healing reconciliation
Declarative Intent + Broker + Distributed Execution = Computing Continuums
Instead of writing platform-specific code, you declare WHAT you want to compute using Function Specifications. The Colonies Server acts as a broker that matches your intent with available Executors (distributed workers) that know HOW to execute on their specific platforms. This separation creates seamless Computing Continuums across heterogeneous infrastructure.
- Platform Agnostic: Same function specification runs on Kubernetes, HPC, edge devices, IoT - executors translate to platform-specific execution
- Decoupled Architecture: Submit work anytime, execute asynchronously - temporal and spatial decoupling via broker
- Zero-Trust by Design: No session tokens, no passwords - every request cryptographically signed with Ed25519
- Protocol Flexibility: Choose HTTP/REST, gRPC, CoAP (IoT), or LibP2P (P2P) - or run them all simultaneously
- Pull-Based Execution: Executors connect from anywhere (even behind NAT/firewalls) and pull work - no need for inbound access
- Built-in Audit Trail: Every execution recorded as an immutable ledger for compliance and debugging
- Real-Time Reactive: WebSocket subscriptions for instant notifications on workflow state changes
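
The broker idea behind these properties can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the Colonies server or any SDK: `FunctionSpec`, `Process`, `Broker`, `submit`, and `assign` are hypothetical names, and matching on a plain `executortype` string is a simplification of the server's scheduling conditions. It shows how a queue decouples submission from execution, and how executors pull work that matches their type instead of accepting inbound connections.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class FunctionSpec:
    # Hypothetical, simplified spec: WHAT to run and which executor type may run it.
    funcname: str
    executortype: str
    args: List[str] = field(default_factory=list)


@dataclass
class Process:
    spec: FunctionSpec
    state: str = "WAITING"
    assigned_to: Optional[str] = None


class Broker:
    """In-memory stand-in for the server-side process queue (illustrative only)."""

    def __init__(self) -> None:
        self.queue: Deque[Process] = deque()

    def submit(self, spec: FunctionSpec) -> Process:
        # Submitting never waits for an executor to be online (temporal decoupling).
        process = Process(spec)
        self.queue.append(process)
        return process

    def assign(self, executor_name: str, executor_type: str) -> Optional[Process]:
        # Called by an executor (pull model); the broker never dials out to it.
        for process in self.queue:
            if process.spec.executortype == executor_type:
                self.queue.remove(process)  # safe: we return right after mutating
                process.state = "RUNNING"
                process.assigned_to = executor_name
                return process
        return None  # nothing matching; a real executor would retry later


if __name__ == "__main__":
    broker = Broker()
    broker.submit(FunctionSpec("echo", "edge-device", ["hello"]))
    broker.submit(FunctionSpec("train", "hpc-node"))
    # An HPC executor only receives work addressed to its type, oldest first.
    process = broker.assign("hpc-1", "hpc-node")
    print(process.spec.funcname, process.state)  # train RUNNING
```

Because the executor initiates every exchange, it only needs outbound connectivity, which is why this model works behind NAT and firewalls.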

Features:
- Multi-Protocol Architecture: Native support for HTTP/REST, gRPC, CoAP (IoT), and LibP2P (peer-to-peer)
- Distributed Execution: Executors run anywhere on the Internet - supercomputers, edge devices, browsers, embedded systems
- Zero-Trust Security: All communication cryptographically signed with Ed25519
- Workflow DAGs: Complex computational pipelines with parent-child dependencies
- Event-Driven: Real-time WebSocket subscriptions for process state changes
- Scheduled Execution: Cron-based and interval-based job scheduling
- Dynamic Batching: Generators that pack arguments and trigger workflows based on counter or timeout conditions
- Service Reconciliation: Kubernetes-style declarative service management with automatic drift detection and correction
- Full Audit Trail: Complete execution history stored as an immutable ledger
- High Availability: Etcd-based clustering with automatic failover
- Multi-Language SDKs: Go, Rust, Python, Julia, JavaScript, Haskell
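
A dynamic-batching generator can be thought of as a buffer with two flush conditions. The sketch below is hypothetical: the `Generator` class, its `pack`/`tick` methods, and the choice to measure the timeout from the first packed argument are assumptions, not ColonyOS's generator API. It submits a workflow when `trigger` arguments have accumulated, or when `timeout` seconds have passed with a partial batch pending. An injectable clock keeps it deterministic to test.

```python
import time
from typing import Callable, List, Optional


class Generator:
    """Hypothetical batching generator with a counter and a timeout condition."""

    def __init__(self, trigger: int, timeout: float,
                 submit_workflow: Callable[[List[str]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.trigger = trigger
        self.timeout = timeout
        self.submit_workflow = submit_workflow
        self.clock = clock
        self.args: List[str] = []
        self.first_packed_at: Optional[float] = None

    def pack(self, arg: str) -> None:
        if not self.args:
            self.first_packed_at = self.clock()  # assumption: timeout starts here
        self.args.append(arg)
        if len(self.args) >= self.trigger:  # counter condition
            self._fire()

    def tick(self) -> None:
        # Called periodically (e.g. by a scheduler loop) to enforce the timeout.
        if self.args and self.clock() - self.first_packed_at >= self.timeout:
            self._fire()

    def _fire(self) -> None:
        batch, self.args, self.first_packed_at = self.args, [], None
        self.submit_workflow(batch)


if __name__ == "__main__":
    now = [0.0]
    fired: List[List[str]] = []
    gen = Generator(trigger=3, timeout=10.0, submit_workflow=fired.append,
                    clock=lambda: now[0])
    for image in ["a.tif", "b.tif", "c.tif"]:
        gen.pack(image)  # the third argument satisfies the counter condition
    gen.pack("d.tif")
    now[0] = 12.0
    gen.tick()  # the timeout condition flushes the partial batch
    print(fired)  # [['a.tif', 'b.tif', 'c.tif'], ['d.tif']]
```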

Core concepts:
- Colony: A distributed runtime environment - a network of loosely connected Executors
- Executor: Distributed worker that pulls and executes workloads (can be implemented in any language, runs anywhere)
- Process: Computational workload with states: WAITING → RUNNING → SUCCESS/FAILED
- FunctionSpec: Specification defining what computation to run and execution conditions
- ProcessGraph: Workflow represented as a Directed Acyclic Graph (DAG)
- Service: Declarative infrastructure specification with desired state management
- Reconciliation: Automatic drift detection and correction that maintains services in their desired state
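
Reconciliation follows the familiar control-loop pattern: diff the desired state against the observed state and emit corrective actions until the diff is empty. The sketch below is a simplified illustration; `ServiceSpec`, `reconcile`, and the action strings are hypothetical and do not mirror ColonyOS's actual service specification format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ServiceSpec:
    # Hypothetical desired-state record: which image, how many replicas.
    name: str
    image: str
    replicas: int


def reconcile(desired: Dict[str, ServiceSpec],
              observed: Dict[str, Tuple[str, int]]) -> List[str]:
    """Diff desired specs against observed (image, running replicas) pairs
    and return the corrective actions needed to remove the drift."""
    actions: List[str] = []
    for name in sorted(desired):
        spec = desired[name]
        # A missing service counts as "right image, zero replicas running".
        image, running = observed.get(name, (spec.image, 0))
        if image != spec.image:
            actions.append(f"update {name}: {image} -> {spec.image}")
        if running < spec.replicas:
            actions.append(f"start {spec.replicas - running} x {name}")
        elif running > spec.replicas:
            actions.append(f"stop {running - spec.replicas} x {name}")
    # Anything running that is no longer declared gets removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(f"remove {name}")
    return actions


if __name__ == "__main__":
    desired = {"ingest": ServiceSpec("ingest", "ingest:1.2", 3)}
    observed = {"ingest": ("ingest:1.1", 1), "legacy": ("legacy:0.9", 2)}
    for action in reconcile(desired, observed):
        print(action)
    # update ingest: ingest:1.1 -> ingest:1.2
    # start 2 x ingest
    # remove legacy
```

Running `reconcile` repeatedly and applying its actions is what makes the loop self-healing: once the observed state matches the declaration, it returns an empty list.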

How it works:
- Submit: Users submit function specifications to the Colonies server
- Schedule: The scheduler assigns processes to available Executors based on conditions
- Execute: Executors pull assigned processes, execute them, and report results
- Chain: Complex workflows span multiple platforms by chaining processes together
- Monitor: Real-time subscriptions and full execution history enable observability
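
The submit, schedule, execute, and chain steps, together with the process lifecycle listed under core concepts, can be illustrated with a small DAG runner built on Python's standard `graphlib`. This is a sketch under assumptions: `run_graph`, `transition`, and the rule that a child stays WAITING when a parent fails are illustrative choices rather than documented ColonyOS behavior, and the `execute` callback stands in for a remote executor.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Legal lifecycle moves: WAITING -> RUNNING -> SUCCESS/FAILED.
TRANSITIONS = {"WAITING": {"RUNNING"}, "RUNNING": {"SUCCESS", "FAILED"}}


def transition(states: Dict[str, str], name: str, new_state: str) -> None:
    if new_state not in TRANSITIONS.get(states[name], set()):
        raise ValueError(f"{name}: illegal transition {states[name]} -> {new_state}")
    states[name] = new_state


def run_graph(parents: Dict[str, Set[str]],
              execute: Callable[[str], bool]) -> Dict[str, str]:
    """Run every process whose parents all succeeded, in dependency order.

    `parents` maps a process name to the names it depends on; `execute`
    stands in for an executor and returns True on success (both hypothetical).
    """
    order = list(TopologicalSorter(parents).static_order())
    states = {name: "WAITING" for name in order}
    for name in order:
        if any(states[p] != "SUCCESS" for p in parents.get(name, ())):
            continue  # assumption: a child of a failed parent stays WAITING
        transition(states, name, "RUNNING")
        transition(states, name, "SUCCESS" if execute(name) else "FAILED")
    return states


if __name__ == "__main__":
    # download -> {detect, classify} -> report, e.g. an image-processing chain
    graph = {"detect": {"download"}, "classify": {"download"},
             "report": {"detect", "classify"}}
    states = run_graph(graph, execute=lambda name: name != "classify")
    print(states["classify"], states["report"])  # FAILED WAITING
```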
Colonies implements a zero-trust architecture where all communication is cryptographically signed:
- No traditional authentication tokens or session management
- Each request signed with Ed25519 private keys
- Server validates signatures and enforces role-based access control
- Executors can operate on untrusted infrastructure while maintaining security
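
To make the signing model concrete, here is a self-contained Ed25519 sign/verify pair transcribed from the pure-Python reference code in RFC 8032. It is slow and not constant time, so it is for illustration only; a real client should rely on a vetted cryptography library. The helper names `public_key`, `sign`, and `verify` are local to this sketch, not ColonyOS SDK calls, and the JSON request body (including its `msgtype` field) is hypothetical: the actual Colonies RPC message format and canonicalization rules are not described here.

```python
import hashlib
import json
import os

# edwards25519 parameters (RFC 8032, section 5.1).
P = 2**255 - 19
Q = 2**252 + 27742317777372353535851937790883648493  # order of the base point
D = -121665 * pow(121666, P - 2, P) % P
SQRT_M1 = pow(2, (P - 1) // 4, P)

def _add(a, b):
    # Twisted Edwards point addition in extended coordinates (X, Y, Z, T).
    A = (a[1] - a[0]) * (b[1] - b[0]) % P
    B = (a[1] + a[0]) * (b[1] + b[0]) % P
    C = 2 * a[3] * b[3] * D % P
    DD = 2 * a[2] * b[2] % P
    E, F, G, H = B - A, DD - C, DD + C, B + A
    return (E * F % P, G * H % P, F * G % P, E * H % P)

def _mul(s, pt):
    # Double-and-add scalar multiplication (not constant time).
    acc = (0, 1, 1, 0)  # neutral element
    while s:
        if s & 1:
            acc = _add(acc, pt)
        pt = _add(pt, pt)
        s >>= 1
    return acc

def _recover_x(y, sign):
    if y >= P:
        return None
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    if x2 == 0:
        return None if sign else 0
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P:
        x = x * SQRT_M1 % P
    if (x * x - x2) % P:
        return None
    return P - x if (x & 1) != sign else x

_GY = 4 * pow(5, P - 2, P) % P
_GX = _recover_x(_GY, 0)
BASE = (_GX, _GY, 1, _GX * _GY % P)

def _encode(pt):
    zinv = pow(pt[2], P - 2, P)
    x, y = pt[0] * zinv % P, pt[1] * zinv % P
    return (y | ((x & 1) << 255)).to_bytes(32, "little")

def _decode(data):
    y = int.from_bytes(data, "little")
    sign, y = y >> 255, y & ((1 << 255) - 1)
    x = _recover_x(y, sign)
    return None if x is None else (x, y, 1, x * y % P)

def _h(*parts):
    return int.from_bytes(hashlib.sha512(b"".join(parts)).digest(), "little") % Q

def _expand(secret):
    digest = hashlib.sha512(secret).digest()
    a = int.from_bytes(digest[:32], "little")
    return (a & ((1 << 254) - 8)) | (1 << 254), digest[32:]  # clamped scalar, prefix

def public_key(secret):
    return _encode(_mul(_expand(secret)[0], BASE))

def sign(secret, msg):
    a, prefix = _expand(secret)
    r = _h(prefix, msg)
    R = _encode(_mul(r, BASE))
    k = _h(R, public_key(secret), msg)
    return R + ((r + k * a) % Q).to_bytes(32, "little")

def verify(pub, msg, sig):
    if len(pub) != 32 or len(sig) != 64:
        return False
    A, R = _decode(pub), _decode(sig[:32])
    s = int.from_bytes(sig[32:], "little")
    if A is None or R is None or s >= Q:
        return False
    lhs, rhs = _mul(s, BASE), _add(R, _mul(_h(sig[:32], pub, msg), A))
    # Projective equality: X1*Z2 == X2*Z1 and Y1*Z2 == Y2*Z1.
    return ((lhs[0] * rhs[2] - rhs[0] * lhs[2]) % P == 0
            and (lhs[1] * rhs[2] - rhs[1] * lhs[2]) % P == 0)

if __name__ == "__main__":
    secret = os.urandom(32)  # stands in for an executor's private key seed
    pub = public_key(secret)  # the only part the server needs to know
    # Hypothetical request body; the real RPC message format may differ.
    body = json.dumps({"msgtype": "assign", "executortype": "edge"}, sort_keys=True).encode()
    sig = sign(secret, body)
    print(verify(pub, body, sig))  # True
    print(verify(pub, body.replace(b"edge", b"hpc"), sig))  # False: tampered body
```

Since the server only stores public keys and checks each signature, there is no session token to steal, and a request altered in transit fails verification.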
Run the Colonies server with any combination of protocols:
| Backend | Use Case | Port |
|---|---|---|
| HTTP/REST | Web APIs, dashboards, traditional clients | 8080 |
| gRPC | High-performance, low-latency communication | 50051 |
| CoAP | IoT devices, constrained environments | 5683 |
| LibP2P | Peer-to-peer, decentralized, NAT traversal | 4001 |
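
As a sketch of how a deployment script might combine the COLONIES_SERVER_BACKENDS variable with the default ports in this table, the helper below splits the comma-separated list and maps each name to its port. The helper `enabled_backends` is hypothetical and not part of the Colonies server; the lowercase identifier `coap` and the `http` fallback when the variable is unset are assumptions, since only `http`, `grpc`, and `libp2p` appear in the configuration example.

```python
import os
from typing import Dict, Mapping, Optional

# Default ports from the backend table; "coap" as a backend name is an assumption.
DEFAULT_PORTS = {"http": 8080, "grpc": 50051, "coap": 5683, "libp2p": 4001}


def enabled_backends(env: Optional[Mapping[str, str]] = None,
                     fallback: str = "http") -> Dict[str, int]:
    """Turn COLONIES_SERVER_BACKENDS into {backend: default port}.

    `fallback` (used when the variable is unset) is an assumption, not the
    server's documented default.
    """
    env = os.environ if env is None else env
    backends: Dict[str, int] = {}
    for part in env.get("COLONIES_SERVER_BACKENDS", fallback).split(","):
        name = part.strip().lower()
        if not name:
            continue  # tolerate stray commas such as "http,,grpc"
        if name not in DEFAULT_PORTS:
            raise ValueError(f"unknown backend: {name!r}")
        backends[name] = DEFAULT_PORTS[name]
    return backends


if __name__ == "__main__":
    print(enabled_backends({"COLONIES_SERVER_BACKENDS": "http,grpc,libp2p"}))
    # {'http': 8080, 'grpc': 50051, 'libp2p': 4001}
```

Failing fast on an unknown name surfaces a typo in the variable at startup rather than as a silently missing listener.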
Select which protocols to enable with the COLONIES_SERVER_BACKENDS environment variable:
```bash
export COLONIES_SERVER_BACKENDS="http,grpc,libp2p" # Run multiple protocols simultaneously
```
The Colonies Dashboard provides a web UI for monitoring and managing your computing continuum.

Comprehensive step-by-step tutorials are available in the tutorials repository:
- Installation Guide - Install and configure Colonies
- Getting Started - Your first Colonies application
- Configuration - Environment variables and settings
- Backend Configuration - HTTP, gRPC, CoAP, LibP2P setup
- Introduction - Core concepts and architecture
- Implementing Executors - Create executors in Python, Go, Julia, JavaScript
- Fibonacci Tutorial (Go) - Complete example application
- Workflow DAGs - Create complex computational pipelines
- Generators - Batch processing and dynamic workflows
- Cron Jobs - Schedule recurring tasks
- CLI Usage - Command-line interface reference
- Logging - Process logging and monitoring
- Overall Design - System architecture and design patterns
- RPC Protocol - HTTP RPC protocol specification
- Security Design - Zero-trust security model
- Container Building - Build Docker containers for single and multi-platform
- High-Availability Deployment - Production cluster setup
- Monitoring - Grafana and Prometheus integration
- Kubernetes Helm Charts - Deploy on Kubernetes
- Go SDK - Official Go client library
- Python SDK - Python client library
- Rust SDK - Rust client library
- Julia SDK - Julia client library
- JavaScript SDK - JavaScript/Node.js library
- Haskell SDK - Haskell client library
- Executors - Pre-built executor implementations
The repository contains a development container configuration that simplifies setting up a development environment. You can use it locally or in a GitHub Codespace. The configuration launches a TimescaleDB instance for the ColonyOS database, a MinIO instance for the ColonyOS file system, and the development container itself. It automatically generates the required credentials and keys unique to your environment, so no further configuration is needed.
Local Development (VS Code):
- Install Docker on your machine.
- Install the Dev Containers extension in Visual Studio Code.
- Clone this repository.
- Open the folder in Visual Studio Code.
- When prompted, select "Reopen in Container" or use the command "Dev Containers: Open Folder In Container..." from the command palette.
GitHub Codespaces:
- Simply create a Codespace from the repository page on GitHub. The development container will be set up automatically.

Building:
```bash
make build                     # Build the main colonies binary
make container                 # Build Docker container for local architecture
make container-multiplatform   # Build for amd64 and arm64
make install                   # Install to /usr/local/bin
```
For detailed instructions on building containers, including multi-platform builds, see the Container Building Guide.

Testing:
```bash
make test          # Run all tests
make github_test   # Run tests for CI (no color output)

# Test specific backends
COLONIES_BACKEND_TYPE=gin make test
COLONIES_BACKEND_TYPE=grpc make test
COLONIES_BACKEND_TYPE=libp2p make test

make coverage      # Generate coverage reports
```
ColonyOS is currently used in production by:
- RockSigma AB - Automatic seismic processing engine for underground mines, orchestrating workloads across cloud and edge infrastructure
Contributions are welcome! Please see our contributing guidelines and code of conduct.
- Website: colonyos.io
- GitHub: github.com/colonyos
- Tutorials: github.com/colonyos/tutorials
See the LICENSE file for details.