Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, Rebellions, FuriosaAI, HyperAccel, Google TPU, Graphcore IPU and other NPUs.
It allocates and isolates the underlying computing resources for multi-tenant computation sessions, on demand or in batches, using customizable job schedulers in its own orchestrator named "Sokovan".
All its functions are exposed as REST and GraphQL APIs.
- Python: 3.13.x (main branch requires CPython 3.13.7)
- Pantsbuild: 2.27.x
- See full version compatibility table
Required:
- Docker 20.10+ (with Compose v2)
- PostgreSQL 16+ (tested with 16.3)
- Redis 7.2+ (tested with 7.2.11)
- etcd 3.5+ (tested with 3.5.14)
- Prometheus 3.x (tested with 3.1.0)
Recommended (for observability):
- Grafana 11.x (tested with 11.4.0)
- Loki 3.x (tested with 3.5.0)
- Tempo 2.x (tested with 2.7.2)
- OpenTelemetry Collector
→ Detailed infrastructure setup: Infrastructure Documentation
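As a quick sanity check, the minimum versions in the "Required" list above can be encoded and compared programmatically. The following is a minimal sketch, assuming you already have each installed version as a string; `MINIMUM_VERSIONS`, `parse_version`, and `meets_minimum` are hypothetical names introduced for illustration and are not part of Backend.AI.

```python
# Hypothetical helper (not part of Backend.AI): compare installed versions
# against the minimum versions listed in the "Required" section above.
MINIMUM_VERSIONS: dict[str, tuple[int, ...]] = {
    "docker": (20, 10),
    "postgresql": (16,),
    "redis": (7, 2),
    "etcd": (3, 5),
    "prometheus": (3,),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Parse the leading numeric parts of a version, e.g. 'v7.2.11' -> (7, 2, 11)."""
    parts: list[int] = []
    for piece in text.strip().lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # Stop at the first non-numeric component.
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # A suffix such as '+dfsg1' ends the numeric part.
    return tuple(parts)


def _pad(version: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Right-pad with zeros so (16,) compares like (16, 0, 0)."""
    return version + (0,) * (width - len(version))


def meets_minimum(component: str, installed: str) -> bool:
    """Return True if the installed version is at least the listed minimum."""
    required = MINIMUM_VERSIONS[component]
    actual = parse_version(installed)
    width = max(len(required), len(actual))
    return _pad(actual, width) >= _pad(required, width)
```

For example, `meets_minimum("redis", "7.2.11")` is true, while `meets_minimum("postgresql", "15.4")` is false. Padding with zeros lets a bare major version such as `(16,)` compare sensibly against a full `major.minor.patch` string.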
- OS: Linux (Debian/RHEL-based) or macOS
- Permissions: sudo access for installation
- Resources: 4+ CPU cores, 8GB+ RAM recommended for development
git clone https://github.com/lablup/backend.ai.git
cd backend.ai
./scripts/install-dev.sh
This script will:
- Check required dependencies (Docker, Python, etc.)
- Set up Python virtual environment with Pantsbuild
- Start halfstack infrastructure (PostgreSQL, Redis, etcd, Grafana, etc.)
- Initialize database schemas
- Create default API keypairs and user accounts
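The first step above, checking required dependencies, can be approximated in Python. This is a hypothetical sketch and not the logic of `install-dev.sh` itself: the `REQUIRED_COMMANDS` tuple is an assumption (adjust it to your setup), and it only checks that those executables are on `PATH` and that the machine meets the recommended CPU count from the system requirements.

```python
import os
import shutil
from typing import Callable, Optional

# Assumed set of executables for a development setup; not taken from install-dev.sh.
REQUIRED_COMMANDS = ("docker", "git", "python3")
RECOMMENDED_CPU_CORES = 4  # From "4+ CPU cores" in the system requirements.


def missing_commands(
    commands: tuple[str, ...] = REQUIRED_COMMANDS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if which(cmd) is None]


def enough_cpus(minimum: int = RECOMMENDED_CPU_CORES, count: Optional[int] = None) -> bool:
    """Compare the CPU count (os.cpu_count() by default) with the recommendation."""
    cores = os.cpu_count() if count is None else count
    return cores is not None and cores >= minimum


if __name__ == "__main__":
    missing = missing_commands()
    if missing:
        print("Missing:", ", ".join(missing))
    if not enough_cpus():
        print("Fewer than", RECOMMENDED_CPU_CORES, "CPU cores detected")
```

The `which` and `count` parameters are injectable only so the checks can be exercised without depending on the host machine.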
Start each component in separate terminals:
Manager (Terminal 1):
./backend.ai mgr start-server --debug
Agent (Terminal 2):
./backend.ai ag start-server --debug
Storage Proxy (Terminal 3):
./py -m ai.backend.storage.server
Web Server (Terminal 4):
./py -m ai.backend.web.server
App Proxy (Terminals 5-6, optional; enables in-container service access):
./backend.ai app-proxy-coordinator start-server --debug
./backend.ai app-proxy-worker start-server --debug
Set up the client environment:
source env-local-user-session.sh
Run a simple Python session:
./backend.ai run python -c "print('Hello Backend.AI!')"
Alternatively, access the Web UI at http://localhost:8090 with the credentials from the env-local-*.sh files.
Backend.AI provides websocket tunneling into individual computation sessions (containers), so that users can securely access in-container applications directly from their browsers or the client CLI.
- Jupyter: data scientists' favorite tool
- Most container images have intrinsic Jupyter and JupyterLab support.
- Web-based terminal
- All container sessions have intrinsic ttyd support.
- SSH
- All container sessions have intrinsic SSH/SFTP/SCP support with an auto-generated per-user SSH keypair. PyCharm and other IDEs can use on-demand sessions as SSH remote interpreters.
- VSCode
- Most container sessions have intrinsic web-based VSCode support.
Backend.AI provides an abstraction layer on top of existing network-based storage (e.g., NFS/SMB), called vfolders (virtual folders). Each vfolder works like cloud storage: it can be mounted into any computation session and shared between users and user groups with differentiated privileges.
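To illustrate what "differentiated privileges" can mean for a shared vfolder, here is a hypothetical Python model. The three permission levels (read-only, read-write, read-write-delete), the `VFolder` class, and the `can` method are assumptions made for this sketch; they are not Backend.AI's actual data model, and user groups are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Permission(IntEnum):
    """Hypothetical privilege levels, ordered from weakest to strongest."""
    READ_ONLY = 1
    READ_WRITE = 2
    READ_WRITE_DELETE = 3


# Minimum permission each (hypothetical) operation requires.
REQUIRED = {
    "read": Permission.READ_ONLY,
    "write": Permission.READ_WRITE,
    "delete": Permission.READ_WRITE_DELETE,
}


@dataclass
class VFolder:
    name: str
    owner: str
    # Per-user grants for the users this folder is shared with.
    shared_with: dict[str, Permission] = field(default_factory=dict)

    def permission_of(self, user: str) -> Permission | None:
        if user == self.owner:
            return Permission.READ_WRITE_DELETE  # Owners get full access.
        return self.shared_with.get(user)

    def can(self, user: str, operation: str) -> bool:
        granted = self.permission_of(user)
        return granted is not None and granted >= REQUIRED[operation]
```

Ordering the levels as an `IntEnum` means a single `>=` comparison answers "does this grant cover that operation", so a folder shared read-only with `bob` lets him read but not write.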
Please consult our documentation for community-supported materials. Contact the sales team for professional paid support and deployment options.
For comprehensive system architecture, component interactions, and infrastructure details, see:
Component Architecture Documentation
This document covers:
- System architecture diagrams and component flow
- Port numbers and infrastructure setup
- Component dependencies and communication protocols
- Development and production environment configuration
This repository contains all open-source server-side components and the client SDK for Python as a reference implementation of API clients.
- `src/ai/backend/`: Source code
  - `manager/`: Manager as the cluster control plane
  - `manager/api`: Manager API handlers
  - `account_manager/`: Unified user profile and SSO management
  - `agent/`: Agent as the per-node controller
  - `agent/docker/`: Agent's Docker backend
  - `agent/k8s/`: Agent's Kubernetes backend
  - `agent/dummy/`: Agent's dummy backend
  - `kernel/`: Agent's kernel runner counterpart
  - `runner/`: Agent's in-kernel prebuilt binaries
  - `helpers/`: Agent's in-kernel helper package
  - `common/`: Shared utilities
  - `client/`: Client SDK
  - `cli/`: Unified CLI for all components
  - `install/`: SCIE-based TUI installer
  - `storage/`: Storage proxy for offloading storage operations
  - `storage/api`: Storage proxy's manager-facing and client-facing APIs
  - `appproxy/`: App proxy for accessing container apps from outside
  - `appproxy/coordinator`: App proxy coordinator that provisions routing circuits
  - `appproxy/worker`: App proxy worker that forwards the traffic
  - `web/`: Web UI server
    - `static/`: Backend.AI WebUI release artifacts
  - `logging/`: Logging subsystem
  - `plugin/`: Plugin subsystem
  - `test/`: Integration test suite
  - `testutils/`: Shared utilities used by unit tests
  - `meta/`: Legacy meta package
  - `accelerator/`: Intrinsic accelerator plugins
- `docs/`: Unified documentation
- `tests/`
  - `manager/`, `agent/`, ...: Per-component unit tests
- `configs/`
  - `manager/`, `agent/`, ...: Per-component sample configurations
- `docker/`: Dockerfiles for auxiliary containers
- `fixtures/`
  - `manager/`, ...: Per-component fixtures for development setup and tests
- `plugins/`: A directory to place plugins such as accelerators, monitors, etc.
- `scripts/`: Scripts to assist development workflows
  - `install-dev.sh`: The single-node development setup script from the working copy
- `stubs/`: Type annotation stub packages written by us
- `tools/`: A directory to host Pants-related tooling
- `dist/`: A directory to put build artifacts (.whl files) and Pants-exported virtualenvs
- `changes/`: News fragments for towncrier
- `pants.toml`: The Pants configuration
- `pyproject.toml`: Tooling configuration (towncrier, pytest, mypy)
- `BUILD`: The root build config file
- `**/BUILD`: Per-directory build config files
- `BUILD_ROOT`: An indicator to mark the build root directory for Pants
- `CLAUDE.md`: The steering guide for agent-assisted development
- `requirements.txt`: The unified requirements file
- `*.lock`, `tools/*.lock`: The dependency lock files
- `docker-compose.*.yml`: Per-version recommended halfstack container configs
- `README.md`: This file
- `MIGRATION.md`: The migration guide for updating between major releases
- `VERSION`: The unified version declaration
Server-side components are licensed under LGPLv3 to promote non-proprietary open innovation in the open-source community while other shared libraries and client SDKs are distributed under the MIT license.
There is no obligation to open your service/system code if you just run the server-side components as-is (e.g., run them as daemons or import the components into your code without modification). Please contact us (contact-at-lablup-com) for commercial consulting and more licensing details/options about individual use-cases.
Backend.AI consists of the following core components:
Manager - Central API gateway and orchestrator
- Routes REST/GraphQL requests and orchestrates cluster operations
- Session scheduling via Sokovan orchestrator
- User authentication and RBAC authorization
- Plugin interfaces: backendai_scheduler_v10, backendai_agentselector_v10, backendai_hook_v20, backendai_webapp_v20, backendai_monitor_stats_v10, backendai_monitor_error_v10
- Legacy repo: https://github.com/lablup/backend.ai-manager
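The scheduling step can be pictured with a simple first-in-first-out policy. The sketch below is hypothetical: it does not implement the `backendai_scheduler_v10` plugin interface or Sokovan's actual algorithm, and the `Agent`, `PendingSession`, and `schedule_fifo` names are invented for illustration. It only shows the general idea of taking pending sessions in arrival order and placing each on an agent with enough free resources.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    agent_id: str
    free_cpu: int
    free_mem_gib: int


@dataclass
class PendingSession:
    session_id: str
    cpu: int
    mem_gib: int


def schedule_fifo(pending: list[PendingSession], agents: list[Agent]) -> dict[str, str]:
    """Assign sessions in arrival order; stop at the first one that cannot fit.

    Stopping (instead of skipping ahead) keeps strict FIFO ordering, so a large
    session is not starved by smaller ones queued behind it.
    """
    assignments: dict[str, str] = {}
    for session in pending:
        # First-fit agent choice: a simplification for this sketch.
        target = next(
            (a for a in agents
             if a.free_cpu >= session.cpu and a.free_mem_gib >= session.mem_gib),
            None,
        )
        if target is None:
            break  # Head-of-line blocking: wait for resources to free up.
        target.free_cpu -= session.cpu
        target.free_mem_gib -= session.mem_gib
        assignments[session.session_id] = target.agent_id
    return assignments
```

With agents offering (4 CPU, 16 GiB) and (8 CPU, 32 GiB) and a queue of sessions needing (2, 8), (8, 16), and (1, 1), the first and third land on the first agent and the second on the larger one. The head-of-line blocking trade-off is why real schedulers often offer alternative policies.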
Agent - Kernel lifecycle management on compute nodes
- Manages Docker containers (kernels) on individual nodes
- Self-registers to cluster via heartbeats
- Plugin interfaces: backendai_accelerator_v21, backendai_monitor_stats_v10, backendai_monitor_error_v10
- Legacy repo: https://github.com/lablup/backend.ai-agent
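Heartbeat-based self-registration can be illustrated with a small liveness registry. This Python sketch shows the generic heartbeat-timeout pattern, not Backend.AI's actual protocol; the `HeartbeatRegistry` class and the 30-second default timeout are hypothetical.

```python
import time
from typing import Callable


class HeartbeatRegistry:
    """Tracks the last heartbeat of each agent and reports which are alive."""

    def __init__(self, timeout_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_sec = timeout_sec
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, agent_id: str) -> bool:
        """Record a heartbeat; return True if this registers a new agent."""
        is_new = agent_id not in self.last_seen
        self.last_seen[agent_id] = self.clock()
        return is_new

    def alive_agents(self) -> list[str]:
        now = self.clock()
        return sorted(a for a, t in self.last_seen.items() if now - t <= self.timeout_sec)

    def lost_agents(self) -> list[str]:
        now = self.clock()
        return sorted(a for a, t in self.last_seen.items() if now - t > self.timeout_sec)
```

Because the first heartbeat doubles as registration, nodes can join the cluster without a separate enrollment step, and a node that stops sending heartbeats simply ages out of `alive_agents()`. The injectable `clock` makes the timeout logic testable without sleeping.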
Storage Proxy - Virtual folder and storage backend abstraction
- Unified interface for multiple storage backends
- Real-time performance metrics and acceleration APIs
- Legacy repo: https://github.com/lablup/backend.ai-storage-proxy
Webserver - Web UI hosting and session management
- Hosts Backend.AI WebUI (SPA)
- Session management and API request signing
- Legacy repo: https://github.com/lablup/backend.ai-webserver
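API request signing of this kind is commonly done with an HMAC over a canonical description of the request. The Python sketch below shows that general technique with the standard `hmac` and `hashlib` modules; the canonical string layout and the `sign_request`/`verify_request` names are assumptions for illustration and do not reproduce Backend.AI's actual signing scheme.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def sign_request(secret_key: str, method: str, path: str, body: bytes,
                 date: datetime) -> str:
    """Return a hex HMAC-SHA256 signature over a hypothetical canonical string."""
    body_hash = hashlib.sha256(body).hexdigest()
    canonical = "\n".join([
        method.upper(),
        path,
        date.astimezone(timezone.utc).isoformat(),
        body_hash,
    ])
    return hmac.new(secret_key.encode(), canonical.encode(), hashlib.sha256).hexdigest()


def verify_request(secret_key: str, signature: str, method: str, path: str,
                   body: bytes, date: datetime) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(secret_key, method, path, body, date)
    return hmac.compare_digest(expected, signature)
```

Because the receiver recomputes the signature from its own copy of the secret key, the secret itself never travels with the request, and any change to the method, path, timestamp, or body invalidates the signature.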
Synchronizing the static Backend.AI WebUI version:
$ scripts/download-webui-release.sh <target version to download>
App Proxy - Service routing and load balancing
- Routes traffic to in-container services (Jupyter, VSCode, etc.)
- Dynamic circuit provisioning and health monitoring
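Load balancing across the replicas of an in-container service can be pictured as round-robin selection over healthy endpoints. This is a hypothetical Python sketch of that generic technique; the `RoundRobinRouter` class, its `mark`/`pick` methods, and the endpoint strings are assumptions, not the App Proxy's implementation.

```python
import itertools
from typing import Iterable


class RoundRobinRouter:
    """Cycle through backend endpoints, skipping those marked unhealthy."""

    def __init__(self, endpoints: Iterable[str]) -> None:
        self.endpoints = list(endpoints)
        self.healthy = set(self.endpoints)
        self._cycle = itertools.cycle(self.endpoints)

    def mark(self, endpoint: str, healthy: bool) -> None:
        """Update an endpoint's health, e.g. from a periodic health check."""
        if healthy:
            self.healthy.add(endpoint)
        else:
            self.healthy.discard(endpoint)

    def pick(self) -> str:
        """Return the next healthy endpoint; raise if none are available."""
        for _ in range(len(self.endpoints)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy endpoints")
```

Each `pick()` inspects at most one full rotation, so an unhealthy endpoint is skipped without stalling the router, and marking it healthy again puts it back into rotation automatically.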
Kernels - Container image recipes
- Dockerfile-based computing environment recipes
- Support for popular ML frameworks and programming languages
Jail - Programmable sandbox (Rust)
- ptrace-based system call filtering
- Resource control and security enforcement
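With ptrace-based filtering, the tracer stops the sandboxed process at each system call and decides whether to let it proceed. The decision half of that loop can be sketched as a policy lookup; this Python sketch does not perform any tracing, and the `SyscallPolicy` class, its default-deny rule, and the per-syscall call limits are hypothetical illustrations rather than Jail's actual (Rust) policy format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass
class SyscallPolicy:
    """Hypothetical default-deny policy consulted for each intercepted syscall."""
    allowed: set[str] = field(default_factory=set)
    # Optional per-syscall limits on how many times a call may be made.
    max_calls: dict[str, int] = field(default_factory=dict)
    _counts: dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def decide(self, syscall: str) -> Verdict:
        if syscall not in self.allowed:
            return Verdict.DENY  # Anything not explicitly allowed is blocked.
        count = self._counts.get(syscall, 0) + 1
        self._counts[syscall] = count
        limit = self.max_calls.get(syscall)
        if limit is not None and count > limit:
            return Verdict.DENY  # Resource control: cap repeated calls.
        return Verdict.ALLOW
```

A default-deny allowlist is the safer design here: a syscall the policy author never considered is blocked rather than silently permitted, and call counters give a simple form of resource control (for example, capping process creation).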
Hook - In-container runtime library
- libc overrides for resource control
- Web-based interactive stdin support
We offer client SDKs in popular programming languages (MIT License):
- Python - pip install backend.ai-client | GitHub | Includes CLI
- Java - Releases
- JavaScript - npm install backend.ai-client | GitHub
- PHP - (under preparation) composer require lablup/backend.ai-client | GitHub
Backend.AI supports plugin-based extensibility via Python package entrypoints:
Accelerator Plugins (backendai_accelerator_v21)
- CUDA - NVIDIA GPU support
- CUDA Mock - Development without actual GPUs
- ROCm - AMD GPU support
- More available in the enterprise edition
Monitoring Plugins
- backendai_monitor_stats_v10 - Datadog statistics collector
- backendai_monitor_error_v10 - Sentry exception collector
Media Library - Multi-media output support (no longer maintained)
IDE Extensions - (Deprecated: Use in-kernel Jupyter Lab, VSCode Server, or SSH instead)
Build Python wheels or SCIE (Self-Contained Installable Executables):
./scripts/build-wheels.sh # Build .whl packages
./scripts/build-scies.sh # Build SCIE packages
Packages are placed in the dist/ directory.
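After a build, the wheels in `dist/` can be inspected programmatically. This sketch splits wheel filenames according to the convention from PEP 427 (`{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`); the `WheelName`, `parse_wheel_name`, and `list_wheels` names are hypothetical, and the example filenames are placeholders rather than actual Backend.AI artifacts.

```python
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: str | None
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(dist, version, build, py, abi, plat)


def list_wheels(dist_dir: str = "dist") -> list[WheelName]:
    """Parse every .whl file found in `dist_dir`."""
    return [parse_wheel_name(p.name) for p in sorted(Path(dist_dir).glob("*.whl"))]
```

For instance, `parse_wheel_name("example_pkg-1.0.0-py3-none-any.whl")` yields a pure-Python wheel with no build tag, which is a quick way to confirm which packages in `dist/` are platform-independent.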
Backend.AI uses Git pre-commit hooks to maintain code quality:
# Automatically runs on every commit:
# - Linting (pants lint)
# - Type checking (pants check)
# Bypass hooks if needed (use sparingly)
git commit --no-verify
The pre-commit hook validates:
- Code style and formatting
- Type annotations
Tests run in CI for comprehensive coverage.
See CLAUDE.md for detailed hook system documentation.
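A hook that enforces the checks above could look like the following Python sketch. It is hypothetical and not the repository's actual hook (see CLAUDE.md for that): it runs `pants lint` and `pants check` restricted to targets changed relative to `HEAD` via Pants' `--changed-since` option, and exits non-zero if either fails so Git aborts the commit. The command runner is injectable so the logic can be exercised without Pants installed.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: run Pants lint and type checks on changed targets."""
import subprocess
import sys
from typing import Callable, Sequence

# Each entry is one Pants invocation; --changed-since limits work to changed targets.
CHECKS: list[list[str]] = [
    ["pants", "--changed-since=HEAD", "lint"],
    ["pants", "--changed-since=HEAD", "check"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run every check (even after a failure) and return 0 only if all succeed."""
    failed = [" ".join(cmd) for cmd in checks if runner(cmd) != 0]
    for cmd in failed:
        print(f"pre-commit: '{cmd}' failed", file=sys.stderr)
    return 1 if failed else 0


if __name__ == "__main__":
    # A non-zero exit status makes Git abort the commit.
    sys.exit(run_checks())
```

Running all checks instead of stopping at the first failure reports every problem in one pass; `git commit --no-verify` skips the hook entirely.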
For detailed development setup, build system usage, and contribution guidelines:
- Development Setup - Python versions, Pantsbuild, dependency management
- CONTRIBUTING.md - Contribution guidelines and development workflow
- MIGRATION.md - Migration guide for major version updates
Refer to the LICENSE file.