Prometheus Metrics

✅ Available:	>= 1.8

The service exposes a Prometheus metrics endpoint, by default on 127.0.0.1 (localhost), port 9090. The endpoint provides detailed information about runners, servers, jobs, and service health.

❗Warning:	This feature is still experimental, and the metrics are subject to change in future releases. Do not rely on specific metric names or labels in production environments without pinning the service version.

Configuration

The metrics endpoint can be configured using the following options:

--metrics-port: Port for the Prometheus metrics server (default: 9090)
--metrics-host: Host address to bind the Prometheus metrics server to (default: 127.0.0.1)

Example

To scrape these metrics with Prometheus, add the following to your Prometheus configuration:

scrape_configs:
  - job_name: 'github-hetzner-runners'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 15s
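
Before pointing Prometheus at the service, it can help to confirm that the endpoint answers. The following is a minimal sketch, not part of the service itself. The function names are hypothetical, and the /metrics path is an assumption based on the usual Prometheus convention; adjust it if the service serves metrics elsewhere.

```python
from urllib.request import urlopen


def metrics_url(host: str = "127.0.0.1", port: int = 9090) -> str:
    """Build the scrape URL from the --metrics-host and --metrics-port values."""
    # Assumption: metrics are served under the conventional /metrics path.
    return f"http://{host}:{port}/metrics"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw text exposition served at the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running on the defaults, `fetch_metrics(metrics_url())` should return the plain-text metrics listed in the sections below.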

Server Metrics

github_hetzner_runners_servers_total: Total number of servers by status (running, off, initializing, ready, busy)
github_hetzner_runners_servers_total_count: Total number of servers across all statuses
github_hetzner_runners_servers_created_total: Total number of servers created by server type and location
github_hetzner_runners_servers_deleted_total: Total number of servers deleted by server type and location
github_hetzner_runners_server_creation_seconds: Time taken to create a server by server type and location (buckets: 10s, 30s, 1m, 2m, 5m, 10m, 20m, 30m)
github_hetzner_runners_server: Information about each server (server_id, server_name)
github_hetzner_runners_server_labels: Labels assigned to servers (server_id, server_name, label)
github_hetzner_runners_server_status: Server status (1 for active, 0 for inactive) (server_id, server_name, status)
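
As an illustration of how the per-status server gauge might be consumed outside Prometheus, here is a rough sketch that parses the text exposition format and groups github_hetzner_runners_servers_total by status. The sample values are invented, the helper names are hypothetical, and the label name `status` is an assumption; check the actual endpoint output for the real label names.

```python
import re
from collections import defaultdict

# Invented sample in the Prometheus text format. The "status" label name
# is an assumption; the real endpoint may name it differently.
SAMPLE = """\
# HELP github_hetzner_runners_servers_total Total number of servers by status
# TYPE github_hetzner_runners_servers_total gauge
github_hetzner_runners_servers_total{status="running"} 3
github_hetzner_runners_servers_total{status="off"} 1
github_hetzner_runners_servers_total{status="busy"} 2
github_hetzner_runners_servers_total_count 6
"""

SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def servers_by_status(text):
    """Sum github_hetzner_runners_servers_total per value of the status label."""
    counts = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "github_hetzner_runners_servers_total":
            counts[labels.get("status", "")] += value
    return dict(counts)
```

For anything beyond a quick check, a maintained parser such as the one in the prometheus_client package is likely a better choice than hand-written regular expressions.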

Standby Server Metrics

github_hetzner_runners_standby_servers_total: Total number of standby servers by status, server type, and location
github_hetzner_runners_standby_servers_labels: Labels assigned to standby servers (server_type, location, label)

Recycled Server Metrics

github_hetzner_runners_recycled_servers_total: Total number of recycled servers available by status, server type, and location
github_hetzner_runners_recycled_servers_labels: Labels assigned to recycled servers (server_type, location, label)

Runner Metrics

github_hetzner_runners_runners_total: Total number of runners by status (online, offline)
github_hetzner_runners_runners_total_count: Total number of runners across all statuses
github_hetzner_runners_runners_busy: Number of busy runners
github_hetzner_runners_runner: Information about each runner (runner_id, runner_name)
github_hetzner_runners_runner_labels: Labels assigned to runners (runner_id, runner_name, label)
github_hetzner_runners_runner_status: Runner status tracking both online/offline state and busy/ready state (runner_id, runner_name, status, busy)

Job Metrics

github_hetzner_runners_queued_jobs: Number of queued jobs
github_hetzner_runners_running_jobs: Number of running jobs
github_hetzner_runners_queued_job: Information about queued jobs (job_id, run_id)
github_hetzner_runners_running_job: Information about running jobs (job_id, run_id)
github_hetzner_runners_queued_job_labels: Labels requested by queued jobs (job_id, run_id, label)
github_hetzner_runners_running_job_labels: Labels assigned to running jobs (job_id, run_id, label)
github_hetzner_runners_queued_job_wait_time_seconds: Time job has been waiting in queue (job_id, run_id)
github_hetzner_runners_running_job_time_seconds: Time job has been running (job_id, run_id)
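
One possible use of the per-job wait time is to flag jobs that have been queued for too long. The sketch below assumes the samples of github_hetzner_runners_queued_job_wait_time_seconds have already been read into (labels, value) pairs. The function name and the 300-second threshold are hypothetical choices for illustration.

```python
def long_waiting_jobs(samples, threshold_seconds=300.0):
    """Return (job_id, run_id, wait_seconds) for jobs queued longer than the threshold.

    `samples` is an iterable of (labels, value) pairs taken from
    github_hetzner_runners_queued_job_wait_time_seconds.
    """
    overdue = [
        (labels["job_id"], labels["run_id"], value)
        for labels, value in samples
        if value > threshold_seconds
    ]
    # Longest-waiting jobs first.
    return sorted(overdue, key=lambda item: item[2], reverse=True)
```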

Server Health Metrics

github_hetzner_runners_zombie_servers_total: Total number of zombie servers (servers without registered runners) by server type and location
github_hetzner_runners_zombie_servers_total_count: Total number of zombie servers across all types and locations
github_hetzner_runners_zombie_server_age_seconds: Time since server became a zombie (server_id, server_name)
github_hetzner_runners_unused_runners_total: Total number of unused runners by server type and location
github_hetzner_runners_unused_runners_total_count: Total number of unused runners across all types and locations
github_hetzner_runners_unused_runner_age_seconds: Time since runner was last used (runner_id, runner_name)

Runner Pool Metrics

github_hetzner_runners_pool_status: Runner pool status (1 for active) by pool type, server type, and location
github_hetzner_runners_pool_capacity: Runner pool target capacity by pool type, server type, and location
github_hetzner_runners_pool_available: Number of available runners in pool by pool type, server type, and location

Scale Down Metrics

github_hetzner_runners_scale_down_operations_total: Total number of scale down operations by reason, server type, and location (reason: powered_off, unused, zombie)
github_hetzner_runners_scale_down_operations_total_count: Total number of scale down operations across all reasons

GitHub API Metrics

github_hetzner_runners_github_api_remaining: Number of GitHub API calls remaining
github_hetzner_runners_github_api_limit: Total GitHub API rate limit
github_hetzner_runners_github_api_reset_time: Time until GitHub API rate limit resets in seconds
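
The three rate limit gauges can be combined into a rough picture of how much API headroom is left. This is a sketch under the assumption that the values have already been scraped; the function and key names are hypothetical.

```python
def github_api_budget(remaining, limit, reset_seconds):
    """Summarize GitHub API headroom from the three rate limit gauges."""
    # Fraction of the rate limit already consumed in the current window.
    used_fraction = 1.0 - remaining / limit if limit > 0 else 0.0
    # Call rate that would exactly exhaust the remaining quota at reset time.
    if reset_seconds > 0:
        sustainable_rate = remaining / reset_seconds
    else:
        sustainable_rate = float("inf")
    return {
        "used_fraction": used_fraction,
        "sustainable_calls_per_second": sustainable_rate,
    }
```

For example, with 1250 of 5000 calls remaining and 600 seconds until the reset, the used fraction is 0.75 and the sustainable rate is roughly 2 calls per second.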

Cost Metrics

github_hetzner_runners_cost_estimate: Estimated cost in EUR by server type and location

Service Health Metrics

github_hetzner_runners_heartbeat_timestamp: Unix timestamp of the last service heartbeat
github_hetzner_runners_scale_up_failures_last_hour: Total number of scale up failures in the last hour
github_hetzner_runners_scale_up_failure_last_hour: Details about scale up failures in the last hour (error_type, server_name, server_type, location, timestamp_iso, labels, error)
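
Because the heartbeat is a Unix timestamp, a liveness check reduces to comparing it with the current time. The following is a minimal sketch; the function names and the 120-second threshold are assumptions, since the heartbeat interval is not documented here.

```python
import time


def heartbeat_age(heartbeat_timestamp, now=None):
    """Seconds elapsed since github_hetzner_runners_heartbeat_timestamp."""
    now = time.time() if now is None else now
    # Clamp at zero in case of small clock differences between hosts.
    return max(0.0, now - heartbeat_timestamp)


def is_stale(heartbeat_timestamp, max_age_seconds=120.0, now=None):
    """True if the last heartbeat is older than max_age_seconds (assumed threshold)."""
    return heartbeat_age(heartbeat_timestamp, now) > max_age_seconds
```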

Developed and maintained by the TestFlows team.
