
Prometheus Metrics

vzakaznikov edited this page Mar 24, 2025 · 3 revisions
✅ Available: >= 1.8

The service exposes a Prometheus metrics endpoint, by default on 127.0.0.1 (localhost) port 9090, that provides detailed information about the runners, servers, jobs, and service health.

❗Warning: This feature is still experimental and the metrics are subject to change in future releases. Please do not rely on specific metric names or labels in production environments without proper versioning.

Configuration

The metrics endpoint can be configured using the following options:

  • --metrics-port: Port for the Prometheus metrics server (default: 9090)
  • --metrics-host: Host address to bind the Prometheus metrics server to (default: 127.0.0.1)
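To confirm the endpoint is reachable with the defaults above, a standard-library check is enough. This is a minimal sketch: it assumes the metrics are served under the conventional `/metrics` path, which this page does not state, and `metrics_url` and `fetch_metrics` are hypothetical helper names, not part of the service.

```python
import urllib.request

# Defaults documented for --metrics-host and --metrics-port.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 9090


def metrics_url(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT) -> str:
    """Build the scrape URL. The /metrics path is an assumption."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT,
                  timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served by the endpoint."""
    with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

If the service was started with non-default `--metrics-host` or `--metrics-port` values, pass the same values to `fetch_metrics(host, port)`.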

Example

To scrape these metrics with Prometheus, add the following to your Prometheus configuration:

```yaml
scrape_configs:
  - job_name: 'github-hetzner-runners'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 15s
```
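Whatever scrapes the endpoint receives plain text in the Prometheus exposition format. For ad-hoc checks outside Prometheus, a small parser is enough. The following is a simplified sketch (it ignores sample timestamps and leaves escape sequences in label values untouched), and `parse_samples` is a hypothetical helper rather than anything the service provides.

```python
import re
from typing import Dict, Iterator, Tuple

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
)
# label="value" pairs; values may contain commas or escaped quotes.
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_samples(text: str, prefix: str = "github_hetzner_runners_"
                  ) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric name, labels, value) for every sample whose name has `prefix`."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or not match.group("name").startswith(prefix):
            continue
        labels = {m.group("key"): m.group("val")
                  for m in LABEL_RE.finditer(match.group("labels") or "")}
        yield match.group("name"), labels, float(match.group("value"))
```

For anything beyond quick checks, the official `prometheus_client` package ships a complete parser (`prometheus_client.parser.text_string_to_metric_families`); the regex version only avoids the dependency.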

Server Metrics

  • github_hetzner_runners_servers_total: Total number of servers by status (running, off, initializing, ready, busy)
  • github_hetzner_runners_servers_total_count: Total number of servers across all statuses
  • github_hetzner_runners_servers_created_total: Total number of servers created by server type and location
  • github_hetzner_runners_servers_deleted_total: Total number of servers deleted by server type and location
  • github_hetzner_runners_server_creation_seconds: Time taken to create a server by server type and location (buckets: 10s, 30s, 1m, 2m, 5m, 10m, 20m, 30m)
  • github_hetzner_runners_server: Information about each server (server_id, server_name)
  • github_hetzner_runners_server_labels: Labels assigned to servers (server_id, server_name, label)
  • github_hetzner_runners_server_status: Server status (1 for active, 0 for inactive) (server_id, server_name, status)
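The bucket list for `github_hetzner_runners_server_creation_seconds` translates to upper bounds of 10, 30, 60, 120, 300, 600, 1200, and 1800 seconds. Prometheus histograms are cumulative: each `le` bucket counts every observation less than or equal to its bound, and an implicit `+Inf` bucket counts all of them. This sketch reproduces that standard counting for a hypothetical list of creation times; it illustrates histogram semantics, not the service's own code, and `cumulative_buckets` is a hypothetical name.

```python
import math
from typing import Dict, List

# 10s, 30s, 1m, 2m, 5m, 10m, 20m, 30m expressed in seconds.
CREATION_BUCKETS = [10.0, 30.0, 60.0, 120.0, 300.0, 600.0, 1200.0, 1800.0]


def cumulative_buckets(observations: List[float],
                       bounds: List[float] = CREATION_BUCKETS) -> Dict[float, int]:
    """Count observations per `le` bound the way a Prometheus histogram does."""
    counts = {bound: 0 for bound in bounds + [math.inf]}
    for value in observations:
        for bound in counts:
            if value <= bound:
                counts[bound] += 1  # cumulative: every bound >= value is incremented
    return counts
```

A 45-second creation therefore lands in the 60-second bucket and every larger one, while a creation slower than 30 minutes is only visible in `+Inf`.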

Standby Server Metrics

  • github_hetzner_runners_standby_servers_total: Total number of standby servers by status, server type, and location
  • github_hetzner_runners_standby_servers_labels: Labels assigned to standby servers (server_type, location, label)

Recycled Server Metrics

  • github_hetzner_runners_recycled_servers_total: Total number of available recycled servers by status, server type, and location
  • github_hetzner_runners_recycled_servers_labels: Labels assigned to recycled servers (server_type, location, label)

Runner Metrics

  • github_hetzner_runners_runners_total: Total number of runners by status (online, offline)
  • github_hetzner_runners_runners_total_count: Total number of runners across all statuses
  • github_hetzner_runners_runners_busy: Number of busy runners
  • github_hetzner_runners_runner: Information about each runner (runner_id, runner_name)
  • github_hetzner_runners_runner_labels: Labels assigned to runners (runner_id, runner_name, label)
  • github_hetzner_runners_runner_status: Runner status tracking both online/offline state and busy/ready state (runner_id, runner_name, status, busy)
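Because `github_hetzner_runners_runners_total_count` also counts offline runners, dividing the busy count by it would understate how loaded the fleet is. A minimal sketch that instead divides `github_hetzner_runners_runners_busy` by the online series of `github_hetzner_runners_runners_total`; both function names are hypothetical, and the inputs are assumed to be already-scraped gauge values.

```python
def runner_utilization(busy: float, online: float) -> float:
    """Share of online runners that are busy (0.0 when no runner is online)."""
    if online <= 0:
        return 0.0
    # Clamp in case the two gauges were sampled at slightly different moments.
    return min(busy / online, 1.0)


def idle_runners(busy: float, online: float) -> float:
    """Online runners that are not currently running a job."""
    return max(online - busy, 0.0)
```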

Job Metrics

  • github_hetzner_runners_queued_jobs: Number of queued jobs
  • github_hetzner_runners_running_jobs: Number of running jobs
  • github_hetzner_runners_queued_job: Information about queued jobs (job_id, run_id)
  • github_hetzner_runners_running_job: Information about running jobs (job_id, run_id)
  • github_hetzner_runners_queued_job_labels: Labels requested by queued jobs (job_id, run_id, label)
  • github_hetzner_runners_running_job_labels: Labels assigned to running jobs (job_id, run_id, label)
  • github_hetzner_runners_queued_job_wait_time_seconds: Time the job has been waiting in the queue (job_id, run_id)
  • github_hetzner_runners_running_job_time_seconds: Time the job has been running (job_id, run_id)
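The per-job series carry `job_id` and `run_id` labels, so they can answer questions such as "which job has been stuck in the queue longest?". A minimal sketch, assuming the `github_hetzner_runners_queued_job_wait_time_seconds` samples have already been parsed into `(labels, value)` pairs; `longest_waiting_job` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# (labels, value) pairs for github_hetzner_runners_queued_job_wait_time_seconds.
Sample = Tuple[Dict[str, str], float]


def longest_waiting_job(samples: List[Sample]) -> Optional[Tuple[str, str, float]]:
    """Return (job_id, run_id, seconds) for the job that has waited longest, if any."""
    if not samples:
        return None
    labels, seconds = max(samples, key=lambda sample: sample[1])
    return labels["job_id"], labels["run_id"], seconds
```

Cross-checking the result against `github_hetzner_runners_queued_job_labels` for the same `job_id` shows which runner labels that job is waiting for.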

Server Health Metrics

  • github_hetzner_runners_zombie_servers_total: Total number of zombie servers (servers without registered runners) by server type and location
  • github_hetzner_runners_zombie_servers_total_count: Total number of zombie servers across all types and locations
  • github_hetzner_runners_zombie_server_age_seconds: Time since the server became a zombie (server_id, server_name)
  • github_hetzner_runners_unused_runners_total: Total number of unused runners by server type and location
  • github_hetzner_runners_unused_runners_total_count: Total number of unused runners across all types and locations
  • github_hetzner_runners_unused_runner_age_seconds: Time since the runner was last used (runner_id, runner_name)
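The age gauges make it straightforward to flag servers that have been zombies for too long. A minimal sketch, assuming the `github_hetzner_runners_zombie_server_age_seconds` values are collected into a dict keyed by `server_name`; the 600-second threshold and the `overdue_zombies` name are hypothetical choices for illustration.

```python
from typing import Dict, List


def overdue_zombies(zombie_ages: Dict[str, float],
                    threshold_seconds: float = 600.0) -> List[str]:
    """Server names whose zombie age exceeds the threshold, oldest first."""
    overdue = [(age, name) for name, age in zombie_ages.items() if age > threshold_seconds]
    return [name for age, name in sorted(overdue, reverse=True)]
```

The same function works unchanged for `github_hetzner_runners_unused_runner_age_seconds` keyed by `runner_name`.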

Runner Pool Metrics

  • github_hetzner_runners_pool_status: Runner pool status (1 for active) by pool type, server type, and location
  • github_hetzner_runners_pool_capacity: Runner pool target capacity by pool type, server type, and location
  • github_hetzner_runners_pool_available: Number of available runners in the pool by pool type, server type, and location
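Comparing `github_hetzner_runners_pool_capacity` with `github_hetzner_runners_pool_available` for the same pool type, server type, and location shows which pools are below target. A minimal sketch with both gauges keyed by that label triple; `pool_shortfall` is a hypothetical name, and the pool type, server type, and location values in the test are illustrative only.

```python
from typing import Dict, Tuple

# (pool type, server type, location)
PoolKey = Tuple[str, str, str]


def pool_shortfall(capacity: Dict[PoolKey, float],
                   available: Dict[PoolKey, float]) -> Dict[PoolKey, float]:
    """Runners each pool is missing relative to its target capacity (short pools only)."""
    shortfall: Dict[PoolKey, float] = {}
    for key, target in capacity.items():
        missing = target - available.get(key, 0.0)
        if missing > 0:
            shortfall[key] = missing
    return shortfall
```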

Scale Down Metrics

  • github_hetzner_runners_scale_down_operations_total: Total number of scale-down operations by reason, server type, and location (reason: powered_off, unused, zombie)
  • github_hetzner_runners_scale_down_operations_total_count: Total number of scale-down operations across all reasons
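Assuming the scale-down totals are exported as monotonic counters (the `_total` suffix suggests so, but this page does not say), the useful signal is how much they grew between two scrapes rather than the raw value. The sketch below shows the usual reset-aware difference; PromQL's `increase()` applies the same idea with extrapolation over a time window. `counter_increase` is a hypothetical helper.

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of a monotonic counter between two scrapes.

    A drop in value means the counter was reset (for example, the service
    restarted), so the whole current value is counted as new increments.
    """
    if current < previous:
        return current
    return current - previous
```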

GitHub API Metrics

  • github_hetzner_runners_github_api_remaining: Number of GitHub API calls remaining
  • github_hetzner_runners_github_api_limit: Total GitHub API rate limit
  • github_hetzner_runners_github_api_reset_time: Time in seconds until the GitHub API rate limit resets
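Together, the three GitHub API gauges show how close the service is to being rate limited and how fast it can keep calling before the window resets. A minimal sketch over already-scraped values; the function names are hypothetical, and the one-second clamp is an illustrative choice to avoid dividing by zero right before a reset.

```python
def used_fraction(remaining: float, limit: float) -> float:
    """Share of the current rate-limit window already consumed."""
    if limit <= 0:
        return 0.0
    return 1.0 - remaining / limit


def calls_per_second_budget(remaining: float, reset_seconds: float) -> float:
    """Average calls/second that can be spent before the limit resets."""
    # Clamp to one second so a window that is about to reset does not divide by zero.
    return remaining / max(reset_seconds, 1.0)
```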

Cost Metrics

  • github_hetzner_runners_cost_estimate: Estimated cost in EUR by server type and location

Service Health Metrics

  • github_hetzner_runners_heartbeat_timestamp: Unix timestamp of the last service heartbeat
  • github_hetzner_runners_scale_up_failures_last_hour: Total number of scale-up failures in the last hour
  • github_hetzner_runners_scale_up_failure_last_hour: Details about scale-up failures in the last hour (error_type, server_name, server_type, location, timestamp_iso, labels, error)
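Since `github_hetzner_runners_heartbeat_timestamp` is a Unix timestamp, a stuck service shows up as a heartbeat that stops advancing. In Prometheus this is typically an alert on `time() - github_hetzner_runners_heartbeat_timestamp`; the Python sketch below performs the same comparison for a scraped value. The 120-second threshold and the `heartbeat_is_stale` name are hypothetical; choose a threshold that fits the service's actual loop interval.

```python
import time
from typing import Optional


def heartbeat_is_stale(heartbeat_timestamp: float,
                       max_age_seconds: float = 120.0,
                       now: Optional[float] = None) -> bool:
    """True if the last heartbeat is older than max_age_seconds."""
    current = time.time() if now is None else now
    return current - heartbeat_timestamp > max_age_seconds
```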