Lightweight & extensible platform health monitoring.
Platform Health is a simple client/server system for lightweight health monitoring of platform components and systems.
The Platform Health client (phc) sends a gRPC health check request to a Platform Health server which is configured to probe a set of network services. Probes run asynchronously on the server (subject to configurable timeouts), with the accumulated response returned to the client.
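The concurrency model described above can be illustrated with a minimal Python sketch. It is not the server's implementation: the function names, the `UNHEALTHY` status string, and the response shape are assumptions made for illustration. Each probe runs concurrently under a timeout, and the results are accumulated into a single response.

```python
import asyncio

async def run_probe(name, probe, timeout):
    """Run one probe coroutine; timeouts and errors map to UNHEALTHY (assumed name)."""
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
        return {"name": name, "status": "HEALTHY"}
    except asyncio.TimeoutError:
        return {"name": name, "status": "UNHEALTHY", "message": "timeout"}
    except Exception as exc:
        return {"name": name, "status": "UNHEALTHY", "message": str(exc)}

async def check_all(probes, timeout=1.0):
    """Run every probe concurrently and accumulate one overall response."""
    results = await asyncio.gather(
        *(run_probe(name, probe, timeout) for name, probe in probes.items())
    )
    healthy = all(r["status"] == "HEALTHY" for r in results)
    return {"status": "HEALTHY" if healthy else "UNHEALTHY", "components": results}

# Hypothetical probes standing in for real network checks.
async def ok():
    await asyncio.sleep(0)

async def slow():
    await asyncio.sleep(10)

async def broken():
    raise ConnectionError("connection refused")
```

Because the probes run concurrently, the total time is bounded by the slowest probe or the timeout, rather than by the sum of all probe durations.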
Probes use a compile-time provider plugin system that supports extension to monitoring of arbitrary services. Integrated providers include:
- system: Hierarchical grouping of related health checks with status aggregation
- satellite: A separate satellite instance of the Platform Health server
- tcp: TCP connectivity checks
- tls: TLS handshake and certificate verification
- http: HTTP(S) queries with status code and certificate verification
- rest: REST API health checks with CEL-based response validation
- grpc: gRPC Health v1 service status checks
- kubernetes: Kubernetes resource existence and readiness
- helm: Helm release existence and deployment status
- vault: Vault cluster initialization and seal status
Each provider implements the Instance interface. The health of each instance is obtained asynchronously and contributes to the overall response.
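As a rough analogy for the provider plugin idea, the sketch below registers provider classes under a type name and builds an instance from a component configuration. Everything here is hypothetical: the `PROVIDERS` registry, the `register` decorator, the `get_health` method, and the `UNHEALTHY` status are illustrative names, not the project's actual interface, and Python registers providers at import time rather than compile time.

```python
import socket

class Instance:
    """Illustrative stand-in for the Instance interface (method name assumed)."""
    def get_health(self) -> str:
        raise NotImplementedError

# Hypothetical registry mapping a provider type name to its class.
PROVIDERS = {}

def register(type_name):
    """Class decorator that records a provider under its type name."""
    def decorator(cls):
        PROVIDERS[type_name] = cls
        return cls
    return decorator

@register("tcp")
class TcpInstance(Instance):
    def __init__(self, host, port):
        self.host, self.port = host, port

    def get_health(self) -> str:
        # Healthy when a TCP connection can be opened within one second.
        try:
            with socket.create_connection((self.host, self.port), timeout=1.0):
                return "HEALTHY"
        except OSError:
            return "UNHEALTHY"

def build(config):
    """Instantiate a provider from a config such as {'type': 'tcp', 'host': ..., 'port': ...}."""
    params = dict(config)
    type_name = params.pop("type")
    return PROVIDERS[type_name](**params)
```

Adding a new provider in this sketch only requires defining a class and decorating it. The lookup in `build` does not change.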
brew install isometry/tap/platform-health

$ ph server -l & sleep 1 && ph client && kill %1
{"status":"HEALTHY", "duration":"0.000004833s"}

helm upgrade \
  --install platform-health \
  -n platform-health --create-namespace \
  oci://ghcr.io/isometry/charts/platform-health

kubectl create configmap platform-health --from-file=platform-health.yaml=/dev/stdin <<-EOF
components:
  ssh@localhost:
    type: tcp
    host: localhost
    port: 22
  gmail:
    type: tls
    host: smtp.gmail.com
    port: 465
  google:
    type: http
    url: https://google.com
EOF
kubectl create deployment platform-health --image ghcr.io/isometry/platform-health:latest --port=8080
kubectl patch deployment platform-health --patch-file=/dev/stdin <<-EOF
spec:
  template:
    spec:
      volumes:
        - name: config
          configMap:
            name: platform-health
      containers:
        - name: platform-health
          args:
            - -vv
          volumeMounts:
            - name: config
              mountPath: /config
EOF
kubectl create service loadbalancer platform-health --tcp=8080:8080

# Check all components
ph client
# Check specific components
ph client -c google -c github
# Check with hierarchical path (system/component)
ph client -c fluxcd/source-controller
# Connect to remote server
ph client prod:8080 -c google

Run health checks once and exit without starting a server:
ph check
# Check specific components only
ph check -c google -c fluxcd/source-controller

This is useful for:
- Validating configuration files
- Local health check verification
- CI/CD pipeline integration
- Testing specific components
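For CI/CD integration, a pipeline step might gate on the reported status. The sketch below assumes the check output is a JSON object with a `status` field, as in the `{"status":"HEALTHY", ...}` response shown earlier. Whether `ph check` prints exactly that shape is an assumption, and the helper names are hypothetical.

```python
import json

def is_healthy(output: str) -> bool:
    """Return True only when the JSON response reports status HEALTHY."""
    try:
        response = json.loads(output)
    except json.JSONDecodeError:
        # Unparseable output is treated as a failed check.
        return False
    return isinstance(response, dict) and response.get("status") == "HEALTHY"

def exit_code(output: str) -> int:
    """Map a health response to a process exit code for a pipeline step."""
    return 0 if is_healthy(output) else 1
```

Treating unparseable output as unhealthy keeps the gate conservative: a crashed or misconfigured check fails the pipeline instead of passing silently.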
Create and run health checks without a configuration file:
# TCP connectivity check
ph check tcp --host example.com --port 443
# HTTP health check
ph check http --url https://api.example.com/health
# HTTP check with CEL expression
ph check http --url https://api.example.com/health \
--check 'response.status == 200'
# TLS certificate check
ph check tls --host example.com --port 443

Inspect the CEL evaluation context for debugging expressions:
# View context for a configured component
ph context my-app
# View context for nested system components
ph context fluxcd/source-controller
# View context for ad-hoc provider
ph context http --url https://api.example.com/health

The Platform Health server reads configuration from a YAML file. By default, it searches for platform-health.yaml in standard config paths (/config and .).
You can customize this with:
- --config-path: Override config file search paths (can be specified multiple times)
- --config-name: Change the config file name (without extension)
# Use custom config file
ph server --config-name myconfig
# Add search paths
ph server --config-path /custom/path --config-path ./local

All health check components are defined under the components key:
components:
  <component-name>:
    type: <provider-type>
    <provider-specific-config>

Component names can contain any characters valid in YAML keys, but should avoid / which is used for path-filtered queries. The type field specifies which provider to use, and the remaining fields are provider-specific configuration.
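To see why / in a component name is problematic, consider how a hierarchical path such as fluxcd/source-controller might be resolved against nested configuration. The function below is an illustrative sketch, not the tool's actual lookup. It assumes a parsed configuration where system components nest their children under a `components` key.

```python
def resolve(components: dict, path: str) -> dict:
    """Follow a 'system/component' path through nested component configs."""
    node = {"components": components}
    for name in path.split("/"):
        children = node.get("components", {})
        if name not in children:
            raise KeyError(f"unknown component: {path}")
        node = children[name]
    return node

# Example configuration; the "a/b" entry is deliberately badly named.
config = {
    "fluxcd": {
        "type": "system",
        "components": {
            "source-controller": {"type": "kubernetes"},
        },
    },
    "a/b": {"type": "tcp"},
}
```

Here `resolve(config, "fluxcd/source-controller")` finds the nested component, while the top-level component named `a/b` cannot be reached, because the path is split into `a` and `b` before lookup.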
The following configuration checks that something is listening on tcp/22 of localhost; validates connectivity and TLS handshake to the Gmail SSL mail-submission port; validates that Google is accessible and returning a 200 status code; and validates that a REST health endpoint returns HTTP 200 with a JSON status of "healthy":
components:
  ssh@localhost:
    type: tcp
    host: localhost
    port: 22
  gmail:
    type: tls
    host: smtp.gmail.com
    port: 465
  google:
    type: http
    url: https://google.com
  api-health:
    type: rest
    request:
      url: https://api.example.com/health
      method: GET
    checks:
      - expr: 'response.status == 200'
        message: "Expected HTTP 200"
      - expr: 'response.json.status == "healthy"'
        message: "Service unhealthy"

Use the system provider to group related checks:
components:
  fluxcd:
    type: system
    components:
      source-controller:
        type: kubernetes
        resource:
          kind: deployment
          namespace: flux-system
          name: source-controller
      kustomize-controller:
        type: kubernetes
        resource:
          kind: deployment
          namespace: flux-system
          name: kustomize-controller

The system is reported "healthy" only if all sub-components are healthy.
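The aggregation rule can be expressed as a short recursive function. This is a sketch under assumptions: the tree shape (a status string for a leaf, a dict of named children for a system) and the `UNHEALTHY` status name are chosen for illustration. Treating an empty system as healthy is also an assumption, not documented behaviour.

```python
def system_status(node) -> str:
    """Aggregate status: a leaf is a status string, a system is a dict of children."""
    if isinstance(node, str):
        return node
    statuses = [system_status(child) for child in node.values()]
    # A system is healthy only when every sub-component is healthy.
    return "HEALTHY" if all(s == "HEALTHY" for s in statuses) else "UNHEALTHY"

# Hypothetical result tree mirroring the fluxcd example.
fluxcd = {
    "source-controller": "HEALTHY",
    "kustomize-controller": "UNHEALTHY",
}
```

With this input, `system_status(fluxcd)` yields `UNHEALTHY`, since one unhealthy sub-component is enough to mark the whole system unhealthy. The same holds for systems nested inside other systems.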
Several providers support CEL (Common Expression Language) expressions for custom health check validation:
- rest: Full HTTP response with JSON parsing
- kubernetes: Full resource(s), including metadata, spec, status, etc.
- helm: Release info, chart metadata, values and manifests
Use ph context to inspect the evaluation context available to your expressions. See pkg/checks/README.md for CEL syntax examples and patterns.
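The way a list of checks with messages might be applied to an evaluation context can be sketched as follows. This is not CEL: plain Python callables stand in for the expressions, the context layout loosely mirrors the `response.status` and `response.json.status` fields used in the rest example, and counting an evaluation error as a failed check is an assumption.

```python
def run_checks(context, checks):
    """Return the messages of all checks whose predicate fails for the context."""
    failures = []
    for predicate, message in checks:
        try:
            passed = bool(predicate(context))
        except (KeyError, TypeError):
            # A missing or malformed field counts as a failed check (assumption).
            passed = False
        if not passed:
            failures.append(message)
    return failures

# Python stand-ins for the two expressions in the rest example.
checks = [
    (lambda ctx: ctx["response"]["status"] == 200, "Expected HTTP 200"),
    (lambda ctx: ctx["response"]["json"]["status"] == "healthy", "Service unhealthy"),
]
```

An empty result means every check passed. Otherwise the returned messages explain which expectations were not met.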