Warning: Important information for customers using Central Workload Health Dashboard
This solution is offered by the open-source community and does not receive contributions or support from Microsoft.
Used by government agencies on GCC, CWHD uses Grafana to visualize the performance and health of Azure resources.
- What are Tier 0, 1 & 2 dashboards?
- Specialized Dashboards (ready-to-use)
- What is a Color-coded tile?
- Tech Stack
- Logs Required
- Deployment & Configuration
You aim to cohesively group all dependent Azure resources into a Tier 1 dashboard. How you group resources is entirely up to you; the following is a general guideline:
- Group by system
  For example, you have a system that uses App Services, VMs, Redis Cache, Azure SQL, Storage, Azure OpenAI Service and Azure Functions. If any one or more of these services fails, your system is affected. The Tier 1 dashboard should monitor all the Azure resources that together support the functioning of your system.
  If you have 2 systems, Cloud Crafty and Pocket Geek, you will have two Tier 1 dashboards.
- Group by Subscription/Resource Group
  The context could be a cloud admin monitoring shared resources in landing zones, where the shared resources are already grouped by Subscription or Resource Group. In this case, 1 subscription = 1 Tier 1 dashboard.
- Group scattered resources
  You could also group Azure resources from different subscriptions and resource groups into a Tier 1 dashboard.
The Tier 0 dashboard is a summary view of all Tier 1 dashboards.
As with Tier 1, CWHD cannot offer pre-built Tier 0 dashboards, because Tier 0 and Tier 1 dashboards are fully customized and adapted to your specific grouping of resources.


- Activity Audit Dashboard
- Application Gateway Dashboard
- Firewall Dashboard
- API Management Dashboard
- Key Vault Dashboard
Shows you who made changes to Firewall rules and NSG rules, plus Key Vault operations and the operations of all other Azure services
Shows you Key Vault metrics and operations (a modified version of the one from Azure Monitor)
With version 0.2-wara-preview (docker pull wxzd/cwhd:v0.2.0-wara-preview_130325),
CWHD runs the Azure WARA assessment on startup and every 6 hours thereafter, giving you the past and latest reliability states of your Azure environment.
Under the hood, on every WARA run, CWHD downloads the latest copies of collector.ps1 and analyzer.ps1 and executes these 2 scripts to produce the assessment result. The result is then formatted and published via the CWHD-Backend REST APIs to be consumed by Grafana.
- The dashboard requires Grafana 11 due to Business Table
- The CWHD backend runs on a Windows Container
You can select past or latest reports and filter by subscription.
Color-coded tiles exist in Tier 0 and 1 dashboards only, and each Azure resource is represented by one color-coded tile.
Each color-coded tile displays one of 3 colors at any one time: Green, Amber or Red, each representing a different health status.
- Green
  - the health status from the Azure Resource Health API is healthy
  - for App Service specifically, the health status is determined by any one of the following data sources:
    - Application Insights Availability Test
    - Network Watcher Connection Monitor
    - Azure Resource Health API
  - when all resources in Tier 1 color-coded tiles are Green, Tier 0 summarizes the system status as Green
- Amber
  - affects only Virtual Machine resources: if a VM's CPU, Memory and/or Disk usage percentage hits its threshold, Amber is shown. See Deployment & Configuration
  - when any one VM in Tier 1 color-coded tiles is Amber, the Tier 0 dashboard summarizes the system status as Amber
- Red
  - when the Resource Health API returns an unhealthy result
  - for App Service specifically, if any of the following returns an unhealthy status:
    - Application Insights Availability Test
    - Network Watcher Connection Monitor
    - Azure Resource Health API
  - when any one resource in Tier 1 color-coded tiles is Red, the Tier 0 dashboard summarizes the system status as Red. Red takes precedence over Amber.
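The roll-up from Tier 1 tiles to the Tier 0 status can be read as "the worst color wins". The following is a minimal sketch of that rule, not CWHD's actual implementation; the names `Health` and `summarize_tier0` are hypothetical, and treating an empty dashboard as Green is an assumption.

```python
from enum import IntEnum


class Health(IntEnum):
    """Tile colors ordered by severity, so max() picks the worst one."""
    GREEN = 0
    AMBER = 1
    RED = 2


def summarize_tier0(tier1_tiles):
    """Roll Tier 1 tile colors up into a single Tier 0 status.

    Assumption: a Tier 1 dashboard with no tiles is reported as Green.
    """
    return max(tier1_tiles, default=Health.GREEN)


if __name__ == "__main__":
    tiles = [Health.GREEN, Health.AMBER, Health.RED]
    print(summarize_tier0(tiles).name)  # RED
```

Ordering the colors numerically keeps the "Red over Amber over Green" precedence in one place instead of spreading it across conditionals.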
- Python 3.11
- Azure Managed Grafana Standard - Grafana 10.4.11
- Docker image
Grafana Dashboards | Logs required in Workspace |
---|---|
 | |
Tier 2 / Activity Audit dashboard | send Activity Log to Workspace |
Tier 2 / Firewall dashboard | enable Firewall diagnostics settings |
Tier 2 / API Management dashboard | enable APIM diagnostics settings |
Tier 2 / Application Gateway dashboard | |
Tier 2 / Key Vault dashboard | enable Key Vault diagnostics settings |
- App Service for Containers
  - Publish = Container
  - Operating System = Linux
  - Container image = Docker Hub image wxzd/cwhd:v1.1.1
  - App Service Plan = Standard S1, Premium v3 P0V3 or higher
  - Environment Variables
    - APPLICATIONINSIGHTS_CONNECTION_STRING={conn string}
    - HealthStatusThreshold={"metricUsageThreshold": { "vm": { "cpuUsagePercentage": 80, "memoryUsagePercentage": 80, "diskUsagePercentage": 80 } } }
    - QueryTimeSpanHour=2
    - WEBSITES_PORT=8000
    - Version=1.1.1
  - Enable Managed Identity
    - add Azure role assignment (RBAC) for Managed Identity with Monitor Reader to:
      - Subscriptions containing resources under monitoring
      - Log Analytics Workspace (if the workspace is in a different subscription from the above)
  - Enable Application Insights
  - Set up Easy Auth with Microsoft Provider
    - Option 1: Create and use a new App registration
    - Option 2: Use an existing registration created separately. Entra ID App configuration example below.
  - Networking / Access Restrictions / Site access and rules (after Managed Grafana is deployed and configured)
    - Public network access = "Enabled from selected virtual networks and IP addresses"
    - Unmatched rule action = Deny
    - add the 2 Grafana static IP addresses found under "Deterministic outbound IP"
- Azure Managed Grafana
  - Sku = Standard
  - Enable Managed Identity
    - add Azure role assignment (RBAC) for Grafana Managed Identity with Monitor Reader to:
      - Subscriptions containing resources under monitoring
      - Log Analytics Workspaces (if the workspaces are in a different subscription from the above)
  - Infinity plugin
    - Plugin Management, add plugins
      - Infinity
      - Business Variable (select the plugins and hit "Save")
    - Configure Infinity data source authentication with Entra ID
      - Auth type = OAuth2
      - Grant type = Client Credentials
      - Client Id (App Service Easy Auth service principal)
      - Client secret (App Service Easy Auth service principal)
      - Token Url: https://login.microsoftonline.com/{tenant id}/oauth2/token
      - Endpoint param: Resource : api://{client id}, e.g. api://73667734-67cf-49e9-96e1-927ca23d6c18
      - Allowed hosts: {Domain of App Service}, e.g. https://web-container-cwhd-e3cxcfdyg6bdfza7.southeastasia-01.azurewebsites.net
    - Test that the Infinity data source is able to authenticate with the CWHD web app
  - Configuration / Deterministic outbound IP - Enable
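To check the Entra ID settings outside Grafana, it can help to request a token by hand with the same client-credentials values the Infinity data source uses. The following is a rough sketch under that assumption, using only the Python standard library; the function names are hypothetical, and it assumes the token endpoint returns a JSON body containing `access_token`.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret):
    """Build the client-credentials token request described in the list above."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Matches the "Endpoint param: Resource" setting of the data source.
        "resource": f"api://{client_id}",
    }
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def fetch_access_token(request):
    """Send the request and return the bearer token (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


# Example usage (not executed here, needs real credentials):
#   req = build_token_request("{tenant id}", "{client id}", "{client secret}")
#   token = fetch_access_token(req)
#   Then call the CWHD web app with the header "Authorization: Bearer <token>".
```

If the token request succeeds but the web app still rejects the call, the access restriction rules on the App Service are a likely place to look, since only the Grafana outbound IP addresses are allowed.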
Path | Method | Input Param | Description |
---|---|---|---|
/ | GET | | Root path returns "alive" |
/RHRetriever | POST | { "resources": [ { "resourceId": "{resource id}", "standardTestName": "{App Insights standard test name}", "workspaceId": "{Log Analytics Workspace Id}", "network_watcher_conn_mon_test_group_name": "{network watcher connection monitor test group name}" } ] } | standardTestName and network_watcher_conn_mon_test_group_name are optional params for getting App Service health; if they are not supplied, the backend falls back to the Resource Health API |
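A request body for /RHRetriever can be assembled as in the sketch below. The helper names are hypothetical, and leaving out `workspaceId` when it is not known is an assumption; the table only states that the two App Service test parameters are optional.

```python
import json


def build_resource(resource_id, workspace_id=None,
                   standard_test_name=None, conn_mon_test_group=None):
    """Build one entry of the "resources" array, skipping unset optional keys."""
    entry = {"resourceId": resource_id}
    optional = {
        "standardTestName": standard_test_name,
        "workspaceId": workspace_id,
        "network_watcher_conn_mon_test_group_name": conn_mon_test_group,
    }
    entry.update({key: value for key, value in optional.items() if value is not None})
    return entry


def build_payload(resources):
    """Serialize the entries into the JSON body expected by POST /RHRetriever."""
    return json.dumps({"resources": resources})


if __name__ == "__main__":
    # "{resource id}" and the other braces are placeholders, as in the table.
    body = build_payload([
        build_resource("{resource id}", workspace_id="{Log Analytics Workspace Id}"),
    ])
    print(body)
```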
CWHD Backend is a web app that curates telemetry from different data sources, including:
- Azure Monitor REST API
  - App Service: health status is determined by any one of the following results:
    - Kusto query - Application Insights Availability Test result (AppAvailabilityResults table)
    - Kusto query - Network Watcher Connection Monitor (NWConnectionMonitorTestResult table)
    - Resource Health API, as the last option to determine health status if the above options are not available
  - VM: health status is determined by 2 factors
    - The Resource Health availability status determines whether the VM is available, which maps to the Green or Red status.
    - If the resource health status is Available/Green and a Log Analytics Workspace ID is provided, 3 additional metrics (CPU, Memory and Disk usage percentage) are monitored against a set of configurable thresholds. In Grafana, the VM Stat visualization shows Amber status if one or more of the 3 metrics reaches its threshold.
- Azure Resource Health API - gets resource health for all resource types except App Service, which gets its health status from the App Insights Standard Test
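The decision rules above can be sketched as two small functions. This is an illustration of the described logic, not the backend's code: the function names are hypothetical, the usage dictionary is assumed to use the same keys as the HealthStatusThreshold setting, and the order in which the two Kusto sources are preferred over each other is an assumption (the text only says Resource Health is the last option).

```python
import json

# Same shape as the HealthStatusThreshold environment variable.
DEFAULT_THRESHOLDS = json.dumps({
    "metricUsageThreshold": {
        "vm": {
            "cpuUsagePercentage": 80,
            "memoryUsagePercentage": 80,
            "diskUsagePercentage": 80,
        }
    }
})


def vm_status(available, usage=None, thresholds_json=DEFAULT_THRESHOLDS):
    """Return "Red", "Amber" or "Green" for a VM.

    available: True if Resource Health reports the VM as available.
    usage: metric percentages from the workspace, or None if no workspace
           ID was provided (then only Resource Health decides).
    """
    if not available:
        return "Red"
    if usage is None:
        return "Green"
    limits = json.loads(thresholds_json)["metricUsageThreshold"]["vm"]
    breached = any(usage.get(metric, 0) >= limit for metric, limit in limits.items())
    return "Amber" if breached else "Green"


def app_service_source(standard_test_name=None, conn_mon_test_group=None):
    """Pick the data source for App Service health, Resource Health last."""
    if standard_test_name:
        return "AppAvailabilityResults"
    if conn_mon_test_group:
        return "NWConnectionMonitorTestResult"
    return "ResourceHealth"


if __name__ == "__main__":
    print(vm_status(True, {"cpuUsagePercentage": 91}))  # Amber
    print(app_service_source())  # ResourceHealth
```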