GitHub - weixian-zhang/GCC-CWHD

Azure Monitoring with Grafana

powered by Central Workload Health Dashboard (CWHD)

Warning: Important information for customers using Central Workload Health Dashboard
This solution is offered by the open-source community and receives neither contributions nor support from Microsoft.

Used by government agencies on GCC, CWHD uses Grafana to visualize the performance and health of Azure resources.

What are Tier 0, 1 & 2 dashboards?
Specialized Dashboards (ready-to-use)
- Wara Dashboard
What is a Color-coded tile?
Tech Stack
Logs Required
Deployment & Configuration
- CWHD Backend REST API Spec
- Architecture

What are Tier 0, 1 & 2 dashboards?

Tier 1 Dashboard (tailor-made)

The aim is to group all dependent Azure resources cohesively into a Tier 1 dashboard. How you group resources is entirely up to you; below is a general guideline:

Group by system
For example, you have a system that uses App Services, VMs, Redis Cache, Azure SQL, Storage, Azure OpenAI service and Azure Functions. If one or more of these services fail, your system is affected. The Tier 1 dashboard should monitor all the Azure resources that together support the functioning of your system.

For example, if you have 2 systems, Cloud Crafty and Pocket Geek, you will have two Tier 1 dashboards.

Tier 1 / Cloud Crafty

Tier 1 / Pocket Geek
Group by Subscription/Resource Group
The context could be a cloud admin monitoring shared resources in landing zones, where the shared resources are already grouped by Subscription or Resource Group. In this case, 1 subscription = 1 Tier 1 dashboard.
Group scattered resources
You could also group Azure resources from different subscriptions and resource groups into a Tier 1 dashboard.

Tier 0 Dashboard (tailor-made)

This dashboard is a summary view of all Tier 1 dashboards.
As with Tier 1 dashboards, CWHD cannot offer pre-built Tier 0 dashboards, because Tier 0 and 1 are fully customized and adapted to your specific grouping of resources.

For this reason, Tier 0 and 1 dashboards are the core delivery work I do for my customers, in addition to other custom requests, e.g. IIS App Pool start/stop.

Tier 2 Dashboards (ready-to-use)

Activity Audit Dashboard

Shows you who made changes to Firewall rules, NSG rules, Key Vault operations and operations of all other Azure services.

Application Gateway Dashboard

Firewall Dashboard

API Management Dashboard

Key Vault Dashboard

Shows you Key Vault metrics and operations (a modified version from Azure Monitor)

Storage dashboard (a modified version from Azure Monitor)

Specialized Dashboards (ready-to-use)

Wara Dashboard

With version 0.2-wara-preview (docker pull wxzd/cwhd:v0.2.0-wara-preview_130325),
CWHD runs the Azure WARA assessment on startup and then every 6 hours, to bring you the past and latest reliability states of your Azure environment.

Under the hood, on every WARA run, CWHD downloads the latest copies of collector.ps1 and analyzer.ps1 and executes these 2 scripts to produce the assessment result. The result is then formatted and published via the CWHD Backend REST APIs to be consumed by Grafana.

The dashboard requires Grafana 11 due to Business Table
The CWHD backend runs on a Windows Container

You can select past and latest reports and filter by subscription.

What is a Color-coded tile?

Color-coded tiles exist in Tier 0 and 1 dashboards only and each Azure resource is represented by a color-coded tile.
Each color-coded tile displays one of 3 colors at any one time: Green, Amber or Red, each representing a different health status.

Green
- health status from Azure Resource Health API is healthy
- for App Service specifically, health status is determined by any one of the following data sources
  - Application Insights Availability Test
  - Network Watcher Connection Monitor
  - Azure Resource Health API
- when all resources in Tier 1 color-coded tiles are Green, Tier 0 summarizes system status as Green
Amber
- affects only Virtual Machine resources. If a VM's CPU, Memory and/or Disk usage percentage hits its threshold, Amber is shown. See Deployment & Configuration
- when any one VM in Tier 1 color-coded tiles is Amber, Tier 0 dashboard summarizes system status as Amber
Red
- when Resource Health API returns an unhealthy result
- for App Service specifically, if any of the following returns unhealthy status
  - Application Insights Availability Test
  - Network Watcher Connection Monitor
  - Azure Resource Health API
- when any one resource in Tier 1 color-coded tiles is Red, Tier 0 dashboard summarizes system status as Red. Red takes precedence over Amber.
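
The roll-up from Tier 1 tiles to the Tier 0 status described above can be sketched as taking the worst color present. This is a minimal illustration and not CWHD's actual code: the `Health` enum and `summarize_tier0` function are hypothetical names, and treating an empty tile list as Green is an assumption.

```python
from enum import IntEnum


class Health(IntEnum):
    """Tile colors ordered by severity, so Red outranks Amber outranks Green."""
    GREEN = 0
    AMBER = 1
    RED = 2


def summarize_tier0(tile_statuses):
    """Return the most severe status among the Tier 1 tiles.

    Assumption: a dashboard with no tiles is reported as Green.
    """
    return max(tile_statuses, default=Health.GREEN)


if __name__ == "__main__":
    tiles = [Health.GREEN, Health.AMBER, Health.GREEN]
    print(summarize_tier0(tiles).name)
```

Ordering the colors numerically keeps the rule to a single `max` call, which matches the rule that one Red or Amber tile is enough to change the summary.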

Tech Stack

Python 3.11
Azure Managed Grafana Standard - Grafana 10.4.11
Docker image

Logs Required

Grafana dashboards and the logs each requires in the Workspace:

Tier 0 and Tier 1 resource-specific dashboards
- App Service health signal - either one of the following logs
  - Application Insights Availability Test result
  - Network Watcher Connection Monitor
  - Resource Health API if above are not available
- Virtual Machine CPU, Memory and Disk usage percentage
  - requires Performance Counters collected by Data Collection Rule / Data Sources / Performance Counters - Basic -> Destination / Log Analytics Workspace
Tier 2 / Activity Audit dashboard
- send Activity Log to Workspace
Tier 2 / Firewall dashboard
- enable Firewall diagnostics settings
  - Azure Firewall Network Rule
  - Azure Firewall Application Rule
  - Azure Firewall Nat Rule
  - Azure Firewall Threat Intelligence
  - Azure Firewall IDPS Signature
  - Azure Firewall DNS query
Tier 2 / API Management dashboard
- enable APIM diagnostics settings
  - Logs related to APIManagement Gateway
- enable Application Insights linked to Workspace
Tier 2 / Application Gateway dashboard
- enable App Gateway diagnostics settings
  - Application Gateway Performance Log
  - Application Gateway Firewall Log
- enable Application Insights linked to Workspace
Tier 2 / Key Vault dashboard
- enable Key Vault diagnostics settings
  - Audit Logs

Deployment & Configuration

App Service for Containers
- Publish = Container
- Operating System = Linux
- container image - from Dockerhub image wxzd/cwhd:v1.1.1
- App Service Plan - Standard S1, Premium v3 P0V3 or higher
- Environment Variables
  - APPLICATIONINSIGHTS_CONNECTION_STRING={conn string}
  - HealthStatusThreshold={"metricUsageThreshold": { "vm": { "cpuUsagePercentage": 80, "memoryUsagePercentage": 80, "diskUsagePercentage": 80 } } }
  - QueryTimeSpanHour=2
  - WEBSITES_PORT=8000
  - Version=1.1.1
- enable Managed Identity
  - add Azure role assignment (RBAC) for Managed Identity with Monitor Reader to:
    - Subscriptions containing resources under monitoring
    - Log Analytics Workspace (if workspace in different subscription from above)
- Enable Application Insights
- Set up Easy Auth with Microsoft Provider
  - Option 1: Create and use new App registration
  - Option 2: Use an existing registration created separately. Entra ID App configuration example below.
- Networking / Access Restrictions / Site access and rules (After Managed Grafana is deployed and configured)
  - Public network access = "Enabled from selected virtual networks and IP addresses"
  - Unmatched rule action = Deny
  - add 2 Grafana Static IP addresses found under "Deterministic outbound IP"
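
To illustrate how the `HealthStatusThreshold` setting above could drive the Amber status, here is a minimal sketch that parses the JSON value and compares VM usage against it. The function names, the use of `>=` for "hits threshold", and the assumption that usage readings share the threshold key names are all illustrative choices, not taken from the CWHD source.

```python
import json
import os

# Same JSON shape as the HealthStatusThreshold environment variable above.
DEFAULT_SETTING = (
    '{"metricUsageThreshold": {"vm": {"cpuUsagePercentage": 80, '
    '"memoryUsagePercentage": 80, "diskUsagePercentage": 80}}}'
)


def load_vm_thresholds(raw=None):
    """Parse the threshold setting; fall back to the environment, then the default."""
    if raw is None:
        raw = os.environ.get("HealthStatusThreshold", DEFAULT_SETTING)
    return json.loads(raw)["metricUsageThreshold"]["vm"]


def vm_is_amber(usage, thresholds):
    """True if any metric reaches its threshold (assumed comparison: >=).

    Metrics missing from `usage` are treated as 0, i.e. never Amber.
    """
    return any(usage.get(name, 0) >= limit for name, limit in thresholds.items())


if __name__ == "__main__":
    limits = load_vm_thresholds(DEFAULT_SETTING)
    sample = {"cpuUsagePercentage": 91.5, "memoryUsagePercentage": 40, "diskUsagePercentage": 55}
    print(vm_is_amber(sample, limits))
```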
Azure Managed Grafana
- Sku = Standard
- enable Managed Identity
  - add Azure role assignment (RBAC) for Grafana Managed Identity with Monitor Reader to:
    - Subscriptions containing resources under monitoring
    - Log Analytics Workspaces (if workspaces are in different subscription from above)
- Infinity plugin
  - Plugin Management, add plugins
    - Infinity
    - Business Variable Select and hit "Save"
  - Configure Infinity data source authn with Entra ID
    - Auth type = OAuth2
    - Grant type = Client Credentials
    - Client Id (App Service Easy Auth service principal)
    - Client secret (App Service Easy Auth service principal)
    - Token Url: https://login.microsoftonline.com/{tenant id}/oauth2/token
    - Endpoint param: Resource : api://{client id} e.g: api://73667734-67cf-49e9-96e1-927ca23d6c18
    - Allowed hosts: {Domain of App Service} e.g: https://web-container-cwhd-e3cxcfdyg6bdfza7.southeastasia-01.azurewebsites.net
  - Test if Infinity data source is able to authenticate with CWHD web app
  - Configuration / Deterministic outbound IP - Enable
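
The Infinity data source settings above correspond to an OAuth2 client-credentials token request. The sketch below only builds that request so the form fields can be inspected before any call is made; `build_token_request` is a hypothetical helper, and the argument values in the usage lines are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) the token request implied by the settings above."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # "Endpoint param: Resource" from the Infinity configuration.
        "resource": f"api://{client_id}",
    }
    return Request(
        url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_token_request("my-tenant-id", "my-client-id", "my-secret")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Passing the result to `urllib.request.urlopen` would send it; a JSON response containing an access token would indicate that the client ID, secret and resource are configured consistently.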

CWHD Backend REST API Spec

Path	Method	Input Param	Description
/	GET		Root path returns "alive"
/RHRetriever	POST	{ "resources": [ { "resourceId": "{resource id}", "standardTestName": "{App Insights standard test name}", "workspaceId": "{Log Analytics Workspace Id}", "network_watcher_conn_mon_test_group_name": "{network watcher connection monitor test group name}" } ] }	standardTestName and network_watcher_conn_mon_test_group_name are optional params for getting App Service health; the API falls back to the Resource Health API if they are not supplied
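
As an illustration of the /RHRetriever input, the sketch below assembles a request body with the field names from the table and drops optional fields that are empty. `build_rh_request` is a hypothetical helper, the resource ID in the usage lines is a placeholder, and treating `workspaceId` as optional is an assumption; the table only states that for the two App Service params.

```python
import json

OPTIONAL_FIELDS = (
    "standardTestName",
    "workspaceId",  # assumed optional; not stated in the spec table
    "network_watcher_conn_mon_test_group_name",
)


def build_rh_request(resources):
    """Return the JSON body for POST /RHRetriever, omitting empty optional fields."""
    items = []
    for res in resources:
        item = {"resourceId": res["resourceId"]}
        for key in OPTIONAL_FIELDS:
            if res.get(key):
                item[key] = res[key]
        items.append(item)
    return json.dumps({"resources": items})


if __name__ == "__main__":
    body = build_rh_request([
        {"resourceId": "{resource id}", "standardTestName": "{App Insights standard test name}"},
        {"resourceId": "{another resource id}"},
    ])
    print(body)
```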

Architecture

CWHD Backend is a web app that curates telemetry from different data sources including:

Azure Monitor REST API
- App Service: health status is determined by any one of the following results:
  - Kusto query - Application Insights Availability Test result (AppAvailabilityResults table)
  - Kusto query - Network Watcher Connection Monitor (NWConnectionMonitorTestResult table)
  - Resource Health API, as the last option to determine health status if the above options are not available
- VM: health status is determined by 2 factors
  - Resource Health availability status determines whether the VM is available, giving the Green or Red status.
  - If the resource health status is Available/Green and a Log Analytics Workspace ID is provided, 3 additional metrics (CPU, Memory and Disk usage percentage) are monitored against a set of configurable thresholds. In Grafana, the VM Stat visualization shows Amber status if one or more of the 3 metrics reaches its threshold.
Azure Resource Health API - gets resource health for all resource types except App Service, which gets its health status from the App Insights Standard Test
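
The App Service fallback described above can be sketched as a small selection function keyed on the request fields. This is only an illustration: the function name and return labels are hypothetical, and checking the availability test before the connection monitor is an assumed order, since the text only says Resource Health API is the last option.

```python
def pick_app_service_source(resource):
    """Choose which data source decides App Service health for one resource.

    Assumed precedence: availability test, then connection monitor,
    then Resource Health API as the documented last resort.
    """
    if resource.get("standardTestName"):
        return "app_insights_availability_test"
    if resource.get("network_watcher_conn_mon_test_group_name"):
        return "network_watcher_connection_monitor"
    return "resource_health_api"


if __name__ == "__main__":
    print(pick_app_service_source({"resourceId": "{resource id}"}))
```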
