The Weights & Biases Efficiency Audit Tool is a Python-based utility designed to fetch and analyze historical data from your experiment tracking platform. This tool helps you gain insights into your machine learning projects by auditing compute usage and resource utilization.
- Fetch historical metrics, parameters, and metadata for all experiments.
- Analyze GPU and CPU utilization metrics, including full metric history for GPUs.
- Export results to a detailed Excel report, including raw data, summary, and image report.
The audit helps answer questions such as the following.
Utilization / Cost
- Are you using the optimal machine sizes for your workloads?
- How often do you have idle GPUs or entire idle machines?
- How much compute is wasted on idle GPU time?
- What's the financial impact of underutilized resources?
Efficiency
- What's your overall GPU utilization across all experiments?
- How many experiments run with 0% GPU utilization?
- Which runs represent the biggest optimization opportunities?
- How does your efficiency break down across different runs?
Performance Analysis
- Track CPU, GPU memory, and disk utilization
- Analyze network I/O patterns
- Review system metrics across all experiments
- Identify efficiency patterns over time
wandb-efficiency-audit/
│
├── wandb_efficiency_audit.py # Main script
├── generate_report_image.py # Helper functions for visual report generation
├── fonts/ # Font files for report generation
├── README.md # Documentation
└── pyproject.toml # Python project configuration
- Python 3.9 or higher
- An active W&B account (or access to public W&B projects)
- W&B system metrics monitoring turned on (usually enabled by default)
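To check whether a project actually has system metrics recorded, a minimal sketch like the following may help. It is not taken from the tool itself. It assumes that W&B stores per-GPU core utilization under keys of the form `system.gpu.<index>.gpu`, and the function names `gpu_utilization_keys` and `runs_missing_gpu_metrics` are hypothetical.

```python
import re

# Assumption: W&B logs per-GPU core utilization under keys like "system.gpu.0.gpu".
GPU_UTIL_KEY = re.compile(r"^system\.gpu\.(\d+)\.gpu$")


def gpu_utilization_keys(columns):
    """Return the column names that look like per-GPU core utilization."""
    return sorted(c for c in columns if GPU_UTIL_KEY.match(c))


def runs_missing_gpu_metrics(project, max_runs=20):
    """List names of runs in `project` without any GPU utilization history.

    Needs the `wandb` and `pandas` packages and, for private projects,
    a prior `wandb login`. Only the first `max_runs` runs are inspected.
    """
    import wandb  # imported lazily so the helper above works without it

    api = wandb.Api()
    missing = []
    for i, run in enumerate(api.runs(project)):
        if i >= max_runs:
            break
        # stream="system" selects system metrics instead of user-logged metrics.
        history = run.history(stream="system")
        if not gpu_utilization_keys(history.columns):
            missing.append(run.name)
    return missing
```

Calling `runs_missing_gpu_metrics("entity/project")` would return the names of sampled runs that have no GPU utilization columns, which can indicate that system metrics monitoring was disabled or that the run had no GPU.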
- Clone this repository:

      git clone https://github.com/valohai/wandb-efficiency-audit.git
      cd wandb-efficiency-audit

- Create a virtualenv and install the required dependencies:

      python -m venv venv
      source venv/bin/activate  # This depends on your shell
      pip install -e .

- (Optional) Log in to W&B if accessing private projects:

      wandb login

  Note: No login required for public W&B projects.

- Run the script to generate the audit report:

      wandb-efficiency-audit --project "entity/project"

- To analyze only completed runs (excluding failed/crashed runs):

      wandb-efficiency-audit --project "entity/project" --completed-only

- The report will be saved as experiment_metrics_summary.xlsx in the current directory.
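As far as this document describes, the report is always written to the same file name, so auditing several projects in a row would overwrite earlier results. The sketch below is one possible way to script that. It uses only the `--project` and `--completed-only` flags shown above. The helper names and the renaming scheme are hypothetical and not part of the tool.

```python
import shutil
import subprocess
from pathlib import Path

REPORT_NAME = "experiment_metrics_summary.xlsx"  # output name stated in this README


def build_audit_command(project, completed_only=False):
    """Build the CLI invocation for one project as an argument list."""
    cmd = ["wandb-efficiency-audit", "--project", project]
    if completed_only:
        cmd.append("--completed-only")
    return cmd


def report_name_for(project):
    """Hypothetical naming scheme: prefix the report with the project path."""
    return project.replace("/", "_") + "_" + REPORT_NAME


def audit_projects(projects, completed_only=False):
    """Run the audit for each project and keep one report per project."""
    saved = []
    for project in projects:
        # check=True raises CalledProcessError if the audit exits with an error.
        subprocess.run(build_audit_command(project, completed_only), check=True)
        target = Path(report_name_for(project))
        shutil.move(REPORT_NAME, target)
        saved.append(target)
    return saved
```

For example, `audit_projects(["entity/project-a", "entity/project-b"], completed_only=True)` would leave two separately named workbooks in the current directory.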
The tool generates two main outputs:
- experiment_metrics_summary.xlsx - A comprehensive Excel workbook containing:
  - Summary sheet with visual report, key metrics, and methodology
  - Cost analysis and efficiency distribution
  - Example runs with biggest optimization opportunities
  - Detailed metrics sheet with all raw data
- Visual report PNG (embedded in Excel) showing:
  - Total GPU utilization percentage
  - Total GPU idle time
  - Percentage of runs with 0% GPU utilization
  - Cost of idle compute by GPU type
  - Example runs with low utilization
- Excellent (70%+): Optimal GPU utilization
- Good (50-70%): Acceptable utilization with minor optimization potential
- Fair (30-50%): Significant room for improvement
- Poor (10-30%): Major underutilization issues
- Critical (<10%): Severe waste, immediate action recommended
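The rating bands above can be expressed as a small lookup. This is an illustrative sketch rather than the tool's own code. Since the listed ranges share their boundary values, it assumes that a boundary value (for example exactly 70%) belongs to the higher band. The function name `efficiency_rating` is hypothetical.

```python
def efficiency_rating(gpu_utilization_pct):
    """Map an average GPU utilization percentage to a rating band.

    Assumption: lower bounds are inclusive, so 70.0 is "Excellent"
    and 10.0 is "Poor".
    """
    if not 0 <= gpu_utilization_pct <= 100:
        raise ValueError("utilization must be between 0 and 100")
    if gpu_utilization_pct >= 70:
        return "Excellent"
    if gpu_utilization_pct >= 50:
        return "Good"
    if gpu_utilization_pct >= 30:
        return "Fair"
    if gpu_utilization_pct >= 10:
        return "Poor"
    return "Critical"
```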
- Total GPU Utilization: Average GPU core utilization over time across all runs
- Total GPU Idle Time: Total runtime multiplied by (100% - Average GPU utilization)
- Runs with 0% GPU Utilization: Share of runs in which at least one GPU core was not utilized at all during the whole run
- Cost of Idle Compute: Estimated cost of unused GPU time based on AWS on-demand pricing
- Example runs: Runs with low utilization that are among the longest 25% of runs in the project
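To make the definitions above concrete, here is a minimal worked sketch. It is not the tool's implementation and simplifies in several ways, all of which are assumptions: each run uses a single GPU, the overall utilization is weighted by runtime, a single hypothetical hourly price stands in for the AWS on-demand price lookup, and "low utilization" means below a hypothetical 30% threshold.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    name: str
    runtime_hours: float
    avg_gpu_util_pct: float  # average GPU core utilization over the run


def summarize(runs, hourly_price_usd):
    """Compute the headline metrics for a list of RunStats."""
    total_runtime = sum(r.runtime_hours for r in runs)
    if total_runtime == 0:
        raise ValueError("no runtime to summarize")
    # Runtime-weighted average utilization across all runs (assumption).
    total_util = sum(r.runtime_hours * r.avg_gpu_util_pct for r in runs) / total_runtime
    # Idle time = total runtime multiplied by (100% - average utilization).
    idle_hours = total_runtime * (100 - total_util) / 100
    zero_runs = sum(1 for r in runs if r.avg_gpu_util_pct == 0)
    return {
        "total_gpu_utilization_pct": total_util,
        "total_gpu_idle_hours": idle_hours,
        "zero_utilization_share_pct": 100 * zero_runs / len(runs),
        "idle_cost_usd": idle_hours * hourly_price_usd,
    }


def example_runs(runs, max_util_pct=30.0):
    """Names of low-utilization runs among the longest quarter of runs."""
    longest = sorted(runs, key=lambda r: r.runtime_hours, reverse=True)
    top_quarter = longest[: max(1, len(longest) // 4)]
    return [r.name for r in top_quarter if r.avg_gpu_util_pct < max_util_pct]


runs = [
    RunStats("a", runtime_hours=10, avg_gpu_util_pct=80),
    RunStats("b", runtime_hours=30, avg_gpu_util_pct=20),
    RunStats("c", runtime_hours=10, avg_gpu_util_pct=0),
]
summary = summarize(runs, hourly_price_usd=3.0)  # hypothetical price
```

With these three made-up runs, the 50 hours of runtime average out to 28% utilization, which corresponds to 36 idle GPU hours and an idle cost of 108 USD at the hypothetical price. One of the three runs has 0% utilization, and run "b" is the only example run because it is the longest and falls below the threshold.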
The project dependencies are:
- wandb: Interact with W&B tracking server.
- pandas: Data processing and analysis.
- openpyxl: Generate Excel reports.
- pillow: Generate visual report images.
- requests: HTTP requests for data submission.