
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance


📄 Paper, 🤗 Huggingface, 📢 Blog

Introduction

AssetOpsBench is a unified framework and environment designed to guide the development, orchestration, and evaluation of domain-specific agents for task automation in industrial asset operations and maintenance. This release of the benchmark focuses on scenarios commonly posed by domain experts such as maintenance engineers, reliability specialists, and facility planners. We developed 4 individual domain-specific agents and 2 multi-agent orchestration frameworks to create a simulated industrial environment that enables end-to-end benchmarking of multi-agent workflows in asset operations.

Datasets: 140+ Scenarios

AssetOpsBench provides a collection of tasks that we call scenarios, covering the domains of IoT data retrieval (IoT), failure mode and sensor relation discovery (FMSR), time series forecasting and anomaly detection (TSFM), and work order generation (WO). Some tasks focus on a single domain, e.g. "List all sensors of Chiller 6 in MAIN site". Others are end-to-end multi-step tasks, e.g. "What is the forecast for 'Chiller 9 Condenser Water Flow' in the week of 2020-04-27 based on data from the MAIN site?" All scenarios can be found here.
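
To make the distinction concrete, here is a minimal sketch of what single-domain and multi-step scenario records could look like; the field names are illustrative assumptions for this sketch, not the repository's actual schema.

```python
# Hypothetical sketch of scenario records; field names are illustrative
# assumptions, not the actual AssetOpsBench scenario schema.
single_domain_scenario = {
    "id": "iot_001",                     # hypothetical identifier
    "domain": "IoT",
    "utterance": "List all sensors of Chiller 6 in MAIN site",
}

multi_step_scenario = {
    "id": "e2e_017",                     # hypothetical identifier
    "domains": ["IoT", "TSFM"],
    "utterance": (
        "What is the forecast for 'Chiller 9 Condenser Water Flow' "
        "in the week of 2020-04-27 based on data from the MAIN site?"
    ),
}
```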

AI Agents and Multi-agent Frameworks

We developed 4 domain-specific AI agents, each with its own set of tools that can be invoked (a minimal sketch of how such tools might be called follows the list):

  • IoT Agent: get_sites, get_history, get_assets, get_sensors, ...
  • FMSR Agent: get_sensors, get_failure_modes, get_failure_sensor_mapping.
  • TSFM Agent: forecasting, timeseries_anomaly_detection, ...
  • WO Agent: generate_work_order
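
The sketch below shows one way a domain agent could register and invoke named tools; the class and function names are assumptions for illustration, not the actual AssetOpsBench API.

```python
# Illustrative sketch only: how a domain agent might expose its tools to an
# orchestrator. Class and method names are assumptions, not the real API.
from typing import Callable, Dict


class DomainAgent:
    """A domain-specific agent that registers callable tools by name."""

    def __init__(self, name: str):
        self.name = name
        self.tools: Dict[str, Callable[..., object]] = {}

    def register_tool(self, tool_name: str, fn: Callable[..., object]) -> None:
        self.tools[tool_name] = fn

    def invoke(self, tool_name: str, **kwargs) -> object:
        # Look up and call the requested tool with keyword arguments.
        return self.tools[tool_name](**kwargs)


# Example: a toy IoT agent with a stubbed get_sensors tool.
iot_agent = DomainAgent("IoT Agent")
iot_agent.register_tool(
    "get_sensors",
    lambda asset, site: [f"{asset}@{site}:sensor_{i}" for i in range(3)],
)
print(iot_agent.invoke("get_sensors", asset="Chiller 6", site="MAIN"))
```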

To orchestrate multiple agents and run end-to-end workflows, we developed two frameworks (a conceptual sketch of both styles follows the list):

  • MetaAgent: a ReAct-based orchestrator that exposes each domain agent as a callable tool (single-agent-as-tool)
  • AgentHive: a plan-and-execute framework that runs agents in a sequential workflow
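
The sketch below contrasts the two orchestration styles in generic terms; it is not the MetaAgent or AgentHive code, and llm_decide_next_step / make_plan are hypothetical placeholders for an LLM call.

```python
# Conceptual sketch of the two orchestration styles; not the actual
# MetaAgent/AgentHive implementations.

def react_orchestrator(task, agents, llm_decide_next_step, max_steps=10):
    """ReAct-style loop: the LLM picks one agent "tool" per step until done."""
    history = []
    for _ in range(max_steps):
        # step is assumed to look like:
        # {"agent": "IoT Agent", "tool": "get_sensors", "args": {...}}
        step = llm_decide_next_step(task, history)
        if step.get("final_answer") is not None:
            return step["final_answer"]
        observation = agents[step["agent"]].invoke(step["tool"], **step["args"])
        history.append((step, observation))
    return history


def plan_and_execute_orchestrator(task, agents, make_plan):
    """Plan-and-execute: build a sequential plan up front, then run each step."""
    results = []
    for step in make_plan(task):          # ordered list of agent/tool calls
        results.append(agents[step["agent"]].invoke(step["tool"], **step["args"]))
    return results
```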

Leaderboards

We ran AssetOpsBench with 7 Large Language Models and evaluated the trajectories of each run using an LLM judge (Llama-4-Maverick-17B) on six-dimensional criteria. The results for MetaAgent are shown below; please find more results in the paper.

[MetaAgent leaderboard figure: meta_agent_leaderboard]
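
As a rough illustration of how per-trajectory judge scores can be rolled up into a leaderboard entry, here is a minimal sketch; the criterion names and 0-1 scoring scale are assumptions for illustration, not the benchmark's actual rubric.

```python
# Sketch of aggregating per-trajectory LLM-judge scores into averages.
# Criterion names and the 0-1 scale are hypothetical, not the real rubric.
from statistics import mean

CRITERIA = ["task_completion", "plan_quality", "tool_use",
            "reasoning", "efficiency", "final_answer_quality"]

def aggregate_scores(judged_trajectories):
    """judged_trajectories: list of dicts mapping each criterion to a 0-1 score."""
    return {c: mean(t[c] for t in judged_trajectories) for c in CRITERIA}

# Example with two toy trajectories:
print(aggregate_scores([
    {c: 1.0 for c in CRITERIA},
    {c: 0.5 for c in CRITERIA},
]))
```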

Run AssetOpsBench in Docker

We provide comprehensive documentation on how to run AssetOpsBench in a pre-built Docker environment. Please refer to the guide.

Contributors and Contact(*)
