AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
📄 Paper, 🤗 Huggingface, 📢 Blog
AssetOpsBench is a unified framework and environment designed to guide the development, orchestration, and evaluation of domain-specific agents for task automation in industrial asset operations and maintenance. The release of the benchmark focuses on scenarios commonly posed by domain experts such as maintenance engineers, reliability specialists, and facility planners. We developed 4 individual domain-specific agents and 2 multi-agent orchestration frameworks to create a simulated industrial environment that enables end-to-end benchmarking of multi-agent workflows in asset operations.
AssetOpsBench provides a collection of tasks, which we call scenarios, covering four domains: IoT data retrieval (IoT), failure mode and sensor relation discovery (FMSR), time series anomaly detection (TSFM), and work order generation (WO). Some tasks focus on a single domain, e.g. "List all sensors of Chiller 6 in MAIN site". Others are end-to-end, multi-step tasks, e.g. "What is the forecast for 'Chiller 9 Condenser Water Flow' in the week of 2020-04-27 based on data from the MAIN site?" All scenarios can be found here.
We developed 4 domain-specific AI agents, each with its own set of tools that it can invoke.
- IoT Agent: `get_sites`, `get_history`, `get_assets`, `get_sensors`, ...
- FMSR Agent: `get_sensors`, `get_failure_modes`, `get_failure_sensor_mapping`.
- TSFM Agent: `forecasting`, `timeseries_anomaly_detection`, ...
- WO Agent: `generate_word_order`
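The agent-to-tool relationship above can be pictured as a small registry. The following is a minimal sketch, not the AssetOpsBench implementation: the `Agent` class, its `invoke` method, and the stub return values are hypothetical and exist only for illustration. Only the tool names `get_sites` and `get_assets` and the example site and asset names are taken from this README.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Agent:
    """Hypothetical container: a named agent and the tools it may invoke."""
    name: str
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def invoke(self, tool_name: str, **kwargs):
        # Refuse tools that do not belong to this agent.
        if tool_name not in self.tools:
            raise KeyError(f"{self.name} has no tool named {tool_name!r}")
        return self.tools[tool_name](**kwargs)


# Stub tools with hard-coded answers (assumption: real tools query a data source).
def get_sites():
    return ["MAIN"]


def get_assets(site):
    return ["Chiller 6", "Chiller 9"] if site == "MAIN" else []


iot_agent = Agent(
    name="IoT Agent",
    tools={"get_sites": get_sites, "get_assets": get_assets},
)

sites = iot_agent.invoke("get_sites")
assets = iot_agent.invoke("get_assets", site="MAIN")
print(sites, assets)
```

Keeping tools in a per-agent mapping means a request for a tool owned by another agent (for example `forecasting` on the IoT Agent) fails loudly instead of being silently routed.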
To orchestrate multiple agents and run end-to-end workflows, we developed two frameworks:
- MetaAgent: a ReAct-based single-agent-as-tool agent
- AgentHive: a plan-and-execute sequential workflow
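The difference between the two orchestration styles can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the MetaAgent or AgentHive code: agents are modeled as plain functions, and the scripted policy stands in for the LLM that would normally choose the next action. All names here (`react_loop`, `plan_and_execute`, `scripted_policy`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: each agent is a function from a task string to a result string.
AgentFn = Callable[[str], str]
Action = Tuple[str, str]  # (agent_name, task)


def react_loop(
    question: str,
    agents: Dict[str, AgentFn],
    policy: Callable[[str, List[str]], Optional[Action]],
    max_steps: int = 5,
) -> List[str]:
    """ReAct style: pick the next agent call after seeing each observation."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(question, observations)
        if action is None:  # the policy decides it is done
            break
        agent_name, task = action
        observations.append(agents[agent_name](task))
    return observations


def plan_and_execute(plan: List[Action], agents: Dict[str, AgentFn]) -> List[str]:
    """Plan-and-execute style: run a plan fixed up front, step by step."""
    results: List[str] = []
    for agent_name, task in plan:
        context = results[-1] if results else ""
        results.append(agents[agent_name](f"{task} | context: {context}"))
    return results


# Stub agents that simply echo their task.
agents = {
    "IoT": lambda task: f"IoT({task})",
    "TSFM": lambda task: f"TSFM({task})",
}


def scripted_policy(question: str, observations: List[str]) -> Optional[Action]:
    # Stand-in for an LLM: fetch data first, then forecast, then stop.
    if len(observations) == 0:
        return ("IoT", "fetch sensor history")
    if len(observations) == 1:
        return ("TSFM", "forecast from history")
    return None


react_trace = react_loop("forecast question", agents, scripted_policy)
plan_trace = plan_and_execute(
    [("IoT", "fetch sensor history"), ("TSFM", "forecast")], agents
)
print(react_trace)
print(plan_trace)
```

In the ReAct-style loop the sequence of agent calls is decided one step at a time, whereas the plan-and-execute runner commits to the whole sequence before any agent runs.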
We ran AssetOpsBench with 7 large language models and evaluated the trajectories of each run using an LLM judge (Llama-4-Maverick-17B) on six-dimensional criteria. The following is the result of MetaAgent. Please find more results in the paper.
We provide comprehensive documentation on how to run AssetOpsBench in a pre-built Dockerized environment. Please refer to the guidance.
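Per-criterion judge verdicts can be rolled up into pass rates along the following lines. This is a hedged sketch only: the criterion names are placeholders (`criterion_1` to `criterion_6`) because this README does not name them, the verdicts are made-up sample data, and the `aggregate` function is a hypothetical helper, not part of AssetOpsBench.

```python
from statistics import mean
from typing import Dict, List

# Placeholder names; see the paper for the actual six criteria.
CRITERIA = [f"criterion_{i}" for i in range(1, 7)]


def aggregate(judgments: List[Dict[str, bool]]) -> Dict[str, float]:
    """Return, for each criterion, the fraction of trajectories that pass it."""
    if not judgments:
        raise ValueError("no judgments to aggregate")
    return {
        c: mean(1.0 if j[c] else 0.0 for j in judgments)
        for c in CRITERIA
    }


# Made-up verdicts for two trajectories (illustrative data, not benchmark results).
judgments = [
    {c: True for c in CRITERIA},
    {**{c: True for c in CRITERIA}, "criterion_3": False},
]

pass_rates = aggregate(judgments)
print(pass_rates)
```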
- Dhaval Patel
- Shuxin Lin
- James Rayfield
- Nianjun Zhou