[ arxiv ] [ blog ] [ demo ] [ bibtex ]
NPUEval is an LLM evaluation dataset written specifically to target AIE kernel code generation on RyzenAI hardware.
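For intuition, each test in the dataset pairs a kernel-generation prompt with reference I/O used for functional evaluation. The record below is an illustrative sketch only; the field names are assumptions, not the actual NPUEval schema:

```python
import json

# Hypothetical NPUEval-style record (field names are illustrative,
# not the repo's real schema). Each test couples a natural-language
# kernel prompt with reference vectors for functional checking.
record = {
    "kernel_name": "passthrough",
    "prompt": "Write an AIE2 C++ kernel that copies the input buffer to the output buffer.",
    "num_inputs": 1,  # dataset currently covers 1-in-1-out and 2-in-1-out kernels
    "test_vectors": {"input": [0, 1, 2, 3], "expected_output": [0, 1, 2, 3]},
}

# An LLM-generated kernel passes if its output matches expected_output.
print(json.dumps(record, indent=2))
```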
Requirements:
- Ubuntu 24.04.2 or Ubuntu 24.10 (must have supported Linux kernel version >6.10)
- Disable secure boot on your machine - this is needed because we'll be working with an experimental (unsigned) kernel module.
- Docker - follow the instructions at docs.docker.com for setup.
Once you have the prerequisites, run the install script:
./install.sh
This will bring up an XRT docker image that builds the XRT and XDNA Debian packages, which are then installed on your host machine. It will then set up the NPUEval docker image with all the tools required for NPU application compilation.
Launch the JupyterLab environment to open the notebooks and get familiar with the dataset:
./scripts/launch_jupyter.sh
You'll be able to connect from your browser on port 8888, e.g. http://localhost:8888/lab, or via the machine's IP address if you're using it remotely.
Currently there are two simple scripts to reproduce AIECoder results for gpt-4.1 and gpt-4o-mini. You can run these as regular scripts from your JupyterLab or interactive docker session, or use docker_run_script.sh to run them as individual docker sessions.
docker_run_script.sh scripts/run_completions.py
docker_run_script.sh scripts/run_functional_tests.py
The run_completions script will feed all the prompts to the AIECoder agent and generate solutions for each test. Make sure to set your OPENAI_API_KEY, since it will be making requests to gpt-4.1 and gpt-4o-mini.
The run_functional_tests script will evaluate the LLM-generated solutions. Since this is just the evaluator, it only requires the NPU and no access to an LLM.
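Results from the evaluator are typically summarized as a functional pass rate over the test set. A minimal sketch, assuming each result is just a kernel name plus a boolean pass flag (the repo's actual results format may differ):

```python
# Compute a simple pass rate from evaluator results.
# The result structure below is an assumption for illustration,
# not the actual output format of run_functional_tests.
results = [
    {"kernel": "passthrough", "passed": True},
    {"kernel": "vector_add", "passed": True},
    {"kernel": "softmax", "passed": False},
]

pass_rate = sum(r["passed"] for r in results) / len(results)
print(f"functional pass rate: {pass_rate:.2%}")  # → 66.67%
```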
If you see
Failed to open KMQ device (err=22): Invalid argument
just reboot the machine; the driver can get into an unstable state. Hopefully this won't happen with newer versions of the NPU driver.
- Only AIE2 and AIE2P kernels are targeted: Phoenix/Hawk for AIE2 and Strix/Krackan for AIE2P.
- Currently only single-output kernels are supported, i.e. 1-in-1-out and 2-in-1-out.
@misc{kalade2025npuevaloptimizingnpukernels,
title={NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers},
author={Sarunas Kalade and Graham Schelle},
year={2025},
eprint={2507.14403},
archivePrefix={arXiv},
primaryClass={cs.PL},
url={https://arxiv.org/abs/2507.14403},
}