[ arxiv ] [ blog ] [ demo ] [ bibtex ]
NPUEval is an LLM evaluation dataset written specifically to target AIE kernel code generation on RyzenAI hardware.
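For intuition, each test in the dataset pairs a kernel-generation prompt with reference I/O used for functional evaluation. The record below is an illustrative sketch only; the field names are assumptions, not the actual NPUEval schema:

```python
import json

# Hypothetical NPUEval-style record (field names are illustrative,
# not the repo's real schema). Each test couples a natural-language
# kernel prompt with reference vectors for functional checking.
record = {
    "kernel_name": "passthrough",
    "prompt": "Write an AIE2 C++ kernel that copies the input buffer to the output buffer.",
    "num_inputs": 1,  # dataset currently covers 1-in-1-out and 2-in-1-out kernels
    "test_vectors": {"input": [0, 1, 2, 3], "expected_output": [0, 1, 2, 3]},
}

# An LLM-generated kernel passes if its output matches expected_output.
print(json.dumps(record, indent=2))
```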
Requirements:
- Ubuntu 24.04.2 or Ubuntu 24.10 (must have supported Linux kernel version >6.10)
- Disable secure boot on your machine - this is needed because we'll be working with an experimental (unsigned) kernel module.
- Docker - follow the instructions at docs.docker.com for setup.
Once you have the prerequisites, run the install script:
./install.sh
This will bring up an XRT docker image that builds the XRT and XDNA Debian packages, which are then installed on your host machine. It will then set up the NPUEval docker image with all the tools required for NPU application compilation.
Launch the JupyterLab environment to open the notebooks and get familiar with the dataset:
./scripts/launch_jupyter.sh
You'll be able to connect from your browser on port 8888, e.g. http://localhost:8888/lab, or via the machine's IP address if you're using it remotely.
Currently there are two simple scripts to reproduce AIECoder results for gpt-4.1 and gpt-4o-mini. You can run these as regular scripts from your JupyterLab or interactive docker session, or use docker_run_script.sh to run them as individual docker sessions.
docker_run_script.sh scripts/run_completions.py
docker_run_script.sh scripts/run_functional_tests.py
The run_completions script will feed all the prompts to the AIECoder agent and generate solutions for each test. Make sure to set your OPENAI_API_KEY, since it will be making requests to gpt-4.1 and gpt-4o-mini.
The run_functional_tests script will evaluate the LLM-generated solutions. Since this is just the evaluator, it only requires the NPU and no access to an LLM.
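Results from the evaluator are typically summarized as a functional pass rate over the test set. A minimal sketch, assuming each result is just a kernel name plus a boolean pass flag (the repo's actual results format may differ):

```python
# Compute a simple pass rate from evaluator results.
# The result structure below is an assumption for illustration,
# not the actual output format of run_functional_tests.
results = [
    {"kernel": "passthrough", "passed": True},
    {"kernel": "vector_add", "passed": True},
    {"kernel": "softmax", "passed": False},
]

pass_rate = sum(r["passed"] for r in results) / len(results)
print(f"functional pass rate: {pass_rate:.2%}")  # → 66.67%
```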
If you see
Failed to open KMQ device (err=22): Invalid argument
just reboot the machine; the driver can get into an unstable state. Hopefully this won't happen with newer versions of the NPU driver.
- Only AIE2 and AIE2P kernels are targeted: Phoenix/Hawk for AIE2 and Strix/Krackan for AIE2P.
- Currently only single-output kernels are supported, i.e. 1-in-1-out and 2-in-1-out.
@misc{kalade2025npuevaloptimizingnpukernels,
title={NPUEval: Optimizing NPU Kernels with LLMs and Open Source Compilers},
author={Sarunas Kalade and Graham Schelle},
year={2025},
eprint={2507.14403},
archivePrefix={arXiv},
primaryClass={cs.PL},
url={https://arxiv.org/abs/2507.14403},
}