This repository contains the code for evaluating neural theorem proving and autoformalization methods on the RLM25 benchmark.
Important
Set the environment variables GITHUB_ACCESS_TOKEN (see GitHub documentation) and OPENAI_API_KEY.
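For example, in a POSIX shell (placeholder values, adapt to your own credentials):
export GITHUB_ACCESS_TOKEN="<your GitHub access token>"
export OPENAI_API_KEY="<your OpenAI API key>"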
In VS Code, run the Dev Containers: Open Folder in Container... command from the Command Palette (F1). The .devcontainer folder contains the necessary configuration and will take care of setting up the environment.
Requirements:
- Python >= 3.10
- git
- Lean 4
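To check that the requirements above are available, you can run the following commands; this is only a sketch, and the dev container takes care of the setup for you:
python --version   # should report 3.10 or newer
git --version
lean --version     # requires a Lean 4 toolchain, e.g. installed via elan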
Install the Python project:
pip install -e .
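Optionally, you can do this inside a virtual environment first (a standard Python workflow, not a requirement stated by this repository):
python -m venv .venv
source .venv/bin/activate
pip install -e .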
Prepare the RLM25 dataset:
python scripts/extract_benchmark.py --config configs/benchmark/rlm25.yaml
Run statement autoformalization evaluation:
python scripts/eval_statement_autoformalization.py --benchmark-config configs/benchmark/rlm25.yaml --model-config configs/models/gpt-4o_greedy.yaml
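To compare several model configurations, a simple shell loop over config files works; this sketch assumes additional YAML files exist under configs/models/:
for model_cfg in configs/models/*.yaml; do
  python scripts/eval_statement_autoformalization.py --benchmark-config configs/benchmark/rlm25.yaml --model-config "$model_cfg"
done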
Run proof autoformalization evaluation:
python scripts/eval_proof_autoformalization.py --benchmark-config configs/benchmark/rlm25.yaml --model-config configs/models_proof/gpt-4o_greedy.yaml
If you use this code or the RLM25 benchmark in your research, please consider citing the relevant paper below: the first if you use the benchmark to evaluate automated theorem proving or proof autoformalization methods, the second if you use it to evaluate statement autoformalization methods.
RLMEval: Evaluating Research-Level Neural Theorem Proving, EMNLP 2025 Findings
@misc{poiroux2025rlmevalevaluatingresearchlevelneural,
title={RLMEval: Evaluating Research-Level Neural Theorem Proving},
author={Auguste Poiroux and Antoine Bosselut and Viktor Kunčak},
year={2025},
eprint={2510.25427},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.25427},
}

Reliable Evaluation and Benchmarks for Statement Autoformalization, EMNLP 2025
@misc{poiroux2025reliableevaluationbenchmarksstatement,
title={Reliable Evaluation and Benchmarks for Statement Autoformalization},
author={Auguste Poiroux and Gail Weiss and Viktor Kunčak and Antoine Bosselut},
year={2025},
eprint={2406.07222},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2406.07222},
}