
RLMEval: Evaluating Research-Level Neural Theorem Proving & Autoformalization

This repository contains the code for evaluating neural theorem proving and autoformalization methods on the RLM25 benchmark.

Project setup

Important

Set the environment variables GITHUB_ACCESS_TOKEN (see GitHub documentation) and OPENAI_API_KEY.
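For example, both variables can be exported in your shell before running any of the scripts below (the values here are placeholders, not real tokens):

# GitHub personal access token, used to fetch repository data
export GITHUB_ACCESS_TOKEN="ghp_..."
# OpenAI API key, used by the OpenAI-backed model configs
export OPENAI_API_KEY="sk-..."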

Development inside Docker (recommended)

In VS Code, run the Dev Containers: Open Folder in Container... command from the Command Palette (F1). The .devcontainer folder contains the necessary configuration and will take care of setting up the environment.

Local installation

Requirements:

Install the Python project:

pip install -e .
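If you prefer an isolated environment (a standard Python workflow, not a repository requirement), you can create and activate a virtual environment before installing:

# Create and activate a virtual environment, then install in editable mode
python -m venv .venv
source .venv/bin/activate
pip install -e .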

RLM25

Prepare the RLM25 dataset:

python scripts/extract_benchmark.py --config configs/benchmark/rlm25.yaml

Run statement autoformalization evaluation:

python scripts/eval_statement_autoformalization.py --benchmark-config configs/benchmark/rlm25.yaml --model-config configs/models/gpt-4o_greedy.yaml

Run proof autoformalization evaluation:

python scripts/eval_proof_autoformalization.py --benchmark-config configs/benchmark/rlm25.yaml --model-config configs/models_proof/gpt-4o_greedy.yaml
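Putting the steps together, a minimal end-to-end run looks like the following sketch. It only chains the commands shown above and assumes the environment variables from the project setup have already been set:

# 1. Prepare the RLM25 benchmark
python scripts/extract_benchmark.py --config configs/benchmark/rlm25.yaml

# 2. Evaluate statement autoformalization
python scripts/eval_statement_autoformalization.py \
    --benchmark-config configs/benchmark/rlm25.yaml \
    --model-config configs/models/gpt-4o_greedy.yaml

# 3. Evaluate proof autoformalization
python scripts/eval_proof_autoformalization.py \
    --benchmark-config configs/benchmark/rlm25.yaml \
    --model-config configs/models_proof/gpt-4o_greedy.yaml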

Citation

If you use this code or the RLM25 benchmark in your research, please consider citing one of the following papers. If you use the benchmark for evaluating automated theorem proving or proof autoformalization methods, please cite the first paper. If you use it for evaluating statement autoformalization methods, please cite the second paper.

RLMEval: Evaluating Research-Level Neural Theorem Proving, EMNLP 2025 Findings

@misc{poiroux2025rlmevalevaluatingresearchlevelneural,
    title={RLMEval: Evaluating Research-Level Neural Theorem Proving}, 
    author={Auguste Poiroux and Antoine Bosselut and Viktor Kunčak},
    year={2025},
    eprint={2510.25427},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2510.25427}, 
}

Reliable Evaluation and Benchmarks for Statement Autoformalization, EMNLP 2025

@misc{poiroux2025reliableevaluationbenchmarksstatement,
    title={Reliable Evaluation and Benchmarks for Statement Autoformalization}, 
    author={Auguste Poiroux and Gail Weiss and Viktor Kunčak and Antoine Bosselut},
    year={2025},
    eprint={2406.07222},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2406.07222}, 
}
