We release our annotated dataset from Mathlib 4 and our latest translator model for autoformalization. Please refer to our ICLR 2025 paper for more detailed information.
| Dataset | Download |
|---|---|
| Herald Statements | HuggingFace |
| Herald Proofs | HuggingFace |
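For convenience, both datasets can be loaded with the Hugging Face `datasets` library. The snippet below is a minimal sketch; the repository IDs are placeholders, so substitute the actual IDs behind the HuggingFace links above.

```python
# Minimal sketch: load the Herald datasets from the Hugging Face Hub.
# NOTE: the repository IDs below are placeholders -- replace them with
# the actual IDs behind the HuggingFace links in the table above.
from datasets import load_dataset

statements = load_dataset("ORG/Herald-Statements", split="train")  # placeholder ID
proofs = load_dataset("ORG/Herald-Proofs", split="train")          # placeholder ID

print(statements)   # inspect the available fields
print(proofs[0])    # first annotated example
```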
| Model | Download |
|---|---|
| Herald Translator | HuggingFace |
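Since the code targets `vllm >= 0.6.6` (see the setup notes below), one way to load the translator for offline inference is sketched here; the model ID and the prompt are placeholders, so check the model card for the real values and prompt template.

```python
# Minimal sketch: run the Herald translator with vLLM offline inference.
# NOTE: the model ID and the prompt below are placeholders -- consult the
# Hugging Face model card for the real values and prompt template.
from vllm import LLM, SamplingParams

llm = LLM(model="ORG/Herald-Translator")  # placeholder model ID
params = SamplingParams(temperature=0.0, max_tokens=512)

prompt = "Translate the following statement into Lean 4:\n<informal statement>"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```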
| Model | miniF2F-test | miniF2F-valid | extract-theorems | college-math-CoT |
|---|---|---|---|---|
| TheoremLlama | 50.1% | 55.6% | 4.0% | 3.0% |
| InternLM2-Math-Plus-7B | 73.0% | 80.1% | 7.5% | 6.5% |
| Herald | 96.7% | 96.3% | 23.5% | 16.0% |
You can find our test sets in the `./data` directory.
- Our code is tested on `vllm >= 0.6.6`.
- To run the inference, you will need a `lean` test environment with `repl` included for the Lean compiler check. Our code is tested on `v4.11.0`; you can obtain our version here. A sketch of driving the REPL follows this list.
- You can configure your preferred models and settings for back-translation and NLI-check in `config.py`. Our test environment uses InternLM2-Math-Plus 7B for back-translation and DeepSeek V2.5 for NLI-check. A configuration sketch also follows this list.
- Then use the scripts below to run the model (an input-format sketch follows the commands):

```bash
# Translate and verify translated statements
python -m run_translate_verify example/test.json example/test_result.json

# Do back-translation and NLI-check
python -m run_backtrans_nli example/test_result.json
```
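For reference, the community Lean `repl` speaks JSON over stdin/stdout, so the compiler check can be scripted. The sketch below is an illustration under stated assumptions: the launch command, project path, and reply shape may differ in your environment.

```python
# Minimal sketch: pipe one command through the Lean REPL and inspect the
# reply. The launch command and project path are placeholders -- point
# them at your own lean/repl checkout (tested upstream with v4.11.0).
import json
import subprocess

def check_statement(lean_code: str) -> dict:
    """Send one `cmd` to the REPL; a reply with no error-severity
    messages means the statement compiled."""
    proc = subprocess.run(
        ["lake", "exe", "repl"],        # placeholder launch command
        cwd="path/to/lean/project",     # placeholder project directory
        input=json.dumps({"cmd": lean_code}) + "\n\n",
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)

reply = check_statement("theorem t : 1 + 1 = 2 := by norm_num")
errors = [m for m in reply.get("messages", []) if m.get("severity") == "error"]
print("compiles" if not errors else errors)
```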
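The option names below are illustrative only; the authoritative ones live in `config.py`. They show the kind of back-translation and NLI-check settings to fill in.

```python
# Illustrative sketch of config.py-style settings -- the real option
# names are defined in config.py itself; these are placeholders.
BACKTRANS_MODEL = "internlm/internlm2-math-plus-7b"  # back-translation model
NLI_MODEL = "deepseek-chat"                          # NLI-check model (DeepSeek V2.5 API)
NLI_API_BASE = "https://api.deepseek.com"            # placeholder endpoint
NLI_API_KEY = "sk-..."                               # your API key
```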
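The input format is defined by the shipped `example/test.json`. As a rough, hypothetical sketch of building a compatible file, where the field names are assumptions and you should mirror the shipped example:

```python
# Rough sketch: write an input file for run_translate_verify.
# NOTE: the field names are placeholders -- mirror example/test.json.
import json

entries = [
    {
        "name": "add_zero_example",  # placeholder field
        "informal_statement": "For every natural number n, n + 0 = n.",  # placeholder field
    }
]
with open("data/my_input.json", "w") as f:
    json.dump(entries, f, indent=2)
```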
Finish the configuration in `config.py` and run the pipeline script: `bash run_pipeline.sh <data.json>`. You can also place your own dataset under `./data`. Check the results in `./data/results`.