cmu-l3/l1

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning




How to Use?

Installation

git clone https://github.com/cmu-l3/l1.git
cd l1
conda create -n l1 python=3.12
conda activate l1
pip install flash-attn --no-build-isolation
pip install git+https://github.com/volcengine/verl.git
pip install -r requirements.txt

Note: With the latest version of this repository, you are free to use the latest version of verl; just make sure to update the configs accordingly.

Prepare Dataset

You can use scripts in scripts/data to prepare your own dataset.

For example, to generate data for training L1-Exact:

python scripts/data/deepscaler_dataset.py 

For L1-Max:

python scripts/data/deepscaler_dataset.py --use_both_both

To prepare evaluation data for AIME2025, GPQA, LSAT, and MMLU, use the generation scripts in scripts/data:

python scripts/data/generate_aime.py
python scripts/data/generate_gpqa.py
python scripts/data/generate_lsat.py
python scripts/data/generate_mmlu.py
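The generation scripts above write datasets in the format verl consumes. As a rough sketch of what one record looks like, here is an illustrative example; the field names (`data_source`, `prompt`, `reward_model`, `extra_info`) follow the schema commonly used by verl datasets and are an assumption, not this repo's exact output:

```python
import json

# Illustrative record in a verl-style dataset schema.
# Field names and values are assumptions for illustration only;
# inspect the actual output of scripts/data to see the real format.
record = {
    "data_source": "deepscaler",
    "prompt": [
        {"role": "user", "content": "What is 2 + 2? Think for 512 tokens."}
    ],
    "reward_model": {"style": "rule", "ground_truth": "4"},
    "extra_info": {"num_tokens": 512},
}
print(json.dumps(record, indent=2))
```

Checking a few records like this after generation is a quick way to confirm the dataset matches what your training config expects.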

Train Models

You can skip this step if you want to use our pre-trained models.

You can run scripts in scripts/train to train your own models. Make sure to specify the correct data path.

Evaluate Models

Use one of the scripts in scripts/eval to evaluate your models. Make sure to specify the correct model path.

For example, evaluate L1-Exact on AIME2025:

./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens <num_tokens> --datasets aime2025
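The `--num-tokens` argument sets the length budget that L1 was trained to follow: per the paper, the budget is communicated to the model by appending a length instruction to the prompt. A minimal sketch of that idea, with illustrative wording (the repo's exact prompt template may differ):

```python
def build_length_controlled_prompt(question: str, num_tokens: int) -> str:
    """Append a token-budget instruction to the user prompt.

    The instruction wording here is illustrative of the L1/LCPO approach,
    not necessarily the exact template used in this repository.
    """
    return f"{question} Think for maximum {num_tokens} tokens."

prompt = build_length_controlled_prompt("What is 2 + 2?", 512)
print(prompt)
```

At evaluation time, the same budget passed on the command line determines both the prompt instruction and the point at which generation is scored against the target length.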

Replicate Results

To replicate results for L1-Exact and L1-Max from the paper, you can use scripts in scripts/replicate.

  1. Prepare data:
./scripts/replicate/prepare_data.sh
  2. Evaluate models:
./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact
./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max

Acknowledgments

  • We would like to thank DeepSeek for releasing DeepSeek-R1 and its distilled models,
  • Qwen for releasing the excellent Qwen-2.5 Math models, and
  • Agentica for open-sourcing their codebase, models, and datasets. This codebase is built on top of their work.

Citation

If you use L1/LCPO in your research, please cite:

@misc{aggarwal2025l1controllinglongreasoning,
  title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning}, 
  author={Pranjal Aggarwal and Sean Welleck},
  year={2025},
  eprint={2503.04697},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.04697}, 
}
