cmu-l3/l1

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning




How to Use?

Installation

git clone https://github.com/cmu-l3/l1.git
cd l1
conda create -n l1 python=3.12
conda activate l1
pip install flash-attn --no-build-isolation
pip install git+https://github.com/volcengine/verl.git
pip install -r requirements.txt

Note: With the latest version of this repository, you are free to use the latest version of verl; just make sure to update the configs accordingly.

Prepare Dataset

You can use scripts in scripts/data to prepare your own dataset.

For example, to generate data for training L1-Exact:

python scripts/data/deepscaler_dataset.py 

For L1-Max:

python scripts/data/deepscaler_dataset.py --use_both_both

To prepare evaluation data for AIME2025, GPQA, LSAT, and MMLU, use the generation scripts in scripts/data:

python scripts/data/generate_aime.py
python scripts/data/generate_gpqa.py
python scripts/data/generate_lsat.py
python scripts/data/generate_mmlu.py
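The generation scripts above write datasets in the format verl consumes. As a rough sketch of what one record looks like, here is an illustrative example; the field names (`data_source`, `prompt`, `reward_model`, `extra_info`) follow the schema commonly used by verl datasets and are an assumption, not this repo's exact output:

```python
import json

# Illustrative record in a verl-style dataset schema.
# Field names and values are assumptions for illustration only;
# inspect the actual output of scripts/data to see the real format.
record = {
    "data_source": "deepscaler",
    "prompt": [
        {"role": "user", "content": "What is 2 + 2? Think for 512 tokens."}
    ],
    "reward_model": {"style": "rule", "ground_truth": "4"},
    "extra_info": {"num_tokens": 512},
}
print(json.dumps(record, indent=2))
```

Checking a few records like this after generation is a quick way to confirm the dataset matches what your training config expects.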

Train Models

You can skip this step if you want to use our pre-trained models.

You can run scripts in scripts/train to train your own models. Make sure to specify the correct data path.

Evaluate Models

Use one of the scripts in scripts/eval to evaluate your models. Make sure to specify the correct model path.

For example, evaluate L1-Exact on AIME2025:

./scripts/eval/eval_model_token.sh --model path/to/your/model --num-tokens <num_tokens> --datasets aime2025
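The `--num-tokens` argument sets the length budget that L1 was trained to follow: per the paper, the budget is communicated to the model by appending a length instruction to the prompt. A minimal sketch of that idea, with illustrative wording (the repo's exact prompt template may differ):

```python
def build_length_controlled_prompt(question: str, num_tokens: int) -> str:
    """Append a token-budget instruction to the user prompt.

    The instruction wording here is illustrative of the L1/LCPO approach,
    not necessarily the exact template used in this repository.
    """
    return f"{question} Think for maximum {num_tokens} tokens."

prompt = build_length_controlled_prompt("What is 2 + 2?", 512)
print(prompt)
```

At evaluation time, the same budget passed on the command line determines both the prompt instruction and the point at which generation is scored against the target length.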

Replicate Results

To replicate results for L1-Exact and L1-Max from the paper, you can use scripts in scripts/replicate.

  1. Prepare data:
./scripts/replicate/prepare_data.sh
  2. Evaluate models:
./scripts/replicate/eval_inference_exact.sh l3lab/L1-Qwen-1.5B-Exact
./scripts/replicate/eval_inference_max.sh l3lab/L1-Qwen-1.5B-Max

Acknowledgments

  • We would like to thank DeepSeek for releasing DeepSeek-R1 and its distilled models,
  • Qwen for releasing the excellent Qwen-2.5 Math models, and
  • Agentica for open-sourcing their codebase, models, and datasets. This codebase is built on top of their work.

Citation

If you use L1/LCPO in your research, please cite:

@misc{aggarwal2025l1controllinglongreasoning,
  title={L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning}, 
  author={Pranjal Aggarwal and Sean Welleck},
  year={2025},
  eprint={2503.04697},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.04697}, 
}
