Haizhong Zheng¹, Yang Zhou¹, Brian R. Bartoldson², Bhavya Kailkhura²,
Fan Lai³, Jiawei Zhao⁴, Beidi Chen¹
¹Carnegie Mellon University,
²Lawrence Livermore National Laboratory,
³University of Illinois Urbana-Champaign,
⁴Meta AI
TL;DR: We propose GRESO, a lightweight pre-rollout filtering method that improves the efficiency of rollout scaling in LLM RL by predicting and skipping low-value prompts.
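To make the idea concrete, here is a minimal, hypothetical sketch of pre-rollout filtering in the GRESO spirit. The class name, the reward-variance heuristic, and all parameters below are illustrative assumptions, not the paper's exact algorithm: prompts whose recent rollout groups all received identical rewards yield zero advantage in GRPO-style training, so re-rolling them is likely wasted compute and can be skipped with high probability.

```python
import random

class GresoStyleFilter:
    """Illustrative pre-rollout filter (hypothetical; not the paper's exact rule).

    We track the reward variance of each prompt's recent rollout groups.
    If the last few groups all had zero variance (all-correct or all-wrong,
    hence zero advantage), we probabilistically skip the prompt before rollout.
    """

    def __init__(self, window=3, base_skip_prob=0.9):
        self.window = window              # how many recent groups to consider (assumed)
        self.base_skip_prob = base_skip_prob  # skip probability for "exhausted" prompts (assumed)
        self.history = {}                 # prompt_id -> list of reward variances

    def record(self, prompt_id, rewards):
        """Store the reward variance of this prompt's latest rollout group."""
        mean = sum(rewards) / len(rewards)
        var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
        self.history.setdefault(prompt_id, []).append(var)

    def should_skip(self, prompt_id):
        """Skip (with probability base_skip_prob) prompts whose last
        `window` rollout groups all had zero reward variance."""
        hist = self.history.get(prompt_id, [])
        if len(hist) < self.window:
            return False                  # not enough evidence yet
        if all(v == 0.0 for v in hist[-self.window:]):
            return random.random() < self.base_skip_prob
        return False
```

A filter like this is called once per prompt before sampling responses, so its cost is negligible next to the rollouts it saves; the actual GRESO decision rule differs in detail.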
- [2025.06.03] Blog post released: Act Only When It Pays – GRESO.
- [2025.06.03] Paper preprint available on arXiv.
Figure 1: We train Qwen2.5-Math-1.5B/7B on the DAPO + MATH dataset and evaluate them on five math reasoning benchmarks: MATH500, AMC, Gaokao, Minerva, and Olympiad Bench. Compared to the baseline method (Dynamic Sampling), our approach (GRESO) reduces rollout overhead by up to 2x while achieving comparable training performance, improving the efficiency of rollout scaling.
Our implementation is based on volcengine/verl.
```shell
conda create -n greso python==3.11
conda activate greso
pip3 install -e .
pip3 install vllm==0.8.2
pip install tensordict==0.6.0
pip install flash-attn==2.7.4.post1 --no-build-isolation
pip install wandb IPython matplotlib ipdb latex2sympy2-extended math-verify torchdata pylatexenc
```
You can download the dataset using the following command:
```shell
# cd into the project folder
conda activate greso
export PYTHONPATH="$PYTHONPATH:$(pwd)"
bash train-scripts/generate_dataset.sh
```
Train Qwen2.5-Math-1.5B with GRESO on 4xH100:

```shell
bash train-scripts/math_qwen_1_5b_dm_greso.sh
```

Train Qwen2.5-Math-7B with GRESO on 8xH100:

```shell
bash train-scripts/math_qwen_7b_dm_greso.sh
```
See more scripts in the train-scripts folder.