Official code repository of One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning
Ritesh Goru, Shanay Mehta, Prateek Jain
Paper: https://arxiv.org/abs/2504.18246
Dataset: https://huggingface.co/datasets/devrev-research/MathChatSync-reasoning
Run bash setup.sh.
The script will:
- Clone the appropriate version of LLaMA-Factory
- Apply our modifications
Use the special_mask_for_reasoning flag to train your models in a single pass!
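As a rough illustration of how the flag might be enabled, the sketch below writes a LLaMA-Factory style training config to a JSON file. It assumes that special_mask_for_reasoning is a boolean option read from the same flat config as the standard training arguments; this README does not state where the flag lives or what values it accepts. The model name, dataset name, template, and output path are placeholders rather than values taken from this repository.

```python
import json
from pathlib import Path


def build_config() -> dict:
    """Return a minimal SFT config with the single-pass flag switched on."""
    return {
        # Placeholder values: swap in your own model, dataset, and template.
        "model_name_or_path": "your-org/your-base-model",
        "dataset": "your_multi_turn_reasoning_dataset",
        "template": "your_chat_template",
        "output_dir": "outputs/single_pass_sft",
        # Standard LLaMA-Factory supervised fine-tuning options.
        "stage": "sft",
        "do_train": True,
        "finetuning_type": "full",
        # Flag named in this README. Assumption: it is a boolean
        # that sits alongside the other top-level options.
        "special_mask_for_reasoning": True,
    }


def write_config(path: str = "single_pass_sft.json") -> str:
    """Serialize the config to JSON and return the path that was written."""
    Path(path).write_text(json.dumps(build_config(), indent=2))
    return path


if __name__ == "__main__":
    print(f"Wrote {write_config()}")
```

After running setup.sh, the generated file could then be passed to the patched trainer, for example with llamafactory-cli train single_pass_sft.json, assuming the modified LLaMA-Factory keeps the upstream command-line entry point.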
@article{goru2025efficientsinglepasstrainingmultiturn,
title={Efficient Single-Pass Training for Multi-Turn Reasoning},
author={Ritesh Goru and Shanay Mehta and Prateek Jain},
year={2025},
eprint={2504.18246},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.18246},
}