length penalty in reward function #14

paulinebourigault · 2025-04-25T17:34:18Z

The length penalty is calculated using the formula: ((5 + length) / 6) ^ alpha

This is based on the length penalty from the Google Neural Machine Translation (GNMT) paper [https://arxiv.org/pdf/1609.08144].
When alpha < 0, the factor decreases as length grows, so shorter sequences get higher rewards.
When alpha > 0, the factor increases with length, so longer sequences get higher rewards.
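A minimal sketch of the factor itself, assuming `length` is a non-negative integer sequence length. In GNMT the penalty is written lp(Y) = (5 + |Y|)^α / (5 + 1)^α, which is the same expression. The function name `length_penalty` is hypothetical and not taken from the PR.

```python
def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style length penalty factor: ((5 + length) / 6) ** alpha.

    Hypothetical helper for illustration; not the PR's actual function.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return ((5 + length) / 6) ** alpha


if __name__ == "__main__":
    # alpha < 0: the factor shrinks as length grows; alpha > 0: it grows.
    for alpha in (-0.2, 0.2):
        factors = [round(length_penalty(n, alpha), 3) for n in (1, 20, 100)]
        print(alpha, factors)
```

Note that a length of 1 always gives a factor of exactly 1. If the factor multiplies the reward, the direction rules above hold for positive rewards; for a negative reward, a factor below 1 moves the reward toward zero, which reverses the effect.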

The length penalty is applied to the final reward scores, after the primary reward has been computed.

paulinebourigault · 2025-04-25T17:38:41Z

Example usage in a training script:
reward_model.length_penalty.enabled=True \
reward_model.length_penalty.alpha=-0.2 \
reward_model.length_penalty.min_length=20 \
reward_model.length_penalty.max_length=null \
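The overrides above can be read as a small config object applied to rewards after the primary reward has been computed. This is a sketch under stated assumptions: `LengthPenaltyConfig` and `apply_length_penalty` are hypothetical names, the factor is assumed to multiply the reward, and `min_length` / `max_length` are assumed to clamp the length before the factor is computed (the PR may use them differently). `null` is mapped to Python `None`, meaning no bound.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class LengthPenaltyConfig:
    # Fields mirror the reward_model.length_penalty.* overrides; defaults are hypothetical.
    enabled: bool = False
    alpha: float = 0.0
    min_length: Optional[int] = None
    max_length: Optional[int] = None  # None (null in the override) = no upper bound


def apply_length_penalty(
    rewards: Sequence[float],
    lengths: Sequence[int],
    cfg: LengthPenaltyConfig,
) -> List[float]:
    """Scale already-computed rewards by ((5 + L) / 6) ** alpha.

    Assumption: min_length / max_length clamp L before the factor is computed.
    """
    if len(rewards) != len(lengths):
        raise ValueError("rewards and lengths must have the same size")
    if not cfg.enabled:
        return list(rewards)  # penalty disabled: rewards pass through unchanged
    out = []
    for reward, length in zip(rewards, lengths):
        if cfg.min_length is not None:
            length = max(length, cfg.min_length)  # shorter outputs get no extra boost
        if cfg.max_length is not None:
            length = min(length, cfg.max_length)  # longer outputs get no extra penalty
        out.append(reward * ((5 + length) / 6) ** cfg.alpha)
    return out


if __name__ == "__main__":
    cfg = LengthPenaltyConfig(enabled=True, alpha=-0.2, min_length=20, max_length=None)
    print(apply_length_penalty([1.0, 1.0, 1.0], [5, 20, 200], cfg))
```

With this clamping choice, responses shorter than `min_length` receive the same factor as a response of exactly `min_length`, so `alpha < 0` cannot push the policy toward arbitrarily short outputs.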

Commit bb573bc: "length penaly in reward function"

paulinebourigault changed the title ~~length penaly in reward function~~ length penalty in reward function Apr 25, 2025