Feat/log fit sub reward #3710
base: main
Conversation
Code Review

This pull request adds a logging mechanism for extra sub-rewards during the training phase. A new static method _summarize_reward_extras is introduced to process and summarize this information into metrics. The changes are well integrated into both the synchronous and asynchronous reward computation paths.

My review focuses on the implementation of the new _summarize_reward_extras method. I've identified a potential issue where the data flattening and conversion logic could silently ignore some data formats (such as multi-element tensors/arrays or deeply nested lists), leading to incorrect metrics. I've provided a suggestion to make this logic more robust.
flattened: list = []
for value in values:
    if isinstance(value, list | tuple):
        flattened.extend(value)
    else:
        flattened.append(value)

numeric_vals: list[float] = []
for value in flattened:
    scalar: float | None = None
    if isinstance(value, torch.Tensor):
        if value.numel() == 1:
            scalar = float(value.item())
    elif isinstance(value, np.ndarray):
        if value.size == 1:
            scalar = float(value.item())
    elif isinstance(value, np.floating | np.integer | int | float | bool):
        scalar = float(value)
    else:
        try:
            scalar = float(value)
        except (TypeError, ValueError):
            scalar = None

    if scalar is not None:
        numeric_vals.append(scalar)
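For context, here is a hedged sketch of how the collected numeric values might then be reduced into logged metrics. The metric key format and the mean/max/min reductions are assumptions for illustration, not taken from the PR:

```python
import statistics

def summarize(key: str, numeric_vals: list[float]) -> dict[str, float]:
    # Hypothetical reduction: mean/max/min per extra-info key.
    # The actual metric names and reductions in the PR may differ.
    if not numeric_vals:
        return {}
    return {
        f"reward_extra/{key}/mean": statistics.fmean(numeric_vals),
        f"reward_extra/{key}/max": max(numeric_vals),
        f"reward_extra/{key}/min": min(numeric_vals),
    }

print(summarize("format_score", [0.0, 1.0, 1.0]))
```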
The current implementation for processing reward_extra_infos is not fully robust. It only flattens a single level of nested list or tuple objects and silently ignores multi-element torch.Tensor or np.ndarray objects. This can lead to incomplete data and misleading metrics. For example, if a value is np.array([1, 2]) or a nested list like [[1, 2]], its contents will not be included in the statistics.
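A minimal reproduction of the silent drop, using numpy only (the torch.Tensor branch of the PR behaves the same way):

```python
import numpy as np

# Same single-level flatten and scalar conversion as the PR's loop,
# with the torch.Tensor branch omitted (np.ndarray shows the same drop).
values = [np.array([1, 2]), [[1, 2]], 3.0]

flattened = []
for value in values:
    if isinstance(value, (list, tuple)):
        flattened.extend(value)  # one level only: [[1, 2]] yields the inner list
    else:
        flattened.append(value)

numeric_vals = []
for value in flattened:
    scalar = None
    if isinstance(value, np.ndarray):
        if value.size == 1:  # multi-element arrays fall through as None
            scalar = float(value.item())
    else:
        try:
            scalar = float(value)
        except (TypeError, ValueError):
            scalar = None
    if scalar is not None:
        numeric_vals.append(scalar)

print(numeric_vals)  # [3.0] -- the array and the nested list were both dropped
```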
A more robust approach would be to recursively flatten all iterable structures and then attempt to convert all resulting items to floats. This ensures all numeric data is captured, regardless of nesting or type.
# 1. Flatten all nested lists, tuples, tensors, and arrays
flattened: list = []
items_to_flatten = list(values)
while items_to_flatten:
    item = items_to_flatten.pop(0)
    if isinstance(item, (list, tuple)):
        items_to_flatten.extend(item)
    elif isinstance(item, (torch.Tensor, np.ndarray)):
        items_to_flatten.extend(item.flatten().tolist())
    else:
        flattened.append(item)

# 2. Convert flattened items to numeric values
numeric_vals: list[float] = []
for value in flattened:
    try:
        numeric_vals.append(float(value))
    except (TypeError, ValueError):
        # Ignore values that cannot be converted to float
        pass
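A quick sanity check of the suggested flattening, wrapped as a standalone function. This sketch uses numpy only; the torch.Tensor branch is omitted but works identically. Note that the worklist pops from the front and appends expansions to the back, so output order follows the worklist, not the original nesting:

```python
import numpy as np

def flatten_and_convert(values) -> list[float]:
    # Worklist-based recursive flatten per the suggestion above;
    # torch.Tensor handling omitted -- np.ndarray covers the same idea.
    flattened = []
    items_to_flatten = list(values)
    while items_to_flatten:
        item = items_to_flatten.pop(0)
        if isinstance(item, (list, tuple)):
            items_to_flatten.extend(item)
        elif isinstance(item, np.ndarray):
            items_to_flatten.extend(item.flatten().tolist())
        else:
            flattened.append(item)
    numeric_vals = []
    for value in flattened:
        try:
            numeric_vals.append(float(value))
        except (TypeError, ValueError):
            pass  # non-numeric entries (e.g. strings) are skipped
    return numeric_vals

# The array and nested list are now captured; 5.0 comes first
# because expanded items are appended behind the remaining inputs.
print(flatten_and_convert([np.array([1, 2]), [[3, 4]], 5.0, "n/a"]))
```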
What does this PR do?
Checklist Before Starting
[{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
If this PR breaks any API, add [BREAKING] to the beginning of the title, like [BREAKING][fsdp, megatron] feat: dynamic batching
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
Run pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Request CI in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)