
Conversation

vermouth1992 (Collaborator) commented:

Reverts #3769

gemini-code-assist bot left a comment:


Code Review

This pull request reverts a previous fix for a serialization issue with the asynchronous reward function. While the original fix may have had drawbacks, this revert re-introduces a critical bug that can cause training to crash under certain configurations. Specifically, if asynchronous reward computation is enabled together with sandbox fusion, passing self.reward_fn to a remote Ray function fails because it contains a non-serializable multiprocessing.Semaphore.

```diff
-future_reward = compute_reward_async.remote(
-    data=batch, config=self.config, tokenizer=self.tokenizer
-)
+future_reward = compute_reward_async.remote(data=batch, reward_fn=self.reward_fn)
```

Severity: critical

This revert re-introduces a critical serialization issue. The self.reward_fn object may not be serializable by Ray, especially when sandbox_fusion is enabled, since it can contain a multiprocessing.Semaphore (see verl/trainer/ppo/reward.py:135). Passing a non-serializable object to a remote function will cause training to crash with a SerializationError.
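For illustration, here is a minimal, self-contained sketch of the failure mode. The RewardFnWithSemaphore class is hypothetical and only stands in for a reward_fn that holds a process-level semaphore, as the sandbox_fusion code path does; it is not verl code.

```python
# Hypothetical repro of the failure mode; not verl code.
import multiprocessing

import ray


class RewardFnWithSemaphore:
    """Stand-in for a reward_fn that holds a process-level semaphore,
    similar to the sandbox_fusion code path."""

    def __init__(self, concurrency: int = 4):
        self.semaphore = multiprocessing.Semaphore(concurrency)

    def __call__(self, data):
        with self.semaphore:
            return 0.0  # dummy reward


@ray.remote
def compute_reward_async(data, reward_fn):
    return reward_fn(data)


if __name__ == "__main__":
    ray.init()
    reward_fn = RewardFnWithSemaphore()
    # Ray has to pickle `reward_fn` to ship it to the worker; the embedded
    # multiprocessing.Semaphore cannot be pickled, so this call fails with a
    # serialization error instead of scheduling the task.
    ray.get(compute_reward_async.remote(data=None, reward_fn=reward_fn))
```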

The previous implementation, although using a deprecated pattern, correctly avoided this issue by reconstructing the reward_fn inside the remote worker. Reverting this fix without addressing the underlying serializability of self.reward_fn is a regression.
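Roughly, the pre-revert pattern looked like the sketch below: only picklable inputs (the batch, config, and tokenizer) cross the process boundary, and the reward function is rebuilt inside the task. The load_reward_manager call and its arguments are an approximation of verl's factory API, not a verbatim copy of the original code.

```python
# Approximate sketch of the pre-revert pattern; argument names are assumptions.
import ray


@ray.remote(num_cpus=1)
def compute_reward_async(data, config, tokenizer):
    # Rebuild the reward manager inside the worker so any
    # multiprocessing.Semaphore created by sandbox_fusion never needs
    # to be pickled and shipped across processes.
    from verl.trainer.ppo.reward import compute_reward, load_reward_manager

    reward_fn = load_reward_manager(config, tokenizer, num_examine=0)
    return compute_reward(data, reward_fn)


# Driver side: only serializable arguments are passed to the remote task.
# future_reward = compute_reward_async.remote(
#     data=batch, config=self.config, tokenizer=self.tokenizer
# )
```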

A more robust solution would be to make reward_fn always serializable, for example by replacing the multiprocessing.Semaphore with a Ray-friendly primitive such as a semaphore implemented as a Ray actor. In the absence of that fix, however, the previous code is safer.
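As an illustration of that direction (not part of verl), a semaphore can be backed by a Ray actor; the actor handle is serializable, so a reward_fn that holds it can be passed to remote tasks, unlike one holding a multiprocessing.Semaphore.

```python
# Illustrative sketch only: a semaphore backed by a Ray actor, whose handle
# is serializable (unlike a multiprocessing.Semaphore).
import asyncio

import ray


@ray.remote
class SemaphoreActor:
    def __init__(self, value: int):
        self._sem = asyncio.Semaphore(value)

    async def acquire(self):
        await self._sem.acquire()

    async def release(self):
        self._sem.release()


# Usage sketch: the handle can be stored on reward_fn and shipped to workers.
# sem = SemaphoreActor.remote(concurrency)
# ray.get(sem.acquire.remote())
# try:
#     ...  # call the sandbox
# finally:
#     sem.release.remote()
```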

Suggested change

```diff
-future_reward = compute_reward_async.remote(data=batch, reward_fn=self.reward_fn)
+future_reward = compute_reward_async.remote(
+    data=batch, config=self.config, tokenizer=self.tokenizer
+)
```
