
CI fails for experimental tests: TypeError: GRPOWithReplayBufferTrainer.update_with_replay_buffer() got an unexpected keyword argument 'prompt_inputs' #4365

@albertvillanova

Description


CI fails for experimental tests: https://github.com/huggingface/trl/actions/runs/18899784148/job/53944414414?pr=4354

TypeError: GRPOWithReplayBufferTrainer.update_with_replay_buffer() got an unexpected keyword argument 'prompt_inputs'. Did you mean 'prompt_ids'?
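Python raises this `TypeError` whenever a call passes a keyword that the callee's signature does not declare. The error is raised while the arguments are bound, so none of the method body runs. Below is a minimal reproduction that uses a hypothetical stand-in class, `HypotheticalReplayTrainer`. Its parameter list is guessed from the keys the failing test passes, with `prompt_inputs` left out. It is not the real TRL signature.

```python
class HypotheticalReplayTrainer:
    # Hypothetical stand-in, NOT the real GRPOWithReplayBufferTrainer.
    # Parameters are assumed from the keys the failing test passes,
    # with 'prompt_inputs' deliberately left out.
    def update_with_replay_buffer(
        self,
        group_advantages,
        group_std_rewards,
        prompt_ids,
        prompt_mask,
        completion_ids,
        completion_mask,
        num_items_in_batch,
    ):
        return prompt_ids


def call_like_the_test():
    # Placeholder values: the call fails while binding arguments,
    # before the method body ever runs.
    inputs = {
        "group_advantages": [[0.6, 0.6], [0.3, 0.45]],
        "group_std_rewards": [[0.0, 0.0], [0.0, 0.0]],
        "prompt_ids": [[1, 2, 0]],
        "prompt_mask": [[1, 1, 0]],
        "completion_ids": [[1009, 1010, 0]],
        "completion_mask": [[1, 1, 0]],
        "prompt_inputs": {},  # the key the signature does not declare
    }
    try:
        HypotheticalReplayTrainer().update_with_replay_buffer(**inputs, num_items_in_batch=4)
    except TypeError as err:
        return str(err)
    return None


message = call_like_the_test()
print(message)
```

On newer Python versions (3.13+), the message also carries a "Did you mean ..." suggestion like the one shown in the CI log.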

FAILED tests/experimental/test_grpo_with_replay_buffer_trainer.py::TestUpdateWithReplayBuffer::test_update_with_replay_buffer_no_variance - TypeError: GRPOWithReplayBufferTrainer.update_with_replay_buffer() got an unexpected keyword argument 'prompt_inputs'. Did you mean 'prompt_ids'?
FAILED tests/experimental/test_grpo_with_replay_buffer_trainer.py::TestUpdateWithReplayBuffer::test_update_with_replay_buffer_with_variance - TypeError: GRPOWithReplayBufferTrainer.update_with_replay_buffer() got an unexpected keyword argument 'prompt_inputs'. Did you mean 'prompt_ids'?
FAILED tests/experimental/test_grpo_with_replay_buffer_trainer.py::TestUpdateWithReplayBuffer::test_update_with_mixed_variance - TypeError: GRPOWithReplayBufferTrainer.update_with_replay_buffer() got an unexpected keyword argument 'prompt_inputs'. Did you mean 'prompt_ids'?
FAILED tests/experimental/test_grpo_with_replay_buffer_trainer.py::TestUpdateWithReplayBuffer::test_update_with_inputs_different_seq_len - TypeError: GRPOWithReplayBufferTrainer.update_with_replay_buffer() got an unexpected keyword argument 'prompt_inputs'. Did you mean 'prompt_ids'?

Stacktrace:

_____ TestUpdateWithReplayBuffer.test_update_with_inputs_different_seq_len _____

self = <tests.experimental.test_grpo_with_replay_buffer_trainer.TestUpdateWithReplayBuffer object at 0x7fe30f3cd0f0>

    def test_update_with_inputs_different_seq_len(self):
        """
        Test with inputs where the sequence lengths are different from the prepopulated buffer.
        """
        self._prepopulate_buffer()
        pad_token_id = self.trainer.processing_class.pad_token_id
        group_advantages = torch.tensor([[0.6, 0.6], [0.3, 0.45]])  # one no-variance, one variance
        inputs = {
            "group_advantages": group_advantages,
            "prompt_ids": torch.tensor(
                [
                    [1, 2, pad_token_id],
                    [1, 2, pad_token_id],
                    [3, 4, 5],
                    [3, 4, 5],
                ]
            ),
            "prompt_mask": torch.tensor([[1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1]], dtype=torch.long),
            "completion_ids": torch.tensor(
                [
                    [1009, 1010, pad_token_id],
                    [1011, 1012, 1013],
                    [1013, 1014, pad_token_id],
                    [1015, 1016, 1017],
                ]
            ),
            "completion_mask": torch.tensor([[1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1]], dtype=torch.long),
            "prompt_inputs": {},
        }
        inputs["group_std_rewards"] = group_advantages.std(dim=1).expand_as(group_advantages)
    
>       outputs_after_sampling = self.trainer.update_with_replay_buffer(**inputs, num_items_in_batch=4)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       TypeError: GRPOWithReplayBufferTrainer.update_with_replay_buffer() got an unexpected keyword argument 'prompt_inputs'. Did you mean 'prompt_ids'?

tests/experimental/test_grpo_with_replay_buffer_trainer.py:210: TypeError
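The test builds its keyword dict by hand, so a simple way to spot drift between the test and the method is to compare the dict's keys with the callee's parameters. Below is a minimal sketch that uses `inspect.signature` from the standard library. `unexpected_kwargs` is a hypothetical helper. The stand-in `update_with_replay_buffer` signature is assumed and was not copied from TRL. This sketch does not decide which side should change: the test could drop `prompt_inputs`, or the method could accept it.

```python
import inspect


def unexpected_kwargs(func, kwargs):
    """Return the keys of `kwargs` that `func` cannot accept as keyword arguments.

    Hypothetical helper for spotting drift between a test's input dict and the
    signature of the method it calls.
    """
    params = list(inspect.signature(func).parameters.values())
    # A **kwargs parameter accepts any key, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return []
    # Positional-only parameters cannot be passed by keyword, so they are excluded.
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(key for key in kwargs if key not in accepted)


# Hypothetical stand-in for the method's current signature (assumed, not copied from TRL).
def update_with_replay_buffer(
    group_advantages,
    group_std_rewards,
    prompt_ids,
    prompt_mask,
    completion_ids,
    completion_mask,
    num_items_in_batch,
):
    return prompt_ids


# The keys the failing test passes; only the names matter for this check.
test_keys = {
    "group_advantages": None,
    "group_std_rewards": None,
    "prompt_ids": None,
    "prompt_mask": None,
    "completion_ids": None,
    "completion_mask": None,
    "prompt_inputs": None,
    "num_items_in_batch": 4,
}
print(unexpected_kwargs(update_with_replay_buffer, test_keys))  # ['prompt_inputs']
```

An empty list means the call will bind. A non-empty list names exactly the keys that need reconciling, either by removing them from the test inputs or by adding them to the method's signature.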

Metadata

Labels

🏋 GRPO (Related to GRPO), 🐛 bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests