KabakaWilliam

What does this PR do?

Adds a new recipe to perform GRPO with full weight-updates on 1.5B models.

Checklist Before Starting

Test

[image] Pass@1 on MATH500: 77.8

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@KabakaWilliam changed the title from "Feat/grpo math500 single gpu" to "[recipe, hardware] feat: Add GRPO with full weight updates for 1.5B models on a single GPU" on Oct 13, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new recipe for GRPO with full weight updates on 1.5B models. My review identified two critical issues in the new shell script qwen2.5math-1.5b_grpo_1_h100_fsdp_vllm.sh where missing spaces before line continuation characters would cause the script to fail due to malformed arguments. Additionally, I've noted that the pull request template file has been incorrectly modified with PR-specific details and should be reverted.
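To illustrate the line-continuation issue described above, here is a minimal sketch. It does not reproduce the PR's script: `count_args` and the `example.*` overrides are hypothetical placeholders. It shows that a backslash-newline is removed by the shell without adding whitespace, so a missing space before the backslash fuses two arguments into one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

# No space before the backslash: the shell deletes the backslash-newline
# pair, so both overrides are joined into a single malformed argument
# (example.batch_size=32example.lr=1e-6).
broken=$(count_args example.batch_size=32\
example.lr=1e-6)

# A space before the backslash keeps the two overrides separate.
fixed=$(count_args example.batch_size=32 \
    example.lr=1e-6)

echo "broken=$broken fixed=$fixed"
```

Run as written, this should print `broken=1 fixed=2`. A trailing backslash on the last line of a script is a related hazard, since it continues the command onto whatever follows.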

@KabakaWilliam KabakaWilliam left a comment

Removed the continuation character from the end of the script and ensured proper spacing; reverted the pull request template back to its normal state.

Re-uploaded the default pull request template so that it no longer contains details specific to my request.