@linxxx3 linxxx3 commented Sep 25, 2025

What does this PR do?

Stage 1 PR for RFC #3434

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
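
The title rules above can be checked locally before pushing. The following is a minimal sketch, not the project's actual CI check: the regular expression, the `check_pr_title` helper, and the whitespace handling are assumptions made for illustration. Only the module and type names are taken from the checklist.

```python
import re

# Module and type names copied from the checklist above.
ALLOWED_MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Hypothetical pattern (an assumption, not the real CI rule):
# optional "[BREAKING]", then "[modules]", then "type: description".
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?\s*"
    r"\[(?P<modules>[^\]]+)\]\s+"
    r"(?P<type>\w+):\s+(?P<desc>\S.*)$"
)


def check_pr_title(title: str) -> bool:
    """Return True if the title follows [{modules}] {type}: {description}."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    # Multiple modules are separated by commas; surrounding spaces are ignored.
    modules = [name.strip() for name in match.group("modules").split(",")]
    return (
        all(name in ALLOWED_MODULES for name in modules)
        and match.group("type") in ALLOWED_TYPES
    )


if __name__ == "__main__":
    for title in (
        "[BREAKING][fsdp, megatron] feat: dynamic batching",
        "[gpu] feat: unknown module",
    ):
        print(title, "->", check_pr_title(title))
```

The real CI check may be stricter or looser than this sketch, for example about spacing or capitalization, so treat a passing result here as a sanity check only.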

Test

There are some unit tests in recipe/agent_lightning_like/test.

A GSM8K tool-agent example is also provided for end-to-end testing: recipe/agent_lightning_like/example/run_qwen2.5_7b.sh

Local training curves:

[image: training curves]

API and Usage Example

See recipe/agent_lightning_like/example/README.md

Design & Code Changes

See RFC #3434

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new recipe for an agent-lightning-like RL training pipeline. The changes are extensive, adding a complete workflow that runs from data handling through agent interaction to training. My review focuses on correctness and robustness. I've identified several critical issues, including potential race conditions, resource leaks, and runtime errors caused by incorrect exception handling and unsafe coding patterns. I've also pointed out a significant performance issue in the example agent client. Addressing these points will greatly improve the stability and reliability of the new training pipeline.

@linxxx3 linxxx3 force-pushed the recipe_agent_lightning_like branch from 660111a to cc214e2 Compare September 25, 2025 06:17