v0.4.0 release: large MoEs, tool calling, and low resource friendly #1903
eric-haibin-lin
announced in
Announcements
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Known issues
Highlights
Large MoE models support: DeepSeek 671b & Qwen3 235b
Preview features are provided to enable large MoE RL training with Megatron backend, such as DeepSeek 671b documentation. The Megatron backend now supports:
Tool-calling, multi-turn RL, SGLang rollout
Sample-level rollout with tool calling and multi-turn RL is supported via SGLang. We provide the Search-R1 recipe built on top of that.
A prototype for sample-level async tool calling is also available with vllm AsyncLLM server.
Multiple enhancements and improvements are made to SGLang rollout, supporting multi-node and multimodal.
Sandbox fusion is integrated.
Low resource friendly
LoRA support is available, enabling 70B+ models on a single node with A100x8 GPUs.
Fused cross entropy kernel to drastically reduce peak memory:
actor_rollout_ref.model.use_fused_kernels=True
New models, algorithms and recipes
New models and training utils include:
FSDP2 and training optimizations
FSDP2 is recommended to replace FSDP1, providing better throughput and memory usage, and is composable with other features (e.g. torch.compile):
Furthermore, FSDP2 cpu offloading is compatible with gradient accumulation. You can turn it on to save memory with
actor_rollout_ref.actor.offload_policy=True
.Other optimizations include:
Deployment and hardware
Breaking changes and deprecations
+{config}={value}
. Please try removing the + to fix such errors.trainer.val_before_train
data.{prompt,response}_dict_keys
verl.utils.reward_score._default_compute_score
is deprecated. Useverl.utils.reward_score.default_compute_score
instead.New Contributors
@zhao9797 @frederrx @dingyuan-shi @SwordFaith @CJReinforce @linjc16 @wkcn @hijkzzz @JustinTong0323 @mertunsall @Altair-Alpha @czczup @SparkJiao @sunjin-k @tsaoyu @XueruiSu @zhaochenyang20 @NascentAscension @corgilee @lei-lei @pengsun @silverriver @mingruimingrui @Ann-Qin @lilei199908 @YeonwooSung @himalalps @tao-githup @as12138 @thibautbar @aoshen524 @MantasBaksys @YangWang92 @patrik-bartak @mansicer @wangfuchun-fc @survivi @RainBowLuoCS @gzpan @HuaizhengZhang @HollowMan6 @zTonyZhao @lxg2015 @estsauver @jhinpan @yhyang201 @qingquansong @chenhaiq @ShareLer @Artessay @Jackory @swtheing @U-rara @Andrewzh112 @mansoor-s @Necolizer @llkn-2 @yuyuz @linxxx3 @gaokaiz2 @ccchow @ezyang @zw0610 @pavelgein @plutoZZZZ @jybsuper @hebiao064 @GaotangLi @zhangyongxin121 @spacegoing @cedricbeta @Geaming2002 @imh966 @zyzshishui @zzong2006 @langfengQ @zheliuyu @casper-hansen @Bihan @czx6858 @GHGmc2 @DtYXs @thelongestusernameofall @xichengpro @Irvingwangjr @shinytang6 @qyhfrank @mlpod @popomen @liyc-ai @leo-pony @LiuXTao @Lins-01 @yzlnew @vllbc @ZDJeffrey @sukrucildirr @Moyu-42 @YRdddream @jdf-prog @HUGHNew @ElliottYan @NileZhou @shizhediao @rj42 @Crispig @omahs @CurryRice233 @china10s
Thank you for your first contributions!
Full Changelog: v0.3.0.post1...v0.4.0
Beta Was this translation helpful? Give feedback.
All reactions