Fix bugs in initial_load_in_hf when enable_weight_tying=true in Qwen3 #1999
Conversation
Add checks for weight tying in state_dict processing
Co-authored-by: Shuhua Yu <[email protected]>
```python
    self.model_args.enable_weight_tying
    and "lm_head.weight" not in hf_state_dict
):
    if "model.embed_tokens.weight" in hf_state_dict:
```
Why this `if`? Shouldn't we assert the existence of the embedding?
My guess is that this `if` was copied from somewhere where PP can be enabled, so the embedding is on some ranks but not others. But with PP, we'd also require the embedding and lm_head to be on the same rank -- otherwise, how would you be able to load the lm_head weights?
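For context, a minimal self-contained sketch of the guard under discussion, pulled out into a standalone helper. The helper name `patch_tied_lm_head` and its signature are illustrative assumptions, not the actual torchtitan code. With the inner `if`, a missing embedding is skipped silently and `lm_head.weight` is never populated:

```python
import torch

def patch_tied_lm_head(
    hf_state_dict: dict[str, torch.Tensor], enable_weight_tying: bool
) -> dict[str, torch.Tensor]:
    # With weight tying, HF Qwen3 checkpoints may omit "lm_head.weight",
    # so reuse the embedding tensor before converting keys to the native model.
    if enable_weight_tying and "lm_head.weight" not in hf_state_dict:
        if "model.embed_tokens.weight" in hf_state_dict:
            hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
    return hf_state_dict
```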
Thanks for raising this point, added an assertion here.
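A rough sketch of what the assertion-based version might look like, reusing the same hypothetical helper from above (an illustration, not the exact diff):

```python
import torch

def patch_tied_lm_head(
    hf_state_dict: dict[str, torch.Tensor], enable_weight_tying: bool
) -> dict[str, torch.Tensor]:
    if enable_weight_tying and "lm_head.weight" not in hf_state_dict:
        # With weight tying there must be an embedding to tie lm_head to;
        # fail loudly instead of silently leaving lm_head.weight unset.
        assert "model.embed_tokens.weight" in hf_state_dict, (
            "enable_weight_tying=true but model.embed_tokens.weight is "
            "missing from the HF state_dict"
        )
        hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
    return hf_state_dict
```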
tianyu-l left a comment:
sgtm
…pytorch#1999)

Rebased on main to merge this pr: pytorch#1964

Co-authored-by: William <[email protected]>
Co-authored-by: Achazwl <[email protected]>
Rebased on main to merge this pr: #1964