
Conversation

evellasques
Contributor

  • In a more recent version of Transformers, the inv_freq buffer is no longer persistent (huggingface/transformers@95f96b4).
  • This causes a crash when someone tries to load a (converted) Llama checkpoint in NeMo (e.g. CodeLlama 7b).

Issue #, if available: N/A

Description of changes: Switches off the persistent flag when registering the inv_freq buffer.
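
A minimal sketch of the idea, assuming a standard PyTorch rotary-embedding module (the class name and signature here are illustrative, not the actual NeMo code):

    # Sketch only: register inv_freq as a non-persistent buffer so it is
    # neither written to nor expected in the state_dict, matching the
    # behavior of recent Transformers releases.
    import torch
    import torch.nn as nn

    class RotaryEmbedding(nn.Module):
        def __init__(self, dim: int, base: float = 10000.0):
            super().__init__()
            inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
            # persistent=False keeps inv_freq out of state_dict()
            self.register_buffer("inv_freq", inv_freq, persistent=False)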

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@evellasques
Contributor Author

Hi,

Does anyone have an update on this? Basically, loading any of the recent HF checkpoints that rely on rotary position embeddings (e.g. Mistral, Llama) will result in a crash:

 Missing key(s) in state_dict: "model.language_model.encoder.layers.0.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.1.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.2.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.3.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.4.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.5.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.6.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.7.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.8.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.9.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.10.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.11.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.12.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.13.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.14.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.15.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.16.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.17.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.18.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.19.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.20.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.21.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.22.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.23.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.24.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.25.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.26.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.27.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.28.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.29.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.30.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.31.self_attention.core_attention.rotary_emb.inv_freq". 

This happens because HF no longer serializes inv_freq.
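
An illustrative way to see the mismatch (the model ID is just an example): recent Transformers versions register inv_freq as a non-persistent buffer, so the keys are absent from the Hugging Face state_dict, while a strict load in NeMo still expects them.

    # Sketch only: check whether inv_freq keys appear in an HF checkpoint's state_dict.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
    inv_freq_keys = [k for k in model.state_dict() if "inv_freq" in k]
    print(inv_freq_keys)  # [] on recent Transformers; older versions list one key per layer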

@aws-singhada

aws-singhada commented Apr 4, 2024 via email

@aws-kingrj
Contributor

Can you resolve the merge conflicts? Then we can merge it.

@evellasques
Contributor Author

evellasques commented Apr 4, 2024

Can you resolve the merge conflicts? Then we can merge it

Quick question: solving the conflict will involve replacing nemo/nemo/collections/nlp/modules/common/megatron/llama_module.py with nemo/nemo/collections/nlp/modules/common/megatron/falcon_module.py. I opened an issue about a bug in the Llama conversion scripts (basically, they should aggregate the gate_proj and up_proj weights from the HuggingFace Llama checkpoint into a single dense_h_to_4h weight for SwiGLU; a sketch is below). Do you want me to also fix that and add it as part of this PR?
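
Roughly what the aggregation would look like (sketch only; the helper name is made up, the HF key names follow the standard Llama layout, and the exact fused ordering NeMo/Megatron expects, e.g. any per-shard interleaving for tensor parallelism, may differ):

    # Sketch: fuse HF Llama gate_proj and up_proj into one dense_h_to_4h weight for SwiGLU.
    import torch

    def merge_swiglu_weights(hf_state_dict: dict, layer: int) -> torch.Tensor:
        gate = hf_state_dict[f"model.layers.{layer}.mlp.gate_proj.weight"]
        up = hf_state_dict[f"model.layers.{layer}.mlp.up_proj.weight"]
        # Both are [intermediate_size, hidden_size]; concatenating along dim 0
        # yields the fused [2 * intermediate_size, hidden_size] dense_h_to_4h weight.
        return torch.cat([gate, up], dim=0)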

@evellasques
Contributor Author

Can you resolve the merge conflicts? Then we can merge it

I noticed that in a more recent release you handle that during checkpoint loading. I created another PR for the issue with checkpoint conversion (#26), so I'll close this one.

@evellasques
Contributor Author

The main issue was solved by another PR.
