bugfix: inv_freq buffer in Llama RotaryEmbedding shouldn't be persistent #21
Conversation
* In a more recent version of Transformers, the inv_freq buffer is no longer persistent (huggingface/transformers@95f96b4).
* This causes a crash when someone tries to load a (converted) Llama checkpoint in NeMo (e.g., CodeLlama 7b).
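A minimal PyTorch sketch (not the actual Transformers or NeMo code; the class name RotaryEmbeddingSketch is made up for illustration) of what the persistent flag on register_buffer controls, and why a checkpoint that omits inv_freq fails strict loading into a module that still registers the buffer persistently:

```python
import torch
import torch.nn as nn


class RotaryEmbeddingSketch(nn.Module):
    """Illustrative only -- not the NeMo or Transformers implementation."""

    def __init__(self, dim: int, base: float = 10000.0, persistent: bool = True):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        # persistent=True (the default) writes inv_freq into state_dict();
        # persistent=False keeps the buffer on the module but out of checkpoints.
        self.register_buffer("inv_freq", inv_freq, persistent=persistent)


persistent_variant = RotaryEmbeddingSketch(dim=128, persistent=True)
non_persistent_variant = RotaryEmbeddingSketch(dim=128, persistent=False)

print("inv_freq" in persistent_variant.state_dict())       # True
print("inv_freq" in non_persistent_variant.state_dict())   # False

# A checkpoint produced without inv_freq (as recent Transformers versions do) loads
# cleanly into the non-persistent variant, but strict loading into the persistent
# variant raises "Missing key(s) in state_dict: ... inv_freq".
checkpoint_without_inv_freq = non_persistent_variant.state_dict()
non_persistent_variant.load_state_dict(checkpoint_without_inv_freq)  # OK
# persistent_variant.load_state_dict(checkpoint_without_inv_freq)    # RuntimeError
```

This PR applies the same change on the NeMo side: inv_freq is registered with persistent=False, so converted checkpoints that no longer contain the buffer can be loaded.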
Hi, Does anyone have an update on this? Basically, loading any of the recent HF checkpoints that rely on rotary position embeddings (e.g., Mistral, Llama) will result in a crash:
Missing key(s) in state_dict: "model.language_model.encoder.layers.0.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.1.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.2.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.3.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.4.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.5.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.6.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.7.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.8.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.9.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.10.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.11.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.12.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.13.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.14.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.15.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.16.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.17.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.18.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.19.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.20.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.21.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.22.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.23.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.24.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.25.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.26.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.27.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.28.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.29.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.30.self_attention.core_attention.rotary_emb.inv_freq", "model.language_model.encoder.layers.31.self_attention.core_attention.rotary_emb.inv_freq".
Because HF is not serializing inv_freq. |
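Not part of this PR or thread: a rough workaround sketch for anyone hitting the crash above, assuming inv_freq is recomputed deterministically from the config in the module's __init__ and can therefore safely keep its initialized value. The helper name load_ignoring_inv_freq is made up for illustration.

```python
import torch


def load_ignoring_inv_freq(model: torch.nn.Module, state_dict: dict) -> None:
    """Load a checkpoint non-strictly, tolerating only absent rotary inv_freq buffers.

    Keys reported as missing keep the values the module computed in __init__, which is
    safe here because inv_freq is derived from the config rather than learned.
    """
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    # Tolerate only the missing rotary inv_freq buffers; surface anything else.
    real_missing = [k for k in missing if not k.endswith("rotary_emb.inv_freq")]
    if real_missing or unexpected:
        raise RuntimeError(
            f"Unexpected state_dict mismatch: missing={real_missing}, "
            f"unexpected={list(unexpected)}"
        )
```

This only papers over the mismatch at load time; the cleaner fix is the one in this PR, registering the buffer as non-persistent so strict loading no longer expects it.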
… -Adarsh |
Can you resolve the merge conflicts? Then we can merge it |
Quick question: solving the conflict will involve replacing nemo/nemo/collections/nlp/modules/common/megatron/llama_module.py with nemo/nemo/collections/nlp/modules/common/megatron/falcon_module.py. I opened an issue about a bug in the Llama conversion scripts (basically they should aggregate |
I noticed that in a more recent release you guys handle that during checkpoint loading. I created another PR for the issue with checkpoint conversion (#26) so I'll close this one. |
Main issue was solved by another PR. |
Issue #, if available: N/A
Description of changes: Switches off the persistent flag when registering the inv_freq buffer.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.