[Inference Benchmark] Copy llama4 model code from transformers #5388
Conversation
looks like mostly mechanical changes.
How are we validating that the code changes in this PR didn't change the model's behavior? I suspect the renamed model layers would lead to different behavior, with code inside benchmark_inference.py not working as intended.
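For example, a quick parity check could look roughly like this (a minimal sketch, assuming the copy only renamed classes so the state dicts line up, and that `copied_Llama4ForCausalLM` is importable from the copied model file; the tiny config values are made up):

```python
import torch
from transformers.models.llama4 import Llama4TextConfig
from transformers.models.llama4.modeling_llama4 import Llama4ForCausalLM

# Tiny, made-up config so both models are cheap to build.
config = Llama4TextConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
    vocab_size=256,
)

reference = Llama4ForCausalLM(config).eval()
# copied_Llama4ForCausalLM comes from this PR's copied model code.
copied = copied_Llama4ForCausalLM(config).eval()
# Assumes the copy only renamed classes, not parameters, so the weights transfer.
copied.load_state_dict(reference.state_dict())

tokens = torch.randint(0, config.vocab_size, (1, 16))
with torch.no_grad():
    ref_logits = reference(tokens).logits
    new_logits = copied(tokens).logits
torch.testing.assert_close(ref_logits, new_logits)
```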
```python
    NVFP4InferenceLinear,
    nvfuser_f16a_nvfp4weight_scaled_grouped_mm,
    nvfuser_f16a_nvfp4weight_scaled_mm,
    copied_Llama4ForCausalLM,
```
nitpick: we could use a better name for this 😜
```python
)
from transformers.models.llama4 import Llama4TextConfig
from transformers.models.llama4.modeling_llama4 import (
    Llama4TextMoe,
```
At a high level, one of the intentions of this benchmark is to replace Llama4TextMoe with the custom module Llama4MoE defined in this file.
We don't have to fix it right now, but we can add a TODO that we'll swap it out later. I think that would allow us to simplify the implementation and avoid some of the copied_ prefixes.
```python
# Ref: https://github.com/huggingface/transformers/blob/ff8b88a9/src/transformers/models/llama4/modeling_llama4.py#L147-L165
class copied_Llama4TextMoe(nn.Module):
```
this, I believe, is the MoE that we want to swap.
Here the new model has copied_Llama4TextMoe. If you look at benchmark_inference.py, we have a function that does the swapping, _replace_llama4_moe; that wouldn't work with the updated model, since it only mechanically swaps Llama4TextMoe.
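If we keep the copied_ prefix for now, _replace_llama4_moe could match on the class name instead — a sketch, assuming the current implementation selects modules via an isinstance check against the transformers class:

```python
import torch.nn as nn

def _is_llama4_moe(module: nn.Module) -> bool:
    # Catches both transformers' Llama4TextMoe and this file's copied_Llama4TextMoe.
    return type(module).__name__ in ("Llama4TextMoe", "copied_Llama4TextMoe")
```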
Description

Copied the llama4 model code from transformers so that we can tweak it locally, instead of type-checking + rewrites.

Needed to copy the entire Llama4ForCausalLM class, and a few other things, to keep a bunch of optimizations that transformers was running.

I prepended all the new class and function names with "copied_". This is needed since we import the real transformers Llama4TextMoe class for type checking. Alternatively, we could leave the names alone and do something like this for type-checking:

```python
from transformers.models.llama4.modeling_llama4 import Llama4TextMoe as Llama4TextMoe_type
```