Fix GLM 4.5 warmup bug
#15088
Nvm, complete brainfart on my end.
I only checked
@CISC if you want to make a branch, I can test kimi-k2, deepseek-v3-0324, and qwen3-coder.
The only other line I can see is here, but this is for model loading, so it shouldn't make any difference. AFAIK, all the other MoE models are using the value of

```cpp
struct llm_graph_context {
    const llm_arch arch;

    const llama_hparams & hparams;
    const llama_cparams & cparams;
    const llama_ubatch  & ubatch;

    const int64_t n_embd;
    const int64_t n_layer;
    const int64_t n_rot;
    const int64_t n_ctx;       // user-specified context size (can be different from n_ctx_train)
    const int64_t n_head;
    const int64_t n_head_kv;
    const int64_t n_embd_head_k;
    const int64_t n_embd_k_gqa;
    const int64_t n_embd_head_v;
    const int64_t n_embd_v_gqa;
    const int64_t n_expert;
    const int64_t n_expert_used;
```

...which is set in the constructor based on the state of
It could be confusion over the term "shadowing": https://en.wikipedia.org/wiki/Variable_shadowing
The key thing is the subclasses of `n_expert_used (cparams.warmup ? hparams.n_expert : hparams.n_expert_used)`.

I may well be wrong, though, as you know far more about the codebase than me! I only traced it back as far as here and didn't go as far as checking when this constructor is called, etc.
Oh, LOL, sorry, for some reason I was looking at the diff the wrong way.
This fixes the warmup bug reported by @createthis here:
#14939 (comment)
It was caused by these local variables shadowing those assigned here during warmup: