Fix GLM 4.5 warmup bug #15088


Merged: 1 commit merged on Aug 5, 2025

Conversation

jukofyork (Collaborator)

This fixes the warmup bug reported by @createthis here:

#14939 (comment)

It was caused by local variables shadowing the members assigned here during warmup:

```cpp
llm_graph_context::llm_graph_context(const llm_graph_params & params) :
    arch             (params.arch),
    hparams          (params.hparams),
    cparams          (params.cparams),
    ubatch           (params.ubatch),
    n_embd           (hparams.n_embd),
    n_layer          (hparams.n_layer),
    n_rot            (hparams.n_rot),
    n_ctx            (cparams.n_ctx),
    n_head           (hparams.n_head()),
    n_head_kv        (hparams.n_head_kv()),
    n_embd_head_k    (hparams.n_embd_head_k),
    n_embd_k_gqa     (hparams.n_embd_k_gqa()),
    n_embd_head_v    (hparams.n_embd_head_v),
    n_embd_v_gqa     (hparams.n_embd_v_gqa()),
    n_expert         (hparams.n_expert),
    n_expert_used    (cparams.warmup ? hparams.n_expert : hparams.n_expert_used),
    freq_base        (cparams.rope_freq_base),
    freq_scale       (cparams.rope_freq_scale),
    ext_factor       (cparams.yarn_ext_factor),
    attn_factor      (cparams.yarn_attn_factor),
    beta_fast        (cparams.yarn_beta_fast),
    beta_slow        (cparams.yarn_beta_slow),
    norm_eps         (hparams.f_norm_eps),
    norm_rms_eps     (hparams.f_norm_rms_eps),
    n_tokens         (ubatch.n_tokens),
    n_outputs        (params.n_outputs),
    n_ctx_orig       (cparams.n_ctx_orig_yarn),
    pooling_type     (cparams.pooling_type),
    rope_type        (hparams.rope_type),
    sched            (params.sched),
    backend_cpu      (params.backend_cpu),
    cvec             (params.cvec),
    loras            (params.loras),
    mctx             (params.mctx),
    cross            (params.cross),
    cb_func          (params.cb),
    res              (params.res),
    ctx0             (res->get_ctx()),
    gf               (res->get_gf()) {
        res->set_params(params);
    }
```

@jukofyork jukofyork requested a review from CISC August 5, 2025 12:55
@jukofyork jukofyork merged commit c81de6e into ggml-org:master Aug 5, 2025
45 of 47 checks passed
@jukofyork jukofyork removed the request for review from CISC August 5, 2025 12:56
@jukofyork jukofyork deleted the fix-glm4moe-shadow-bug branch August 5, 2025 12:58
CISC (Collaborator) commented Aug 5, 2025

Uhm, I think this affects all models...

Nvm, complete brainfart on my end.

jukofyork (Collaborator, Author)

Uhm, I think this affects all models...

I only checked deepseek2 and it wasn't shadowing these, but it's possible other MoE models are (I've never noticed it when running qwen3 and kimi-k2 MoE models, so I assume they are OK).

CISC's comment was marked as resolved.

createthis

@CISC if you want to make a branch I can test kimi-k2, deepseek-v3-0324, and qwen3-coder.

jukofyork (Collaborator, Author)

The only other line I can see is here:

https://github.com/jukofyork/llama.cpp/blob/11b4186f1af00b3709d92f0cbb49e56481ec03b9/src/llama-model.cpp#L4464

but this is for model loading, so it shouldn't make any difference.

AFAIK, all the other MoE models are using the value of n_expert_used inherited from the llm_graph_context member:

```cpp
struct llm_graph_context {
    const llm_arch arch;

    const llama_hparams & hparams;
    const llama_cparams & cparams;
    const llama_ubatch  & ubatch;

    const int64_t n_embd;
    const int64_t n_layer;
    const int64_t n_rot;
    const int64_t n_ctx;       // user-specified context size (can be different from n_ctx_train)
    const int64_t n_head;
    const int64_t n_head_kv;
    const int64_t n_embd_head_k;
    const int64_t n_embd_k_gqa;
    const int64_t n_embd_head_v;
    const int64_t n_embd_v_gqa;
    const int64_t n_expert;
    const int64_t n_expert_used;
...
```

which is set in the constructor based on the state of cparams.warmup.

jukofyork (Collaborator, Author)

It could be confusion over the term "shadowing":

https://en.wikipedia.org/wiki/Variable_shadowing

This outer variable is said to be shadowed by the inner variable, while the inner identifier is said to mask the outer identifier. This can lead to confusion, as it may be unclear which variable subsequent uses of the shadowed variable name refer to.

The key thing is that subclasses of llm_graph_context shouldn't create a local variable holding a copy of n_expert_used taken from the model file, as this bypasses the inherited member that is set here on construction:

```cpp
n_expert_used    (cparams.warmup ? hparams.n_expert : hparams.n_expert_used)
```

I may well be wrong though as you know far more about the codebase than me! I only traced it back as far as here and didn't go as far as seeing when this constructor is called, etc.

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Aug 5, 2025
CISC (Collaborator) commented Aug 5, 2025

Oh, LOL, sorry, for some reason I was looking at the diff the wrong way.
