Fix GLM 4.5 warmup bug #15088


Merged: 1 commit merged on Aug 5, 2025

Conversation

jukofyork (Collaborator)

This fixes the warmup bug reported by @createthis here:

#14939 (comment)

It was caused by local variables shadowing the members assigned here during warmup:

```cpp
llm_graph_context::llm_graph_context(const llm_graph_params & params) :
    arch             (params.arch),
    hparams          (params.hparams),
    cparams          (params.cparams),
    ubatch           (params.ubatch),
    n_embd           (hparams.n_embd),
    n_layer          (hparams.n_layer),
    n_rot            (hparams.n_rot),
    n_ctx            (cparams.n_ctx),
    n_head           (hparams.n_head()),
    n_head_kv        (hparams.n_head_kv()),
    n_embd_head_k    (hparams.n_embd_head_k),
    n_embd_k_gqa     (hparams.n_embd_k_gqa()),
    n_embd_head_v    (hparams.n_embd_head_v),
    n_embd_v_gqa     (hparams.n_embd_v_gqa()),
    n_expert         (hparams.n_expert),
    n_expert_used    (cparams.warmup ? hparams.n_expert : hparams.n_expert_used),
    freq_base        (cparams.rope_freq_base),
    freq_scale       (cparams.rope_freq_scale),
    ext_factor       (cparams.yarn_ext_factor),
    attn_factor      (cparams.yarn_attn_factor),
    beta_fast        (cparams.yarn_beta_fast),
    beta_slow        (cparams.yarn_beta_slow),
    norm_eps         (hparams.f_norm_eps),
    norm_rms_eps     (hparams.f_norm_rms_eps),
    n_tokens         (ubatch.n_tokens),
    n_outputs        (params.n_outputs),
    n_ctx_orig       (cparams.n_ctx_orig_yarn),
    pooling_type     (cparams.pooling_type),
    rope_type        (hparams.rope_type),
    sched            (params.sched),
    backend_cpu      (params.backend_cpu),
    cvec             (params.cvec),
    loras            (params.loras),
    mctx             (params.mctx),
    cross            (params.cross),
    cb_func          (params.cb),
    res              (params.res),
    ctx0             (res->get_ctx()),
    gf               (res->get_gf()) {
        res->set_params(params);
    }
```

@jukofyork jukofyork requested a review from CISC August 5, 2025 12:55
@jukofyork jukofyork merged commit c81de6e into ggml-org:master Aug 5, 2025
45 of 47 checks passed
@jukofyork jukofyork removed the request for review from CISC August 5, 2025 12:56
@jukofyork jukofyork deleted the fix-glm4moe-shadow-bug branch August 5, 2025 12:58
CISC (Collaborator) commented Aug 5, 2025

Uhm, I think this affects all models...

Nvm, complete brainfart on my end.

jukofyork (Collaborator, Author)

Uhm, I think this affects all models...

I only checked deepseek2 and it wasn't shadowing these, but it's possible other MoE models are (I've never noticed it when running qwen3 and kimi-k2 MoE models, so I assume they are OK).

CISC's comment was marked as resolved.

createthis

@CISC if you want to make a branch I can test kimi-k2, deepseek-v3-0324, and qwen3-coder.

jukofyork (Collaborator, Author)

The only other line I can see is here:

https://github.com/jukofyork/llama.cpp/blob/11b4186f1af00b3709d92f0cbb49e56481ec03b9/src/llama-model.cpp#L4464

but this is for model loading, so it shouldn't make any difference.

AFAIK, all the other MoE models are using the value of n_expert_used inherited from the llm_graph_context member:

```cpp
struct llm_graph_context {
    const llm_arch arch;

    const llama_hparams & hparams;
    const llama_cparams & cparams;
    const llama_ubatch  & ubatch;

    const int64_t n_embd;
    const int64_t n_layer;
    const int64_t n_rot;
    const int64_t n_ctx;       // user-specified context size (can be different from n_ctx_train)
    const int64_t n_head;
    const int64_t n_head_kv;
    const int64_t n_embd_head_k;
    const int64_t n_embd_k_gqa;
    const int64_t n_embd_head_v;
    const int64_t n_embd_v_gqa;
    const int64_t n_expert;
    const int64_t n_expert_used;
...
```

which is set in the constructor based on the state of cparams.warmup.

jukofyork (Collaborator, Author)

It could be confusion over the term "shadowing":

https://en.wikipedia.org/wiki/Variable_shadowing

This outer variable is said to be shadowed by the inner variable, while the inner identifier is said to mask the outer identifier. This can lead to confusion, as it may be unclear which variable subsequent uses of the shadowed variable name refer to.

The key thing is that subclasses of llm_graph_context shouldn't create a local variable holding a copy of n_expert_used taken from the model file, as this bypasses the inherited member that is set here on construction:

```cpp
n_expert_used    (cparams.warmup ? hparams.n_expert : hparams.n_expert_used)
```

I may well be wrong though as you know far more about the codebase than me! I only traced it back as far as here and didn't go as far as seeing when this constructor is called, etc.

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Aug 5, 2025
CISC (Collaborator) commented Aug 5, 2025

Oh, LOL, sorry, for some reason I was looking at the diff the wrong way.
