memory : handle kv_unified for hybrid models #15050
Merged
Follow-up from #14725, which didn't really fix the underlying problem of not considering `cparams.kv_unified`.

Since #14959, inference with hybrid models has been broken (except when using `-kvu`), due to hybrid memory not passing `cparams.kv_unified` properly.

Reproduction of the problem: attempt to run `llama-perplexity` with any hybrid model:

```
$ ./bin/llama-perplexity -f /workspace/wikitext-2-raw/wiki.test.raw -m /workspace/gguf/LFM2-350M-BF16.gguf --chunks 10
```
On `master`, this fails with an assertion. With this PR, this is no longer a problem. I've tested this with https://huggingface.co/LiquidAI/LFM2-350M.
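For readers unfamiliar with the code path, the fix boils down to forwarding `cparams.kv_unified` from the context parameters into the KV cache that the hybrid memory constructs for its attention layers, instead of leaving it at a fixed value. The sketch below only illustrates that idea; the type, member, and parameter names are hypothetical stand-ins, not llama.cpp's actual declarations.

```cpp
#include <memory>

// Hypothetical stand-ins for the real llama.cpp types (sketch only).
struct cparams_sketch {
    bool kv_unified = false; // set to true by the -kvu CLI flag
};

struct kv_cache_sketch {
    explicit kv_cache_sketch(bool unified) : unified(unified) {}
    bool unified; // one shared KV stream vs. one stream per sequence
};

// A hybrid model's memory pairs a recurrent-state cache (for the SSM/conv
// layers, omitted here) with a KV cache for the attention layers.
struct memory_hybrid_sketch {
    explicit memory_hybrid_sketch(const cparams_sketch & cparams)
        // The essence of the fix: forward the user's configured flag rather
        // than a hard-coded value, so the cache layout matches what the rest
        // of the context (and its assertions) expects.
        : kv_attn(std::make_unique<kv_cache_sketch>(cparams.kv_unified)) {}

    std::unique_ptr<kv_cache_sketch> kv_attn;
};
```

This is also why `-kvu` worked around the bug: with that flag, the user-configured value happened to agree with what the hybrid memory assumed.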