git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
llama: dynamic head_dim and n_rot for SWA (#20301)
author Xuan-Son Nguyen <redacted>
Mon, 9 Mar 2026 21:22:39 +0000 (22:22 +0100)
committer GitHub <redacted>
Mon, 9 Mar 2026 21:22:39 +0000 (22:22 +0100)
commit 59db9a357d9a247009c70fda34050661b17a1a5c
tree b097f4e42b68c73af44a6488dd3a8450bc863896
parent 23fbfcb1ad6c6f76b230e8895254de785000be46
llama: dynamic head_dim and n_rot for SWA (#20301)

* llama: dynamic head_dim and n_rot for SWA

* also add gguf_writer wrappers

* fix build

* build_rope_shift arg reorder
112 files changed:
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
src/llama-arch.cpp
src/llama-arch.h
src/llama-context.cpp
src/llama-graph.cpp
src/llama-hparams.cpp
src/llama-hparams.h
src/llama-kv-cache.cpp
src/llama-kv-cache.h
src/llama-model-loader.cpp
src/llama-model-saver.cpp
src/llama-model.cpp
src/models/afmoe.cpp
src/models/apertus.cpp
src/models/arcee.cpp
src/models/arctic.cpp
src/models/baichuan.cpp
src/models/bailingmoe2.cpp
src/models/bert.cpp
src/models/bitnet.cpp
src/models/bloom.cpp
src/models/chameleon.cpp
src/models/chatglm.cpp
src/models/codeshell.cpp
src/models/cogvlm.cpp
src/models/cohere2-iswa.cpp
src/models/command-r.cpp
src/models/dbrx.cpp
src/models/deci.cpp
src/models/deepseek.cpp
src/models/deepseek2.cpp
src/models/dots1.cpp
src/models/dream.cpp
src/models/ernie4-5-moe.cpp
src/models/ernie4-5.cpp
src/models/eurobert.cpp
src/models/exaone-moe.cpp
src/models/exaone.cpp
src/models/exaone4.cpp
src/models/falcon-h1.cpp
src/models/falcon.cpp
src/models/gemma-embedding.cpp
src/models/gemma.cpp
src/models/gemma2-iswa.cpp
src/models/gemma3.cpp
src/models/gemma3n-iswa.cpp
src/models/glm4-moe.cpp
src/models/glm4.cpp
src/models/gpt2.cpp
src/models/gptneox.cpp
src/models/granite-hybrid.cpp
src/models/granite.cpp
src/models/grok.cpp
src/models/grovemoe.cpp
src/models/hunyuan-dense.cpp
src/models/hunyuan-moe.cpp
src/models/internlm2.cpp
src/models/jais.cpp
src/models/jais2.cpp
src/models/jamba.cpp
src/models/kimi-linear.cpp
src/models/lfm2.cpp
src/models/llada-moe.cpp
src/models/llada.cpp
src/models/llama-iswa.cpp
src/models/llama.cpp
src/models/maincoder.cpp
src/models/minicpm3.cpp
src/models/minimax-m2.cpp
src/models/mistral3.cpp
src/models/modern-bert.cpp
src/models/mpt.cpp
src/models/nemotron-h.cpp
src/models/nemotron.cpp
src/models/neo-bert.cpp
src/models/olmo.cpp
src/models/olmo2.cpp
src/models/olmoe.cpp
src/models/openelm.cpp
src/models/orion.cpp
src/models/paddleocr.cpp
src/models/pangu-embedded.cpp
src/models/phi2.cpp
src/models/phi3.cpp
src/models/plamo.cpp
src/models/plamo2.cpp
src/models/plamo3.cpp
src/models/plm.cpp
src/models/qwen.cpp
src/models/qwen2.cpp
src/models/qwen2moe.cpp
src/models/qwen2vl.cpp
src/models/qwen3.cpp
src/models/qwen35.cpp
src/models/qwen35moe.cpp
src/models/qwen3moe.cpp
src/models/qwen3next.cpp
src/models/qwen3vl-moe.cpp
src/models/qwen3vl.cpp
src/models/refact.cpp
src/models/rnd1.cpp
src/models/seed-oss.cpp
src/models/smallthinker.cpp
src/models/smollm3.cpp
src/models/stablelm.cpp
src/models/starcoder.cpp
src/models/starcoder2.cpp
src/models/step35-iswa.cpp
src/models/t5-dec.cpp
src/models/t5-enc.cpp
src/models/xverse.cpp