model: support GLM 4.5 family of models (#14939)
author Sam <redacted>
Mon, 4 Aug 2025 18:29:25 +0000 (04:29 +1000)
committer GitHub <redacted>
Mon, 4 Aug 2025 18:29:25 +0000 (20:29 +0200)
commit ef0144c087b33e5b8da42d529ac71aaf05cb49df
tree 5e264bffffa171a08373e07fc16874885382c731
parent 2721257e3e2c4c944ac8a08221113ee7cb503f1b

* model: Add GLM 4.5 (#14921)

Co-authored-by: Sigbjørn Skjæret <redacted>
* Merge in PR suggestions

Co-authored-by: Sigbjørn Skjæret <redacted>
* model: Add GLM 4.5 family of models (#14921)

1. Updated tensor_mapping.py with NextN tensor mappings

- Added proper tensor mappings for all NextN/MTP tensors in gguf-py/gguf/tensor_mapping.py
- Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
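
The mapping step above can be illustrated with a minimal sketch. It is not the code in tensor_mapping.py: the Hugging Face name pattern (`model.layers.<N>.<suffix>.weight`), the GGUF-style target names (`blk.<N>.nextn.*`), and the helper `map_nextn_name` are all assumptions made for illustration. Only the six tensor suffixes come from the list above.

```python
import re
from typing import Optional

# Hypothetical lookup table: the keys are the six NextN/MTP suffixes named in
# the commit message; the GGUF-style values are assumed, not taken from the repo.
NEXTN_SUFFIX_MAP = {
    "eh_proj": "nextn.eh_proj",
    "embed_tokens": "nextn.embed_tokens",
    "enorm": "nextn.enorm",
    "hnorm": "nextn.hnorm",
    "shared_head.head": "nextn.shared_head_head",
    "shared_head.norm": "nextn.shared_head_norm",
}

# Assumed Hugging Face naming scheme: model.layers.<block>.<suffix>.weight
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+?)\.weight$")


def map_nextn_name(hf_name: str) -> Optional[str]:
    """Return an assumed GGUF-style name for a NextN tensor, else None."""
    match = _LAYER_RE.match(hf_name)
    if match is None:
        return None
    block_id = int(match.group(1))
    gguf_suffix = NEXTN_SUFFIX_MAP.get(match.group(2))
    if gguf_suffix is None:
        return None  # not one of the six NextN/MTP tensors
    return f"blk.{block_id}.{gguf_suffix}.weight"
```

A table-driven lookup like this keeps the NextN tensors on the same path as every other tensor, which is the motivation given in item 4 for dropping the special-case handling in the conversion script.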

2. Added num_nextn_predict_layers configuration

- Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
- Added num_nextn_predict_layers field to llama_hparams struct
- Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
- Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
- Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
- Updated conversion script to extract and write this parameter from HuggingFace config
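
As a rough sketch of the conditional-loading idea above: read `num_nextn_predict_layers` from a Hugging Face style config, then decide whether a given block is expected to carry NextN tensors. The helper names are hypothetical, and the rule that NextN blocks are the trailing blocks of the model is an assumption, not something this commit message states.

```python
def read_num_nextn_predict_layers(hf_config: dict) -> int:
    """Read the parameter from a config dict, defaulting to 0 when absent."""
    return int(hf_config.get("num_nextn_predict_layers", 0))


def expects_nextn_tensors(block_id: int, n_block: int, n_nextn: int) -> bool:
    """Decide whether a block should have NextN tensors.

    Assumption (not from the source): the NextN blocks are the last
    `n_nextn` of the `n_block` blocks in the converted model.
    """
    if n_nextn <= 0:
        return False  # model has no NextN layers, so nothing to load
    return block_id >= n_block - n_nextn
```

With `n_nextn == 0` no block expects NextN tensors, so models without these layers are unaffected by the check.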

3. Added FIM tokens for GLM4_MOE

- Added GLM-4.5's FIM tokens to llama-vocab.cpp:
  - <|code_prefix|> for FIM_PRE
  - <|code_suffix|> for FIM_SUF
  - <|code_middle|> for FIM_MID
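
For illustration, a fill-in-the-middle prompt could be assembled from these three tokens as sketched below. The token strings come from the list above. The prefix-suffix-middle ordering and the helper `build_fim_prompt` are assumptions; the ordering GLM-4.5 actually expects is not stated here.

```python
# Token strings as listed in the commit message.
FIM_PRE = "<|code_prefix|>"
FIM_SUF = "<|code_suffix|>"
FIM_MID = "<|code_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt in an assumed prefix-suffix-middle order.

    The model would be expected to generate the missing middle section
    after the final FIM_MID token.
    """
    return f"{FIM_PRE}{prefix}{FIM_SUF}{suffix}{FIM_MID}"
```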

4. Removed manual NextN tensor handling

- Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
- NextN tensors are now handled automatically through the proper tensor mapping system

* glm 4.5 update tensor names

* model: glm 4.5 apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <redacted>
* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <redacted>
* model: glm 4.5 apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <redacted>
* model: glm 4.5 apply suggestions from code review

* Apply suggestions from code review

* patch broken chat template

* typings fix

* add TENSOR_SKIP flag

Co-authored-by: Diego Devesa <redacted>
* Update src/llama-model-loader.h

Co-authored-by: Sigbjørn Skjæret <redacted>
---------

Co-authored-by: Sigbjørn Skjæret <redacted>
Co-authored-by: Diego Devesa <redacted>
15 files changed:
convert_hf_to_gguf.py
convert_hf_to_gguf_update.py
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
gguf-py/gguf/tensor_mapping.py
models/templates/README.md
src/llama-arch.cpp
src/llama-arch.h
src/llama-graph.cpp
src/llama-hparams.h
src/llama-kv-cache-unified.cpp
src/llama-model-loader.h
src/llama-model.cpp
src/llama-model.h
src/llama-vocab.cpp