model : add support for SmallThinker series (#14898)
author    Dongliang Wei <redacted>
          Mon, 28 Jul 2025 11:47:00 +0000 (19:47 +0800)
committer GitHub <redacted>
          Mon, 28 Jul 2025 11:47:00 +0000 (13:47 +0200)
commit    6c6e397affc4fac717e718364fb4b635cec6433a
tree      a5b3e316cf3e51808961aea2f2cfa034c9588be2
parent    afc0e8969896ada62238da07b98731e5a4b12ba4
model : add support for SmallThinker series (#14898)

* support smallthinker

* support 20B softmax routing; 4B without sliding window

* add new build_moe_ffn_from_probs; the 4B model now runs
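The `build_moe_ffn_from_probs` helper builds the MoE FFN from routing probabilities that were already computed earlier in the graph, rather than from raw router logits. A minimal sketch of the underlying top-k selection and renormalization idea; function and variable names here are illustrative, not the actual llama.cpp API:

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

// Sketch: given per-expert routing probabilities that are already
// softmax-normalized, pick the top-k experts and renormalize their
// weights so the selected experts' weights sum to 1.
std::vector<std::pair<int, float>> select_experts(const std::vector<float> & probs, int k) {
    std::vector<int> idx(probs.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return probs[a] > probs[b]; });
    float sum = 0.0f;
    for (int i = 0; i < k; ++i) sum += probs[idx[i]];
    std::vector<std::pair<int, float>> out;
    for (int i = 0; i < k; ++i) out.push_back({idx[i], probs[idx[i]] / sum});
    return out;
}
```

Each token's FFN output would then be the weighted sum of the selected experts' outputs.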

* fix 4B RoPE bug

* fix python type check

* remove is_moe check

* remove set_dense_start_swa_pattern function and modify set_swa_pattern function
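The reworked `set_swa_pattern` describes which layers use sliding-window attention with a single interleave pattern, making a separate dense-start variant unnecessary. A hedged sketch of such an interleaved pattern, with simplified names (the real helper lives in `src/llama-hparams.cpp`):

```cpp
#include <vector>

// Sketch: with a pattern of n, every n-th layer uses full (dense)
// attention and the remaining layers use a sliding window. A pattern
// of 0 means no layer uses a sliding window.
std::vector<bool> make_swa_pattern(int n_layer, int n_pattern) {
    std::vector<bool> is_swa(n_layer, false);
    if (n_pattern == 0) {
        return is_swa;
    }
    for (int il = 0; il < n_layer; ++il) {
        is_swa[il] = (il % n_pattern) < (n_pattern - 1);
    }
    return is_swa;
}
```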

* trim trailing whitespace

* remove get_vocab_base of SmallThinkerModel in convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <redacted>
* better whitespace

Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <redacted>
* use GGML_ASSERT for expert count validation

Co-authored-by: Sigbjørn Skjæret <redacted>
* Improve null pointer check for probs

Co-authored-by: Sigbjørn Skjæret <redacted>
* use template parameter for SWA attention logic
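Using a template parameter resolves the SWA-vs-full-attention branch at compile time instead of checking it per position. A hypothetical sketch of the idea applied to attention masking (not the actual graph-building code):

```cpp
// Sketch: a compile-time flag selects between sliding-window and
// full causal masking; `if constexpr` removes the dead branch.
template <bool swa>
bool is_masked(int q_pos, int kv_pos, int window) {
    if (kv_pos > q_pos) {
        return true;               // causal mask always applies
    }
    if constexpr (swa) {
        return q_pos - kv_pos >= window; // outside the sliding window
    }
    return false;                  // full attention: causal mask only
}
```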

* better whitespace

Co-authored-by: Georgi Gerganov <redacted>
* move the creation of inp_out_ids before the layer loop

* remove redundant check for probs

---------

Co-authored-by: Sigbjørn Skjæret <redacted>
Co-authored-by: Georgi Gerganov <redacted>
convert_hf_to_gguf.py
gguf-py/gguf/constants.py
gguf-py/gguf/tensor_mapping.py
src/llama-arch.cpp
src/llama-arch.h
src/llama-graph.cpp
src/llama-graph.h
src/llama-hparams.cpp
src/llama-hparams.h
src/llama-model.cpp