model : add BailingMoeV2 support (#16063)
author    Sigbjørn Skjæret <redacted>
          Mon, 20 Oct 2025 19:38:20 +0000 (21:38 +0200)
committer GitHub <redacted>
          Mon, 20 Oct 2025 19:38:20 +0000 (21:38 +0200)
commit    84bf3c677857279037adf67cdcfd89eaa4ca9281
tree      4112030b4a9d2c0cfb65fe0874585dcb6c143be1
parent    c9c1972e2c2cc6a771fcc145bfa138700179f961
model : add BailingMoeV2 support (#16063)

* add BailingMoeV2 support

* update llm types

* undo

* undo

* update llm types

* add model collection link

* update

* almost working

* correct group selection and rename n_group_exp

* avoid large top_k and use argmax instead for now

If we had something like argmax2 it would be equivalent, but this works fine until then.
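To illustrate the routing described above, here is a standalone C++ sketch of grouped expert selection. It is not the actual ggml graph code; the function name and shapes are assumptions for illustration. Each group is scored by its single best expert (the argmax mentioned above; an argmax2 variant would sum the two best instead), the best groups are kept, and top-k expert selection then runs over the surviving groups only.

```cpp
// Standalone sketch of grouped expert routing (not the llama.cpp
// implementation; select_experts and its parameters are assumed names).
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Pick top_k experts: first keep the n_groups_used best groups, where a
// group is scored by its single best expert (the "argmax" group score),
// then take the top-k experts from the surviving groups only.
// Assumes n_groups_used * (n_expert / n_groups) >= top_k.
static std::vector<int> select_experts(const std::vector<float> & scores,
                                       int n_groups, int n_groups_used, int top_k) {
    const int n_expert    = (int) scores.size();
    const int n_per_group = n_expert / n_groups;

    // score each group by its best expert (argmax2 would sum the two best)
    std::vector<float> group_score(n_groups);
    for (int g = 0; g < n_groups; ++g) {
        group_score[g] = *std::max_element(scores.begin() +  g      * n_per_group,
                                           scores.begin() + (g + 1) * n_per_group);
    }

    // keep the n_groups_used highest-scoring groups
    std::vector<int> group_idx(n_groups);
    std::iota(group_idx.begin(), group_idx.end(), 0);
    std::partial_sort(group_idx.begin(), group_idx.begin() + n_groups_used, group_idx.end(),
                      [&](int a, int b) { return group_score[a] > group_score[b]; });
    std::vector<bool> keep(n_groups, false);
    for (int i = 0; i < n_groups_used; ++i) {
        keep[group_idx[i]] = true;
    }

    // top-k experts restricted to the selected groups
    std::vector<int> expert_idx;
    for (int e = 0; e < n_expert; ++e) {
        if (keep[e / n_per_group]) {
            expert_idx.push_back(e);
        }
    }
    std::partial_sort(expert_idx.begin(), expert_idx.begin() + top_k, expert_idx.end(),
                      [&](int a, int b) { return scores[a] > scores[b]; });
    expert_idx.resize(top_k);
    return expert_idx;
}

int main() {
    const std::vector<float> scores = {0.1f, 0.9f, 0.2f, 0.3f, 0.8f, 0.05f, 0.7f, 0.6f};
    for (int e : select_experts(scores, /*n_groups=*/4, /*n_groups_used=*/2, /*top_k=*/2)) {
        printf("expert %d (score %.2f)\n", e, scores[e]);
    }
}
```

With the example scores this prints experts 1 and 4, the best expert from each of the two highest-scoring groups; expert 6 (0.7) is skipped because its group was not selected.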

* poke

* skip group selection when there are no tokens

* fix 1T conversion

* hopefully fixed expert group selection

third time's the charm?

* make expert group selection generally available

The new LLaDA2Moe model uses this method too, so make it generally available regardless of architecture (see the gating sketch below).

* allow n_expert_groups to be 1 (Kimi K2)
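A minimal sketch of the gating the two items above describe, with assumed names (hparams_t and its field are illustrative, not the llama.cpp structs): group selection is keyed off the hyperparameters rather than the architecture, so any MoE model that declares expert groups gets it, and a single group degenerates to plain top-k routing.

```cpp
// Sketch only: hparams_t / n_expert_groups are assumed names.
#include <cstdio>

struct hparams_t {
    int n_expert_groups; // 1 means no grouping
};

static bool use_group_selection(const hparams_t & hp) {
    // grouping only matters when there is more than one group to pick from
    return hp.n_expert_groups > 1;
}

int main() {
    const hparams_t bailing = { /*n_expert_groups =*/ 8 };
    const hparams_t kimi_k2 = { /*n_expert_groups =*/ 1 };
    printf("bailing: group selection %s\n", use_group_selection(bailing) ? "on" : "off");
    printf("kimi k2: group selection %s\n", use_group_selection(kimi_k2) ? "on" : "off");
}
```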

* address review suggestions

15 files changed:
README.md
convert_hf_to_gguf.py
convert_hf_to_gguf_update.py
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
gguf-py/gguf/tensor_mapping.py
src/llama-arch.cpp
src/llama-arch.h
src/llama-chat.cpp
src/llama-chat.h
src/llama-graph.cpp
src/llama-hparams.h
src/llama-model.cpp
src/llama-model.h
src/llama-vocab.cpp