model : add BailingMoeV2 support (#16063)
author    Sigbjørn Skjæret <redacted>
          Mon, 20 Oct 2025 19:38:20 +0000 (21:38 +0200)
committer GitHub <redacted>
          Mon, 20 Oct 2025 19:38:20 +0000 (21:38 +0200)
commit    84bf3c677857279037adf67cdcfd89eaa4ca9281
tree      4112030b4a9d2c0cfb65fe0874585dcb6c143be1
parent    c9c1972e2c2cc6a771fcc145bfa138700179f961
model : add BailingMoeV2 support (#16063)

* add BailingMoeV2 support

* update llm types

* undo

* undo

* update llm types

* add model collection link

* update

* almost working

* correct group selection and rename n_group_exp

* avoid large top_k and use argmax instead for now

If we had something like argmax2 it would be equivalent, but this works fine until then.
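To illustrate the routing described above, here is a standalone C++ sketch of grouped expert selection. It is not the actual ggml graph code; the function name and shapes are assumptions for illustration. Each group is scored by its single best expert (the argmax mentioned above; an argmax2 variant would sum the two best instead), the best groups are kept, and top-k expert selection then runs over the surviving groups only.

```cpp
// Standalone sketch of grouped expert routing (not the llama.cpp
// implementation; select_experts and its parameters are assumed names).
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Pick top_k experts: first keep the n_groups_used best groups, where a
// group is scored by its single best expert (the "argmax" group score),
// then take the top-k experts from the surviving groups only.
// Assumes n_groups_used * (n_expert / n_groups) >= top_k.
static std::vector<int> select_experts(const std::vector<float> & scores,
                                       int n_groups, int n_groups_used, int top_k) {
    const int n_expert    = (int) scores.size();
    const int n_per_group = n_expert / n_groups;

    // score each group by its best expert (argmax2 would sum the two best)
    std::vector<float> group_score(n_groups);
    for (int g = 0; g < n_groups; ++g) {
        group_score[g] = *std::max_element(scores.begin() +  g      * n_per_group,
                                           scores.begin() + (g + 1) * n_per_group);
    }

    // keep the n_groups_used highest-scoring groups
    std::vector<int> group_idx(n_groups);
    std::iota(group_idx.begin(), group_idx.end(), 0);
    std::partial_sort(group_idx.begin(), group_idx.begin() + n_groups_used, group_idx.end(),
                      [&](int a, int b) { return group_score[a] > group_score[b]; });
    std::vector<bool> keep(n_groups, false);
    for (int i = 0; i < n_groups_used; ++i) {
        keep[group_idx[i]] = true;
    }

    // top-k experts restricted to the selected groups
    std::vector<int> expert_idx;
    for (int e = 0; e < n_expert; ++e) {
        if (keep[e / n_per_group]) {
            expert_idx.push_back(e);
        }
    }
    std::partial_sort(expert_idx.begin(), expert_idx.begin() + top_k, expert_idx.end(),
                      [&](int a, int b) { return scores[a] > scores[b]; });
    expert_idx.resize(top_k);
    return expert_idx;
}

int main() {
    const std::vector<float> scores = {0.1f, 0.9f, 0.2f, 0.3f, 0.8f, 0.05f, 0.7f, 0.6f};
    for (int e : select_experts(scores, /*n_groups=*/4, /*n_groups_used=*/2, /*top_k=*/2)) {
        printf("expert %d (score %.2f)\n", e, scores[e]);
    }
}
```

With the example scores this prints experts 1 and 4, the best expert from each of the two highest-scoring groups; expert 6 (0.7) is skipped because its group was not selected.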

* poke

* skip group selection when there are no tokens

* fix 1T conversion

* hopefully fixed expert group selection

third time's the charm?

* make expert group selection generally available

The new LLaDA2Moe model uses this method too, so make it generally available regardless of architecture (see the gating sketch below).

* allow n_expert_groups to be 1 (Kimi K2)
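A minimal sketch of the gating the two items above describe, with assumed names (hparams_t and its field are illustrative, not the llama.cpp structs): group selection is keyed off the hyperparameters rather than the architecture, so any MoE model that declares expert groups gets it, and a single group degenerates to plain top-k routing.

```cpp
// Sketch only: hparams_t / n_expert_groups are assumed names.
#include <cstdio>

struct hparams_t {
    int n_expert_groups; // 1 means no grouping
};

static bool use_group_selection(const hparams_t & hp) {
    // grouping only matters when there is more than one group to pick from
    return hp.n_expert_groups > 1;
}

int main() {
    const hparams_t bailing = { /*n_expert_groups =*/ 8 };
    const hparams_t kimi_k2 = { /*n_expert_groups =*/ 1 };
    printf("bailing: group selection %s\n", use_group_selection(bailing) ? "on" : "off");
    printf("kimi k2: group selection %s\n", use_group_selection(kimi_k2) ? "on" : "off");
}
```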

* address review suggestions

15 files changed:
README.md
convert_hf_to_gguf.py
convert_hf_to_gguf_update.py
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
gguf-py/gguf/tensor_mapping.py
src/llama-arch.cpp
src/llama-arch.h
src/llama-chat.cpp
src/llama-chat.h
src/llama-graph.cpp
src/llama-hparams.h
src/llama-model.cpp
src/llama-model.h
src/llama-vocab.cpp