git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
model: try to improve Qwen3 Next (#18683)
author Xuan-Son Nguyen <redacted>
Sun, 11 Jan 2026 11:53:33 +0000 (12:53 +0100)
committer GitHub <redacted>
Sun, 11 Jan 2026 11:53:33 +0000 (12:53 +0100)
commit 506bb6e01009058f35558474cf987eeb56361782
tree 08e7a99e6dbbc196061ddf4d91d19283fdfc73f2
parent 79456a690ae35cb2a75faf24d3b1926f716b0485

* qwen3next: simplify qkvz projection

* use ggml_swiglu_split

* revert swiglu_split, but remove redundant repeat()

* fix missing reshape

* rm 2 redundant transposes

* move mul_mat(k,q) to outside of chunking

* rm redundant cont

* improve g_cs_chunk

* add comments about no cont

* use std::pair instead of ggml_concat

* vectorize key_gdiff calculation

* rm unused tensor

* avoid ggml_concat inside loop

* bring back ggml_concat as it may not work on other backend

* nits
convert_hf_to_gguf.py
gguf-py/gguf/constants.py
src/llama-arch.cpp
src/llama-model.cpp
src/models/models.h
src/models/qwen3next.cpp