git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
models : optimize qwen3next graph (#19375)
author Georgi Gerganov <redacted>
Sat, 14 Feb 2026 10:57:36 +0000 (12:57 +0200)
committer GitHub <redacted>
Sat, 14 Feb 2026 10:57:36 +0000 (12:57 +0200)
commit 1725e316c1a780759ec134ca5a2999f4d53ce273
tree 6e270224bb3e44cf74d0dde8827d63d7879333ff
parent b7742cf3217932b2e237861c8586b6f600f072fb
models : optimize qwen3next graph (#19375)

* models : optimizing qwen3next graph

* cont

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* cont : remove redundant q, g chunking

* minor

* minor

* avoid passing masks around

* avoid concats during chunking

* naming + shapes

* update names and use prefix to disable CUDA graphs

ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-metal/ggml-metal-common.cpp
src/models/models.h
src/models/qwen3next.cpp