git.djapps.eu Git - pkg/ggml/sources/ggml/commit
models : optimize qwen3next graph (llama/19375)
author Georgi Gerganov <redacted>
Sat, 14 Feb 2026 10:57:36 +0000 (12:57 +0200)
committer Georgi Gerganov <redacted>
Sat, 14 Feb 2026 22:20:18 +0000 (00:20 +0200)
commit f61050d0c0771749179486f1672d4b0b43f97637
tree 0552524a8b2abd47de17422e4ac23b44970973c7
parent d07b0e5a9575a6faff2054eec7595c2f7645b34c

* models : optimizing qwen3next graph

* cont

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* cont : remove redundant q, g chunking

* minor

* minor

* avoid passing masks around

* avoid concats during chunking

* naming + shapes

* update names and use prefix to disable CUDA graphs
src/ggml-cuda/ggml-cuda.cu
src/ggml-metal/ggml-metal-common.cpp