git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: Remove unneeded bias/gate dims in fused mmvq (llama/16858)
author Oliver Simons <redacted>
Sat, 1 Nov 2025 05:13:26 +0000 (06:13 +0100)
committer Georgi Gerganov <redacted>
Sun, 9 Nov 2025 16:30:22 +0000 (18:30 +0200)
commit ba4efa9271e6f8141322918e6d7a44800813f3dd
tree dc21d3c49117e326779bc0a41cfd7ab6f4417a2d
parent b9bcdd76de0fbec07758d17bfbd05667f735490b

* CUDA: Remove unneeded bias/gate dims in fused mmvq

As pointed out
[here](https://github.com/ggml-org/llama.cpp/pull/16847#discussion_r2476798989),
only a single value is needed per target column per thread.

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>

* Fix "Error 991-D: extra braces are nonstandard" during compilation

---------

Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/mmvq.cu
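
The idea behind the change (one bias/gate value per target column per thread) can be illustrated with a small host-side sketch. This is plain C++ that emulates a single thread, not the actual kernel code from `src/ggml-cuda/mmvq.cu`. All names and sizes (`NCOLS_DST`, `ROWS_PER_BLOCK`, `add_bias_2d`, `add_bias_1d`, `variants_agree`) are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical sizes, not taken from the real kernel.
constexpr int NCOLS_DST      = 2;  // number of target (destination) columns
constexpr int ROWS_PER_BLOCK = 4;  // rows handled by one emulated block

using Column = std::array<float, NCOLS_DST>;

// "Before" variant: the thread reserves one slot per (column, row) pair,
// although it only ever touches the slots belonging to its own row `tid`.
Column add_bias_2d(const float * bias, const Column & partial, int tid) {
    assert(tid >= 0 && tid < ROWS_PER_BLOCK);
    float biases[NCOLS_DST][ROWS_PER_BLOCK] = {{0.0f}};
    for (int j = 0; j < NCOLS_DST; ++j) {
        biases[j][tid] = bias[j * ROWS_PER_BLOCK + tid];
    }
    Column out{};
    for (int j = 0; j < NCOLS_DST; ++j) {
        out[j] = partial[j] + biases[j][tid];
    }
    return out;
}

// "After" variant: one value per target column is enough, because the row
// index is fixed for a given thread.
Column add_bias_1d(const float * bias, const Column & partial, int tid) {
    assert(tid >= 0 && tid < ROWS_PER_BLOCK);
    float biases[NCOLS_DST] = {0.0f};
    for (int j = 0; j < NCOLS_DST; ++j) {
        biases[j] = bias[j * ROWS_PER_BLOCK + tid];
    }
    Column out{};
    for (int j = 0; j < NCOLS_DST; ++j) {
        out[j] = partial[j] + biases[j];
    }
    return out;
}

// Checks that both variants produce identical results for every emulated thread.
bool variants_agree() {
    const float bias[NCOLS_DST * ROWS_PER_BLOCK] = {
        0.5f, 1.5f, 2.5f, 3.5f,   // column 0
        4.5f, 5.5f, 6.5f, 7.5f};  // column 1
    for (int tid = 0; tid < ROWS_PER_BLOCK; ++tid) {
        const Column partial = {float(tid), float(10 + tid)};
        if (add_bias_2d(bias, partial, tid) != add_bias_1d(bias, partial, tid)) {
            return false;
        }
    }
    return true;
}
```

Calling `variants_agree()` compares the two variants for every emulated thread index. In a real kernel the smaller per-thread array would be expected to reduce register or local-memory use, but that effect is not measured here.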