CUDA: Prefer vector flash decoding kernel for Gemma models (llama/12738)
author    Gaurav Garg <redacted>
          Thu, 3 Apr 2025 16:20:29 +0000 (21:50 +0530)
committer Georgi Gerganov <redacted>
          Tue, 8 Apr 2025 08:47:46 +0000 (11:47 +0300)
commit    770370c6603d29a464ac33d8031fb4506059e5ac
tree      525ce23131e5da9660ae733630a9426d623e1e27
parent    406ee7592472aa3598f38829de7234f27cbb83b4
CUDA: Prefer vector flash decoding kernel for Gemma models (llama/12738)

* Prefer vector flash decoding kernel for Gemma models

The vector flash decoding kernel was not being selected for models with head dimension 256; Gemma models fall into this category.
Removing this limit improves end-to-end performance by up to 12% in generation-phase throughput for Gemma models.
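Below is a minimal, hypothetical sketch of the kind of dispatch change this describes. The helper name prefer_vector_decode and its signature are illustrative assumptions and do not match the actual code in ggml/src/ggml-cuda/fattn.cu:

    // Hypothetical sketch of the kernel-selection change described above.
    // Names and signature are illustrative, not the real fattn.cu code.
    #include <cassert>
    #include <cstdint>

    // The decode (generation) phase processes one query token at a time.
    static bool prefer_vector_decode(int64_t head_dim, int64_t n_q_tokens) {
        const bool is_decode = (n_q_tokens == 1);
        // Old behavior (sketch): an upper bound on head_dim excluded 256,
        // routing Gemma-style heads to a slower kernel, e.g.:
        //     return is_decode && head_dim <= 128;
        // New behavior: the cap is dropped, so head_dim == 256 qualifies.
        (void) head_dim;
        return is_decode;
    }

    int main() {
        assert(prefer_vector_decode(256, 1));   // gen phase, Gemma head size
        assert(!prefer_vector_decode(256, 32)); // prompt phase: many queries
        return 0;
    }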

* Update ggml/src/ggml-cuda/fattn.cu

Co-authored-by: Johannes Gäßler <redacted>
---------

Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/fattn.cu