CUDA: Prefer vector flash decoding kernel for Gemma models (llama/12738)
author    Gaurav Garg <redacted>
          Thu, 3 Apr 2025 16:20:29 +0000 (21:50 +0530)
committer Georgi Gerganov <redacted>
          Thu, 24 Apr 2025 17:39:16 +0000 (20:39 +0300)
commit    2f0612cb1c168dbecd2d94b9665b11d2f023ffe9
tree      ec74e61b1f35b9f61d97ae53fcbb7aec5a430dd5
parent    e944065d5bb025f0334f2293de9b50b2d42da616

* Prefer vector flash decoding kernel for Gemma models

The vector flash decoding kernel was not being selected for models with head dimension 256; Gemma models fall into this category. Removing this limit improves end-to-end performance by up to 12% in generation-phase throughput for Gemma models.
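The change lands in ggml/src/ggml-cuda/fattn.cu. As a minimal sketch of the idea (the function name, signature, and the exact threshold below are illustrative, not the verbatim diff), the dispatch previously capped the head dimension on the vector path, which excluded head dim 256:

```cpp
// Hypothetical sketch of the kernel-selection logic in ggml-cuda/fattn.cu.
// Names and conditions are illustrative, not the actual diff.
#include <cstdint>
#include <cstdio>

// Chooses whether the vector flash-decoding kernel (tuned for batch size 1)
// handles this case. Before this change, a head-size cap excluded
// head_dim == 256 (used by Gemma) from the vector path.
static bool use_vector_flash_decoding(int64_t batch_size, int64_t head_dim, int warp_size) {
    const bool head_dim_supported = head_dim % (2 * warp_size) == 0;
    // Old (illustrative): an additional head_dim <= 128 cap skipped Gemma's 256:
    //   return batch_size == 1 && head_dim_supported && head_dim <= 128;
    // New: the cap is dropped, so head_dim == 256 takes the vector path too.
    return batch_size == 1 && head_dim_supported;
}

int main() {
    // Gemma-style decode step: batch size 1, head dimension 256.
    printf("head_dim 256 -> vector kernel: %s\n",
           use_vector_flash_decoding(/*batch_size=*/1, /*head_dim=*/256, /*warp_size=*/32)
               ? "yes" : "no");
    return 0;
}
```

With the cap removed, single-token (generation-phase) decoding for 256-dim heads reaches the vector kernel instead of falling back to a slower path, which is where the reported up-to-12% throughput gain comes from.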

* Update ggml/src/ggml-cuda/fattn.cu

Co-authored-by: Johannes Gäßler <redacted>
---------

Co-authored-by: Johannes Gäßler <redacted>
ggml/src/ggml-cuda/fattn.cu