git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit


author	Jeff Bolz <redacted>
	Tue, 8 Jul 2025 18:11:42 +0000 (13:11 -0500)
committer	Georgi Gerganov <redacted>
	Sat, 12 Jul 2025 16:23:56 +0000 (19:23 +0300)
commit	fadb3233b6dcf86f6b5e998dbabb84e43d78e0d7
tree	c25a003d19ea0698401e7aa9a31ee3fff480b52a
parent	9750e4c98891d6aa54462f9e72d911992f5025c2

vulkan: optimize flash attention split_k_reduce (llama/14554)

* vulkan: allow FA split_k with smaller KV values

* vulkan: spread split_k_reduce work across more threads

k_num can get rather large. Use the whole workgroup to reduce the M/L values.

Launch a thread for each element in the HSV dimension of the output. This helps
a lot for large HSV (as in DeepSeek models).
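
For readers unfamiliar with the reduce step, the following is a minimal CPU-side sketch of what a flash attention split_k reduction generally computes. It is not the shader from this commit. It assumes each of the k_num splits leaves behind an unnormalized output accumulator of HSV elements plus its running maximum (M) and running sum of exponentials (L). The `SplitPartial` struct and the `split_k_reduce` function signature are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Partial result of one split over the KV dimension.
// Hypothetical layout for illustration, not the shader's buffer layout.
struct SplitPartial {
    std::vector<float> o; // unnormalized output accumulator, HSV elements (assumption)
    float m;              // running maximum of the attention scores in this split
    float l;              // running sum of exp(score - m) in this split
};

// Combine the k_num partial results for a single output row.
std::vector<float> split_k_reduce(const std::vector<SplitPartial> & parts, std::size_t hsv) {
    assert(!parts.empty());

    // Reduce M: global maximum over all splits.
    float m_max = parts[0].m;
    for (const auto & p : parts) {
        m_max = std::max(m_max, p.m);
    }

    // Reduce L: rescale each split's sum to the global maximum.
    float l_total = 0.0f;
    for (const auto & p : parts) {
        l_total += p.l * std::exp(p.m - m_max);
    }

    // Each iteration of this outer loop is independent, which is why a GPU
    // version can assign one thread to each element of the HSV dimension.
    std::vector<float> out(hsv, 0.0f);
    for (std::size_t d = 0; d < hsv; ++d) {
        float acc = 0.0f;
        for (const auto & p : parts) {
            assert(p.o.size() == hsv);
            acc += p.o[d] * std::exp(p.m - m_max);
        }
        out[d] = acc / l_total;
    }
    return out;
}
```

The two scalar reductions loop over k_num, so they grow with the number of splits, while the final loop grows with HSV. The sketch does not handle the case where every split has M equal to negative infinity.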

ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_split_k_reduce.comp

Packaging of ggerganov/whisper.cpp