git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
vulkan: KHR_coopmat flash attention (llama/13506)
author Jeff Bolz <redacted>
Wed, 14 May 2025 09:55:26 +0000 (18:55 +0900)
committer Georgi Gerganov <redacted>
Mon, 19 May 2025 11:58:39 +0000 (14:58 +0300)
commit 162bbe8220901619b688739c5da6a7142a18305e
tree 6d451dd894d81ca71b6e17d190bef686898c2d16
parent a221288dc6f0f8642ae9138f68d2b266d78cf811
vulkan: KHR_coopmat flash attention (llama/13506)

This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons, so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may come from other optimizations, like
staging through shared memory or splitting by rows.
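For reference, the computation the flash attention shader performs is scaled dot-product attention: S = scale * Q*K^T, P = softmax(S) row-wise, O = P*V. The sketch below is a naive scalar C++ version (not the shader itself, and not ggml code) that marks which step the coopmat shader accelerates per this commit; all names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Naive single-head scaled dot-product attention in double precision.
// Q: n_q x d, K: n_kv x d, V: n_kv x d  ->  output: n_q x d.
// Per the commit message, the coopmat shader accelerates the Q*K^T step;
// the P*V step remains scalar.
std::vector<std::vector<double>> attention(
        const std::vector<std::vector<double>>& Q,
        const std::vector<std::vector<double>>& K,
        const std::vector<std::vector<double>>& V) {
    const size_t n_q  = Q.size();
    const size_t n_kv = K.size();
    const size_t d    = Q[0].size();
    const double scale = 1.0 / std::sqrt(static_cast<double>(d));

    std::vector<std::vector<double>> out(n_q, std::vector<double>(d, 0.0));
    for (size_t i = 0; i < n_q; ++i) {
        // S_i = scale * Q_i * K^T   (the Q*K^T multiply)
        std::vector<double> s(n_kv, 0.0);
        double row_max = -1e300;
        for (size_t j = 0; j < n_kv; ++j) {
            for (size_t k = 0; k < d; ++k) {
                s[j] += Q[i][k] * K[j][k];
            }
            s[j] *= scale;
            row_max = std::max(row_max, s[j]);
        }
        // P_i = softmax(S_i), subtracting the row max for numerical stability
        double sum = 0.0;
        for (size_t j = 0; j < n_kv; ++j) {
            s[j] = std::exp(s[j] - row_max);
            sum += s[j];
        }
        // O_i = P_i * V   (the P*V multiply, still scalar in this commit)
        for (size_t j = 0; j < n_kv; ++j) {
            for (size_t k = 0; k < d; ++k) {
                out[i][k] += (s[j] / sum) * V[j][k];
            }
        }
    }
    return out;
}
```

With a zero query row, all scores are equal, the softmax weights are uniform, and the output is the mean of the V rows, which makes the behavior easy to check by hand.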
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm1.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp