git.djapps.eu Git - pkg/ggml/sources/ggml/commit


author	Jeff Bolz <redacted>
	Wed, 14 May 2025 09:55:26 +0000 (18:55 +0900)
committer	Georgi Gerganov <redacted>
	Mon, 19 May 2025 10:37:56 +0000 (13:37 +0300)
commit	c2ccedf2577339ce020545977a0171320f08388b
tree	3a6d009a722ec04c201590719461f40721e6a53b
parent	c804155a55df3de423a820ee4a6faf17577e6a6e

vulkan: KHR_coopmat flash attention (llama/13506)

This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons, so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may come from other optimizations, such as
staging through shared memory or splitting by rows.
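
For orientation, the two multiplies named above can be sketched on the CPU. This
is a minimal scalar C++ illustration of tiled attention with an online softmax,
not the shader from this commit: the `Mat` type, the function names, and the
`tile` parameter are all hypothetical, and nothing here uses cooperative
matrices or shared memory. Stage 1 is the Q*K^T multiply, which is the part the
message says is mapped to coopmat1; stage 2 is the P*V multiply.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical row-major matrix helper (rows x cols), for illustration only.
struct Mat {
    std::size_t rows, cols;
    std::vector<float> data;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Scaled dot product of row i of Q with row j of K, i.e. one entry of Q*K^T.
static float qk_dot(const Mat& Q, std::size_t i, const Mat& K, std::size_t j, float scale) {
    float acc = 0.0f;
    for (std::size_t d = 0; d < Q.cols; ++d) acc += Q.at(i, d) * K.at(j, d);
    return scale * acc;
}

// Naive reference: O = softmax(scale * Q*K^T) * V, with the full score row materialized.
Mat attention_reference(const Mat& Q, const Mat& K, const Mat& V, float scale) {
    assert(Q.cols == K.cols && K.rows == V.rows && K.rows > 0);
    Mat O(Q.rows, V.cols);
    std::vector<float> s(K.rows);
    for (std::size_t i = 0; i < Q.rows; ++i) {
        float m = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < K.rows; ++j) { s[j] = qk_dot(Q, i, K, j, scale); m = std::max(m, s[j]); }
        float sum = 0.0f;
        for (std::size_t j = 0; j < K.rows; ++j) { s[j] = std::exp(s[j] - m); sum += s[j]; }
        for (std::size_t j = 0; j < K.rows; ++j)
            for (std::size_t d = 0; d < V.cols; ++d) O.at(i, d) += (s[j] / sum) * V.at(j, d);
    }
    return O;
}

// Tiled variant: K/V are visited `tile` rows at a time and the softmax is
// normalized incrementally, so the full score row is never stored.
// Assumes finite inputs.
Mat attention_tiled(const Mat& Q, const Mat& K, const Mat& V, float scale, std::size_t tile) {
    assert(Q.cols == K.cols && K.rows == V.rows && K.rows > 0 && tile > 0);
    Mat O(Q.rows, V.cols);
    std::vector<float> s(tile);
    for (std::size_t i = 0; i < Q.rows; ++i) {
        float m = -std::numeric_limits<float>::infinity();  // running row maximum
        float l = 0.0f;                                     // running softmax denominator
        for (std::size_t j0 = 0; j0 < K.rows; j0 += tile) {
            const std::size_t n = std::min(tile, K.rows - j0);
            // Stage 1: scores for this tile (the Q*K^T multiply).
            float m_new = m;
            for (std::size_t j = 0; j < n; ++j) { s[j] = qk_dot(Q, i, K, j0 + j, scale); m_new = std::max(m_new, s[j]); }
            // Rescale what was accumulated under the old maximum (factor is 0 on the first tile).
            const float correction = std::exp(m - m_new);
            l *= correction;
            for (std::size_t d = 0; d < V.cols; ++d) O.at(i, d) *= correction;
            // Stage 2: accumulate unnormalized probabilities times V (the P*V multiply).
            for (std::size_t j = 0; j < n; ++j) {
                const float p = std::exp(s[j] - m_new);
                l += p;
                for (std::size_t d = 0; d < V.cols; ++d) O.at(i, d) += p * V.at(j0 + j, d);
            }
            m = m_new;
        }
        for (std::size_t d = 0; d < V.cols; ++d) O.at(i, d) /= l;  // final normalization
    }
    return O;
}

// Largest elementwise absolute difference between two equally sized matrices.
float max_abs_diff(const Mat& A, const Mat& B) {
    assert(A.rows == B.rows && A.cols == B.cols);
    float worst = 0.0f;
    for (std::size_t k = 0; k < A.data.size(); ++k) worst = std::max(worst, std::fabs(A.data[k] - B.data[k]));
    return worst;
}
```

Calling `attention_tiled` with any positive `tile` should match
`attention_reference` up to floating-point rounding, including when the number
of K/V rows is not a multiple of the tile size.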

src/ggml-vulkan/ggml-vulkan.cpp
src/ggml-vulkan/vulkan-shaders/flash_attn_cm1.comp	[new file with mode: 0644]
src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp

Packaging of ggml-org/ggml
