vulkan: KHR_coopmat flash attention (#13506)
author    Jeff Bolz <redacted>
          Wed, 14 May 2025 09:55:26 +0000 (18:55 +0900)
committer GitHub <redacted>
          Wed, 14 May 2025 09:55:26 +0000 (11:55 +0200)
commit    24e86cae7219b0f3ede1d5abdf5bf3ad515cccb8
tree      870b86bb74c6dc3b1aa32a74da0d5a387e31fd8b
parent    bb1681fbd532eba26ae4c14cd8be884c8afeb31c

This shader uses coopmat1 (KHR_cooperative_matrix) to do the Q*K^T multiply. The
P*V multiply is more difficult for various reasons, so I haven't done it yet.
Prompt-processing performance with this shader is around 2.5x better than with
the scalar shader. Some of the benefit may come from other optimizations, such
as staging through shared memory and splitting the work by rows.
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm1.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp