author    Jeff Bolz <redacted>
          Wed, 2 Apr 2025 19:25:08 +0000 (14:25 -0500)
committer GitHub <redacted>
          Wed, 2 Apr 2025 19:25:08 +0000 (14:25 -0500)
commit    f01bd02376f919b05ee635f438311be8dfc91d7c
tree      ea9623b99ce8043f7e6da171a4917e7a48dbf999
parent    6f3bd38640f07e4dec7f145d2fbf093ce48c9544
vulkan: Implement split_k for coopmat2 flash attention. (#12627)

When using group query attention, there is only one workgroup per KV batch,
which can leave very few workgroups in flight (e.g. just 8 in some models).
Enable split_k to spread the work across SMs. This helps a lot when the KV
cache is large.
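
With split_k, each split attends over only a slice of the KV cache, so the
new reduce shader must merge partial softmax results: the per-split row max
and denominator are rescaled to a common maximum before the partial outputs
are summed and normalized. A minimal CPU-side sketch of these numerics,
assuming illustrative names (SplitResult, reduce_split_k, HEAD_DIM) rather
than the actual interface of flash_attn_split_k_reduce.comp:

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    constexpr size_t HEAD_DIM = 128; // example head dimension (assumption)

    struct SplitResult {
        float m;                // running max of the attention scores in this split
        float l;                // sum of exp(score - m), the softmax denominator
        std::vector<float> acc; // unnormalized output: sum of exp(score - m) * V
    };

    // Combine per-split partial results into one normalized output row.
    // Each split saw a disjoint slice of the KV cache, so its statistics
    // were computed against a local max and must be rescaled first.
    std::vector<float> reduce_split_k(const std::vector<SplitResult>& splits) {
        // Global max across all splits keeps the exponentials in range.
        float m = -INFINITY;
        for (const auto& s : splits) m = std::max(m, s.m);

        float l = 0.0f;
        std::vector<float> out(HEAD_DIM, 0.0f);
        for (const auto& s : splits) {
            const float scale = std::exp(s.m - m); // rescale to the global max
            l += scale * s.l;
            for (size_t d = 0; d < HEAD_DIM; ++d) {
                out[d] += scale * s.acc[d];
            }
        }
        // Apply the softmax denominator once, after all splits are merged.
        for (float& v : out) v /= l;
        return out;
    }

Because the rescaling is a standard log-sum-exp merge, the reduction is
exact: the result matches what a single workgroup attending over the whole
KV cache would compute, up to floating-point rounding.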
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_split_k_reduce.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
tests/test-backend-ops.cpp