vulkan: Implement split_k for coopmat2 flash attention. (llama/12627)
author Jeff Bolz <redacted>
Wed, 2 Apr 2025 19:25:08 +0000 (14:25 -0500)
committer Georgi Gerganov <redacted>
Thu, 24 Apr 2025 17:39:16 +0000 (20:39 +0300)
commit b243416918971bc2779c1541cc50018d8f2df8bb
tree c8d1022ffca04130fad427b15232cefdd78013fa
parent 6e532c71877a9872df3ee3fc485d0e44e3c86bdf

When using grouped-query attention, there is only one workgroup per KV batch, which can leave
very few workgroups in flight (e.g. just 8 in some models). Enable split_k to
spread the work across SMs. This helps a lot when the KV cache is large.
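
As a rough illustration of the split_k idea (not the actual flash_attn_split_k_reduce.comp
shader, which does this on the GPU): each split processes its slice of the KV cache and
produces a partial output row together with its local softmax max and denominator, and a
reduce pass then merges the partials with the usual log-sum-exp rescaling. A minimal
host-side C++ sketch of that merge, with hypothetical names (Partial, combine_split_k),
might look like this:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Partial {
        std::vector<float> O; // partial output row, normalized by this split's own L
        float m;              // max attention logit seen by this split
        float L;              // softmax denominator: sum of exp(logit - m) in this split
    };

    // Merge the k partial results for one output row into the final row.
    std::vector<float> combine_split_k(const std::vector<Partial> & parts, size_t dim) {
        // Global max across all splits.
        float m = -INFINITY;
        for (const auto & p : parts) m = std::max(m, p.m);

        // Global denominator, with each split's L rescaled to the global max.
        float L = 0.0f;
        for (const auto & p : parts) L += p.L * std::exp(p.m - m);

        // Weighted sum of the partial rows.
        std::vector<float> O(dim, 0.0f);
        for (const auto & p : parts) {
            const float scale = p.L * std::exp(p.m - m) / L;
            for (size_t i = 0; i < dim; ++i) O[i] += scale * p.O[i];
        }
        return O;
    }

Because each split only needs its own (O, m, L), the splits can run in independent
workgroups and the cheap reduce pass above restores the exact single-pass result.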
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_split_k_reduce.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp