git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
vulkan: support flash attention GQA/split_k with small batches (#18938)
author	Jeff Bolz <redacted>
Wed, 21 Jan 2026 16:43:43 +0000 (10:43 -0600)
committer	GitHub <redacted>
Wed, 21 Jan 2026 16:43:43 +0000 (17:43 +0100)
commit	33f890e5799d1bb539f72f3d7e1ea30b8cff1123
tree	161f237794c90d39cc5deff7c5cd1a449750ab98
parent	067b8d7af3f4775b1ad39470d1860696a8a7b3a3
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn.comp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_base.glsl
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm1.comp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_split_k_reduce.comp
tests/test-backend-ops.cpp