vulkan: For coopmat2 FA, use fp16 accumulators for the final result (llama/19376)
author    Jeff Bolz <redacted>
          Fri, 6 Feb 2026 08:15:13 +0000 (02:15 -0600)
committer Georgi Gerganov <redacted>
          Sun, 8 Feb 2026 07:29:10 +0000 (09:29 +0200)
commit    cea22b3075684fc4d949982eb412b47f8da205cc
tree      a448e77ec140e52f9279b8ccb521deab968d5969
parent    c1b63354bb566143cf7987c20fed9256a0b79338

The cpu and cuda backends use fp16 for the VKQ accumulator type; this change
does the same for vulkan. This helps particularly with large head sizes, where
the shader is very register-limited.
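
For reference, the VKQ accumulator is the running numerator of the online
softmax, O = sum_j exp(s_j - m) * V_j, which flash attention keeps per query
alongside the running denominator. The NumPy sketch below is illustrative
only (function name, tile size, and shapes are made up; it is not the shader
code): it holds that running accumulator in fp16 while the scores and the
final normalization stay in fp32.

```python
# Illustrative online-softmax flash-attention accumulation (not the shader code).
import numpy as np

def flash_attn_row(q, K, V, tile=32, acc_dtype=np.float16):
    """Attention for one query row, processed in KV tiles.
    q: (d,), K: (n_kv, d), V: (n_kv, d). The VKQ accumulator uses acc_dtype."""
    d = q.shape[0]
    m = np.float32(-np.inf)           # running max of the scores
    l = np.float32(0.0)               # running softmax denominator
    o = np.zeros(d, dtype=acc_dtype)  # running VKQ accumulator (fp16 here)
    scale = 1.0 / np.sqrt(d)
    for j0 in range(0, K.shape[0], tile):
        s = (K[j0:j0+tile] @ q) * scale                 # scores for this tile
        m_new = max(m, np.float32(s.max()))
        corr = np.exp(m - m_new)                        # rescale old accumulators
        p = np.exp(s - m_new).astype(np.float32)
        o = (o.astype(np.float32) * corr + p @ V[j0:j0+tile]).astype(acc_dtype)
        l = l * corr + p.sum()
        m = m_new
    return o.astype(np.float32) / l                     # final normalization in fp32

rng = np.random.default_rng(0)
d, n_kv = 256, 4096                                     # a large head size
q = rng.standard_normal(d, dtype=np.float32)
K = rng.standard_normal((n_kv, d), dtype=np.float32)
V = rng.standard_normal((n_kv, d), dtype=np.float32)
ref = flash_attn_row(q, K, V, acc_dtype=np.float32)
half = flash_attn_row(q, K, V, acc_dtype=np.float16)
print(np.abs(ref - half).max())                         # small fp16 rounding error
```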

I tried this for the coopmat1 path and it slowed that path down a bit. I
didn't try it for the scalar path.

I applied the softmax bias that the cuda backend uses to avoid overflow,
although I was not able to reproduce the original bug without it.
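
As a rough illustration of why a constant bias inside the softmax exponent is
harmless: subtracting b scales both the VKQ numerator and the softmax
denominator by exp(-b), which cancels in the final division, while keeping
the values an fp16 accumulator must hold below the half-precision maximum of
65504. The bias value, shapes, and numbers below are illustrative, not the
constants used by the CUDA or Vulkan kernels.

```python
# Illustrative sketch of the softmax bias; values are made up for the demo.
import numpy as np

def vkq_parts(scores, V, bias=0.0):
    """Return (numerator, denominator) of softmax(scores) @ V with a bias
    applied inside the exponent. Math is done in fp32; the cast to fp16
    stands in for holding the running accumulator in half precision."""
    p = np.exp(scores - scores.max() - bias)
    num = (p @ V).astype(np.float16)   # what an fp16 VKQ accumulator must hold
    den = np.float16(p.sum())
    return num, den

n_kv, d = 8192, 64
scores = np.full(n_kv, 5.0, dtype=np.float32)   # many keys near the max score
V = np.full((n_kv, d), 10.0, dtype=np.float32)  # moderately large values

for b in (0.0, 4.0):
    num, den = vkq_parts(scores, V, bias=b)
    out = num.astype(np.float32) / np.float32(den)
    print(f"bias={b}: max |numerator| = {np.abs(num).max()}, output[0] = {out[0]}")
# bias=0.0: the true numerator is 8192 * 10 = 81920 > 65504, so fp16 sees inf
# bias=4.0: the numerator is scaled by exp(-4) ~ 0.018 and fits in fp16, and
#           the exp(-4) factor cancels in the division, giving ~10.0 again
```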
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_base.glsl
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp