author Jeff Bolz <redacted>
Wed, 9 Apr 2025 05:12:57 +0000 (00:12 -0500)
committer GitHub <redacted>
Wed, 9 Apr 2025 05:12:57 +0000 (07:12 +0200)
commit 7ecd780b1a1d5214b8d04c25ebfc194d310816ed
tree 488a39949a744d4d34aba433be30b69b70caa3fa
parent 7538246e7ce0606694c38055cc2fc9f60535be6c
vulkan: Use fp16 for the flash attention P*V multiplication (#12783)

This is consistent with the ggml-cuda behavior and the mul_mat fallback.
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp
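
The actual change lives in the coopmat2 flash-attention shader listed above. As a rough illustration only (not the shader code), the sketch below shows the numeric idea in plain C++: the P*V product of a flash-attention tile is computed on fp16 operands rather than fp32. It assumes a GCC/Clang compiler with the _Float16 extension; all names and shapes here are illustrative, not taken from the repository.

    // Conceptual sketch of fp16 P*V in flash attention (NOT the Vulkan shader).
    #include <cstdio>
    #include <vector>

    using half = _Float16; // compiler extension; assumption for this sketch

    // O[r][d] += sum_c P[r][c] * V[c][d], with the multiply done on fp16 inputs.
    // P: softmax'd attention scores (rows x cols), V: value tile (cols x d_head).
    static void pv_multiply_fp16(const std::vector<float> &P, const std::vector<float> &V,
                                 std::vector<float> &O, int rows, int cols, int d_head) {
        for (int r = 0; r < rows; ++r) {
            for (int d = 0; d < d_head; ++d) {
                float acc = 0.0f;
                for (int c = 0; c < cols; ++c) {
                    // round both operands to fp16 before multiplying, mirroring a
                    // shader that feeds fp16 matrices into the P*V matmul
                    acc += (float)((half)P[r*cols + c] * (half)V[c*d_head + d]);
                }
                O[r*d_head + d] += acc;
            }
        }
    }

    int main() {
        const int rows = 2, cols = 4, d_head = 3;
        std::vector<float> P(rows*cols, 0.25f);              // uniform attention weights
        std::vector<float> V(cols*d_head);
        for (size_t i = 0; i < V.size(); ++i) V[i] = 0.1f*i; // arbitrary values
        std::vector<float> O(rows*d_head, 0.0f);
        pv_multiply_fp16(P, V, O, rows, cols, d_head);
        for (float o : O) printf("%.4f ", o);                // column means of V, since P is uniform
        printf("\n");
        return 0;
    }

Since P holds softmax probabilities in [0, 1] and V values are typically well scaled, the fp16 product loses little precision, which is why the fp16 path is already used by ggml-cuda and the mul_mat fallback referenced in the commit message.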