author Jeff Bolz <redacted>
Wed, 9 Apr 2025 05:12:57 +0000 (00:12 -0500)
committer Georgi Gerganov <redacted>
Thu, 24 Apr 2025 17:39:16 +0000 (20:39 +0300)
commit 1d50c6ac2262ada2d5e75dd1138b8fad3a10db15
tree 93cb705392dacaff21f0ebdf8c6bfe1a68b40682
parent 79f23d9132a984fd7f84741e5c3e9343adee7553
vulkan: Use fp16 for the flash attention P*V multiplication (llama/12783)

This is consistent with the ggml-cuda behavior and the mul_mat fallback.
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp
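
For illustration, here is a minimal cooperative-matrix sketch of the idea described above, not the shader's actual code: the fp32 softmax tile P is converted to float16_t before the coopMatMulAdd with the fp16 V tile, while the output accumulator O stays fp32. The tile sizes, the function name pv_accumulate, and the surrounding scaffolding are hypothetical placeholders.

    #version 450
    #extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
    #extension GL_KHR_memory_scope_semantics : require
    #extension GL_KHR_cooperative_matrix : require
    #extension GL_NV_cooperative_matrix2 : require  // workgroup scope + use conversions

    layout(local_size_x = 128) in;

    // Illustrative tile sizes; the real shader derives them from spec constants.
    const uint Br = 16;  // query rows per tile
    const uint Bc = 16;  // key/value rows per tile
    const uint D  = 16;  // value head dimension

    void pv_accumulate(
        coopmat<float, gl_ScopeWorkgroup, Br, Bc, gl_MatrixUseAccumulator> P,  // softmax probs, fp32
        coopmat<float16_t, gl_ScopeWorkgroup, Bc, D, gl_MatrixUseB> V,         // value tile, fp16
        inout coopmat<float, gl_ScopeWorkgroup, Br, D, gl_MatrixUseAccumulator> O)
    {
        // Convert P to fp16 (NV_cooperative_matrix2 permits the use conversion),
        // so the P*V product runs on the fp16 MMA path; O still accumulates in fp32.
        coopmat<float16_t, gl_ScopeWorkgroup, Br, Bc, gl_MatrixUseA> P_h =
            coopmat<float16_t, gl_ScopeWorkgroup, Br, Bc, gl_MatrixUseA>(P);

        O = coopMatMulAdd(P_h, V, O);
    }

    void main()
    {
        // Tile loading, online softmax, and output stores are omitted in this sketch.
    }

Casting the P operand to fp16 trades a small amount of softmax precision for the faster fp16 matrix-multiply path, which is the same trade-off ggml-cuda and the mul_mat fallback already make.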