vulkan: Use fp16 for the flash attention P*V multiplication (llama/12783)
author     Jeff Bolz <redacted>
           Wed, 9 Apr 2025 05:12:57 +0000 (00:12 -0500)
committer  Georgi Gerganov <redacted>
           Thu, 10 Apr 2025 20:58:06 +0000 (23:58 +0300)
commit     dcaef71e941d4e9772319f7f11e89a0900eb5a78
tree       349a9e61f1fcaeb0a0b0520d239d9e3b751bfe71
parent     d206e725693e8b06f7e5af33b6fe2bb5aa25e1ef
vulkan: Use fp16 for the flash attention P*V multiplication (llama/12783)

This is consistent with the ggml-cuda behavior and the mul_mat fallback.
src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp
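
The change lands in the cooperative-matrix ("cm2") variant of the flash attention shader. Below is a minimal sketch of the idea, not the shader's actual code: it is written against GL_KHR_cooperative_matrix rather than the NV extension the real shader targets, and the tile sizes Br/Bc/D and the names P, V, O, P_h are illustrative. The point it demonstrates is the one in the commit message: convert the softmax scores P to fp16 before the P*V multiply, so the multiply runs with fp16 inputs and an fp32 accumulator, matching ggml-cuda and the mul_mat fallback.

    #version 450 core
    #extension GL_KHR_memory_scope_semantics : enable
    #extension GL_KHR_cooperative_matrix : enable
    #extension GL_EXT_shader_explicit_arithmetic_types_float16 : enable

    layout(local_size_x = 32, local_size_y = 1, local_size_z = 1) in;

    // Illustrative tile sizes; the real shader derives its tiling
    // from specialization constants.
    const int Br = 16;  // query rows in the P tile
    const int Bc = 16;  // key/value rows (columns of P, rows of V)
    const int D  = 16;  // head dimension (columns of V and O)

    void main() {
        // P: softmax(Q*K^T) scores, produced earlier in fp32.
        // (Constant-initialized here so the sketch is self-contained.)
        coopmat<float, gl_ScopeSubgroup, Br, Bc, gl_MatrixUseA> P =
            coopmat<float, gl_ScopeSubgroup, Br, Bc, gl_MatrixUseA>(1.0);

        // V: the value tile, fp16.
        coopmat<float16_t, gl_ScopeSubgroup, Bc, D, gl_MatrixUseB> V =
            coopmat<float16_t, gl_ScopeSubgroup, Bc, D, gl_MatrixUseB>(float16_t(1.0));

        // O: the output accumulator, kept in fp32.
        coopmat<float, gl_ScopeSubgroup, Br, D, gl_MatrixUseAccumulator> O =
            coopmat<float, gl_ScopeSubgroup, Br, D, gl_MatrixUseAccumulator>(0.0);

        // The change: narrow P to fp16 before the multiply, instead of
        // feeding the fp32 scores straight into coopMatMulAdd.
        coopmat<float16_t, gl_ScopeSubgroup, Br, Bc, gl_MatrixUseA> P_h =
            coopmat<float16_t, gl_ScopeSubgroup, Br, Bc, gl_MatrixUseA>(P);

        // fp16 x fp16 multiply with fp32 accumulation.
        O = coopMatMulAdd(P_h, V, O);
    }

Keeping the accumulator in fp32 while narrowing only the multiply inputs is the usual trade here: the fp16 inputs let the hardware use its fast matrix pipelines, while fp32 accumulation limits the precision loss across the summation.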