git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
ggml-cuda: native bf16 flash attention for vec kernel (#20525)
author Patrick Buckley <redacted>
Sun, 22 Mar 2026 10:05:51 +0000 (03:05 -0700)
committer GitHub <redacted>
Sun, 22 Mar 2026 10:05:51 +0000 (11:05 +0100)
commit db9d8aa428012cc5593e18635d4c3c54095f5138
tree 8c5e856b43f1b93ef769abba876839610f750f5b
parent ccb87fa3ee1961ec915f77cb447706f471dca6a5

* ggml-cuda: native bf16 flash attention for vec and tile kernels

the mma kernel still converts bf16 to fp16 before launch; native bf16 support in the mma kernel is still a todo

* ggml-cuda: address code owner review feedback

reverted the tile kernel changes to avoid a larger refactor

* fix ci failures on turing and hip

* fix bf16 vec kernel compile on hip v_dot2 platforms

* add comments

---------

Co-authored-by: Johannes Gäßler <redacted>
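
The commit message notes that the mma kernel still converts bf16 to fp16 before launch. As a rough illustration of what such a conversion involves at the bit level, here is a minimal Python sketch. It is not code from this commit: the function names are hypothetical, and it relies only on the standard facts that a bf16 value is the upper 16 bits of an IEEE-754 float32 and that Python's `struct` module supports half precision via the `e` format.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Widen a raw 16-bit bf16 pattern to a Python float.

    bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
    of a float32, so widening is just a 16-bit left shift.
    """
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def float_to_fp16_bits(x: float) -> int:
    """Round a float to IEEE-754 half precision and return its raw bits.

    Note: struct raises OverflowError for values outside the fp16 range.
    """
    return struct.unpack("<H", struct.pack("<e", x))[0]


def bf16_bits_to_fp16_bits(bits: int) -> int:
    """Hypothetical helper: bf16 bit pattern -> fp16 bit pattern via float."""
    return float_to_fp16_bits(bf16_bits_to_float(bits))


if __name__ == "__main__":
    for pattern in (0x3F80, 0xC000, 0x3E00):
        value = bf16_bits_to_float(pattern)
        print(hex(pattern), value, hex(bf16_bits_to_fp16_bits(pattern)))
```

Because fp16 has 5 exponent bits and 10 mantissa bits while bf16 has 8 and 7, every bf16 mantissa fits in fp16 but not every bf16 exponent does, so a bf16 to fp16 conversion is not exact across the whole bf16 range.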
21 files changed:
ggml/src/ggml-cuda/CMakeLists.txt
ggml/src/ggml-cuda/convert.cuh
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-vec.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-bf16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-f16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q4_0.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q4_1.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q5_0.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q5_1.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q8_0.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-f16-bf16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-bf16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-bf16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-bf16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-bf16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-bf16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
ggml/src/ggml-hip/CMakeLists.txt
ggml/src/ggml-musa/CMakeLists.txt
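
The thirteen new template-instance files above follow a regular pattern: every K/V type pair in which at least one side is bf16. The sketch below reproduces that naming pattern in Python. It is an illustration only and not the contents of `generate_cu_files.py`; the type list is inferred from the file names in this commit, and the function name is hypothetical.

```python
from itertools import product

# Assumed K/V types, inferred from the file names listed in this commit.
KV_TYPES = ["f16", "q4_0", "q4_1", "q5_0", "q5_1", "q8_0", "bf16"]


def bf16_instance_files(types=KV_TYPES):
    """Return vec-kernel instance file names where K or V (or both) is bf16."""
    return sorted(
        f"fattn-vec-instance-{k}-{v}.cu"
        for k, v in product(types, repeat=2)
        if "bf16" in (k, v)
    )


if __name__ == "__main__":
    for name in bf16_instance_files():
        print(name)
```

With seven types there are 7 pairs with bf16 as the first type plus 6 more with bf16 only as the second type, which matches the 13 new `.cu` files in the list.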