git.djapps.eu Git - pkg/ggml/sources/ggml/commit
ggml-cuda: native bf16 flash attention for vec kernel (llama/20525)
author: Patrick Buckley <redacted>
Sun, 22 Mar 2026 10:05:51 +0000 (03:05 -0700)
committer: Georgi Gerganov <redacted>
Sat, 28 Mar 2026 11:39:09 +0000 (13:39 +0200)
commit: dd1bad2eb71a503b070890900b46275cc852549c
tree: 89740b282ebe763cb90931917df11d837b5011b9
parent: 81419ebe55fed50f3cf55601d57549320325e248

* ggml-cuda: native bf16 flash attention for vec and tile kernels

the mma kernel still converts bf16 to fp16 before launch; native bf16 support in the mma kernel is still a todo

* ggml-cuda: address code owner review feedback

reverted the tile kernel changes to avoid a larger refactor

* fix ci failures on turing and hip

* fix bf16 vec kernel compile on hip v_dot2 platforms

* add comments

---------

Co-authored-by: Johannes Gäßler <redacted>
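
The note above about the mma kernel converting bf16 to fp16 can be illustrated with a small number-format sketch. This is not code from ggml; the helper names are hypothetical and the truncating conversion is a simplification (real converters usually round). It relies only on general facts: bf16 is the upper 16 bits of an fp32 value, so it keeps the fp32 exponent range, while the largest finite fp16 value is 65504.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def bf16_bits_to_float(bits: int) -> float:
    """Widen a bf16 bit pattern to fp32 by placing it in the upper 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def float_to_bf16_bits(x: float) -> int:
    """Truncate an fp32 value to bf16 by dropping the lower 16 bits.

    Illustration only: this truncates instead of rounding to nearest.
    """
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def fits_in_fp16(x: float) -> bool:
    """True if the magnitude of x is within the finite fp16 range."""
    return abs(x) <= FP16_MAX


# A value that bf16 can hold but that lies outside the finite fp16 range.
value = bf16_bits_to_float(float_to_bf16_bits(1.0e6))
print(value, fits_in_fp16(value))
```

Values like this one are representable in bf16 but not in fp16, which is one general reason a path that reads bf16 directly can behave differently from one that converts to fp16 first.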
21 files changed:
src/ggml-cuda/CMakeLists.txt
src/ggml-cuda/convert.cuh
src/ggml-cuda/fattn-common.cuh
src/ggml-cuda/fattn-vec.cuh
src/ggml-cuda/fattn.cu
src/ggml-cuda/template-instances/fattn-vec-instance-bf16-bf16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-bf16-f16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q4_0.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q4_1.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q5_0.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q5_1.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-bf16-q8_0.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-f16-bf16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-q4_0-bf16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-q4_1-bf16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-q5_0-bf16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-q5_1-bf16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-vec-instance-q8_0-bf16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/generate_cu_files.py
src/ggml-hip/CMakeLists.txt
src/ggml-musa/CMakeLists.txt
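
The 13 new template-instance files above correspond to every K/V type pair in which at least one side is bf16. The sketch below reproduces that naming pattern. It is not the contents of generate_cu_files.py: the type list is inferred from the file names in this commit, and the function names are hypothetical.

```python
from itertools import product

# Assumption: type names inferred from the file names listed in this commit.
EXISTING_TYPES = ["f16", "q4_0", "q4_1", "q5_0", "q5_1", "q8_0"]
NEW_TYPE = "bf16"


def instance_filename(type_k: str, type_v: str) -> str:
    """Build a vec-kernel instance file name for one K/V type pair."""
    return f"fattn-vec-instance-{type_k}-{type_v}.cu"


def new_instance_files() -> list[str]:
    """List the file names for all K/V pairs that involve the new type."""
    all_types = EXISTING_TYPES + [NEW_TYPE]
    return sorted(
        instance_filename(k, v)
        for k, v in product(all_types, repeat=2)
        if NEW_TYPE in (k, v)
    )


for name in new_instance_files():
    print(name)
```

With six existing types plus bf16, this yields 7 + 6 = 13 names, matching the number of files marked as new in the list above.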