CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)
author Anav Prasad <redacted>
Wed, 1 Apr 2026 07:07:24 +0000 (07:07 +0000)
committer GitHub <redacted>
Wed, 1 Apr 2026 07:07:24 +0000 (09:07 +0200)
commit 88458164c77509d2022e45f71aaf97040667abe2
tree 723ec42b37526f4b8080da94be35b17897eee4e5
parent 49512502359b9d3a5ea589c215396513c53fe064
CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)

* flash attention support for head dimension 512 added

* FA D=512 - match 576 configs, limit ncols2, revert vec cap

* fix HIP tile kernel build for D=512

* fix HIP tile kernel occupancy for D=512 on AMD

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>
* fix tile FA compilation

---------

Co-authored-by: Johannes Gäßler <redacted>
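The commit message above is terse, so the following is a minimal, hypothetical sketch of the kind of change it describes: a head-size-templated flash-attention launcher that gains a 512/512 case next to the existing shapes. The function name, the set of existing cases, and the dispatch structure are illustrative assumptions, not the actual llama.cpp code.

// Hypothetical sketch (not the actual llama.cpp code): the general pattern of
// dispatching a flash-attention kernel templated on the K/Q head size (DKQ)
// and V head size (DV), with a 512/512 case added alongside existing sizes.
#include <cstdio>
#include <stdexcept>

template <int DKQ, int DV>
static void launch_fattn_tile_case() {
    // In the real kernels DKQ/DV select compile-time tile shapes;
    // here we only report which instantiation was chosen.
    std::printf("tile FA kernel: DKQ=%d DV=%d\n", DKQ, DV);
}

static void launch_fattn_tile(int dkq, int dv) {
    if (dkq == 128 && dv == 128) { launch_fattn_tile_case<128, 128>(); return; }
    if (dkq == 576 && dv == 512) { launch_fattn_tile_case<576, 512>(); return; } // existing case
    if (dkq == 512 && dv == 512) { launch_fattn_tile_case<512, 512>(); return; } // new in this commit
    throw std::runtime_error("unsupported head-size combination");
}

int main() {
    launch_fattn_tile(512, 512);
    return 0;
}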
14 files changed:
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/fattn-tile.cu
ggml/src/ggml-cuda/fattn-tile.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_8.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_8.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_8.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_8.cu
ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq512-dv512.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
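The new file fattn-tile-instance-dkq512-dv512.cu follows the pattern of the other template-instance files listed above: each generated .cu file expands one instantiation macro so that a single kernel specialization is compiled per translation unit, which is also why generate_cu_files.py changes whenever a new head size is added. The include path and macro name in the sketch below are assumptions based on that pattern, not text copied from the repository.

// Hypothetical contents of the generated instance file for DKQ=512, DV=512.
// Header name and DECL_* macro are assumed for illustration only.
#include "../fattn-tile.cuh"

DECL_FATTN_TILE_CASE(512, 512);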