CUDA: Add Flash Attention Support for Head Dimension 512 (llama/20998)
author     Anav Prasad <redacted>
           Wed, 1 Apr 2026 07:07:24 +0000 (07:07 +0000)
committer  Georgi Gerganov <redacted>
           Wed, 1 Apr 2026 13:00:26 +0000 (16:00 +0300)
commit     fbee96b76fedd7e31926cc184601a6cf65c6382c
tree       da504f7e9c50d4f140211314971301dfc45fc104
parent     36b429c207704f435b08704c7fdd3c0e67acec56
CUDA: Add Flash Attention Support for Head Dimension 512 (llama/20998)

* Add flash attention support for head dimension 512

* FA D=512: match the D=576 configs, limit ncols2, revert the vec kernel cap

* fix HIP tile kernel build for D=512

* fix HIP tile kernel occupancy for D=512 on AMD

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>
* fix tile FA compilation

---------

Co-authored-by: Johannes Gäßler <redacted>
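As a rough illustration of what adding a head-dimension case like this involves, here is a minimal, self-contained C++ sketch of compile-time dispatch from runtime head dimensions to template instances. This is not the actual ggml code: launch_fattn_tile and dispatch_fattn_tile are hypothetical stand-ins, and the only details taken from this commit are the new DKQ=512/DV=512 case and the pre-existing D=576 path that the commit message says it was matched against (presumably the DKQ=576/DV=512 configuration used by MLA-style models).

    // Illustrative sketch only -- not the actual ggml sources. The helper
    // names (launch_fattn_tile, dispatch_fattn_tile) are hypothetical.
    #include <cstdio>

    // Stand-in for a templated kernel launcher; the real launchers in
    // fattn-tile.cu take the ggml tensors and a CUDA stream instead.
    template <int DKQ, int DV>
    static void launch_fattn_tile() {
        std::printf("tile FA kernel: DKQ=%d DV=%d\n", DKQ, DV);
    }

    // Map runtime head dimensions to a compile-time template instance.
    static void dispatch_fattn_tile(const int dkq, const int dv) {
        if (dkq == 512 && dv == 512) {
            launch_fattn_tile<512, 512>(); // case added by this commit
        } else if (dkq == 576 && dv == 512) {
            launch_fattn_tile<576, 512>(); // pre-existing MLA-style case
        } else {
            std::printf("unsupported head dims: %d/%d\n", dkq, dv);
        }
    }

    int main() {
        dispatch_fattn_tile(512, 512);
        return 0;
    }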
14 files changed:
src/ggml-cuda/fattn-mma-f16.cuh
src/ggml-cuda/fattn-tile.cu
src/ggml-cuda/fattn-tile.cuh
src/ggml-cuda/fattn.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_8.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_8.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_8.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_8.cu
src/ggml-cuda/template-instances/fattn-tile-instance-dkq512-dv512.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/generate_cu_files.py
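Judging from the naming convention of the sibling instance files and the generator script above, the new fattn-tile-instance-dkq512-dv512.cu is presumably a one-line autogenerated instantiation along these lines; the DECL_FATTN_TILE_CASE macro name is an assumption inferred from the file naming scheme, not confirmed by this page:

    // This file has been autogenerated by generate_cu_files.py, do not edit manually.
    // Sketch only: the macro name below is assumed, not taken from the commit.

    #include "../fattn-tile.cuh"

    DECL_FATTN_TILE_CASE(512, 512);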