CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)
author Anav Prasad <redacted>
Wed, 1 Apr 2026 07:07:24 +0000 (07:07 +0000)
committer GitHub <redacted>
Wed, 1 Apr 2026 07:07:24 +0000 (09:07 +0200)
commit 88458164c77509d2022e45f71aaf97040667abe2
tree 723ec42b37526f4b8080da94be35b17897eee4e5
parent 49512502359b9d3a5ea589c215396513c53fe064
CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)

* flash attention support for head dimension 512 added

* FA D=512 - match 576 configs, limit ncols2, revert vec cap

* fix HIP tile kernel build for D=512

* fix HIP tile kernel occupancy for D=512 on AMD

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>
* fix tile FA compilation

---------

Co-authored-by: Johannes Gäßler <redacted>
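The commit message above is terse, so the following is a minimal, hypothetical sketch of the kind of change it describes: a head-size-templated flash-attention launcher that gains a 512/512 case next to the existing shapes. The function name, the set of existing cases, and the dispatch structure are illustrative assumptions, not the actual llama.cpp code.

// Hypothetical sketch (not the actual llama.cpp code): the general pattern of
// dispatching a flash-attention kernel templated on the K/Q head size (DKQ)
// and V head size (DV), with a 512/512 case added alongside existing sizes.
#include <cstdio>
#include <stdexcept>

template <int DKQ, int DV>
static void launch_fattn_tile_case() {
    // In the real kernels DKQ/DV select compile-time tile shapes;
    // here we only report which instantiation was chosen.
    std::printf("tile FA kernel: DKQ=%d DV=%d\n", DKQ, DV);
}

static void launch_fattn_tile(int dkq, int dv) {
    if (dkq == 128 && dv == 128) { launch_fattn_tile_case<128, 128>(); return; }
    if (dkq == 576 && dv == 512) { launch_fattn_tile_case<576, 512>(); return; } // existing case
    if (dkq == 512 && dv == 512) { launch_fattn_tile_case<512, 512>(); return; } // new in this commit
    throw std::runtime_error("unsupported head-size combination");
}

int main() {
    launch_fattn_tile(512, 512);
    return 0;
}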
14 files changed:
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/fattn-tile.cu
ggml/src/ggml-cuda/fattn-tile.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_8.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_8.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_8.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_8.cu
ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq512-dv512.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
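The new file fattn-tile-instance-dkq512-dv512.cu follows the pattern of the other template-instance files listed above: each generated .cu file expands one instantiation macro so that a single kernel specialization is compiled per translation unit, which is also why generate_cu_files.py changes whenever a new head size is added. The include path and macro name in the sketch below are assumptions based on that pattern, not text copied from the repository.

// Hypothetical contents of the generated instance file for DKQ=512, DV=512.
// Header name and DECL_* macro are assumed for illustration only.
#include "../fattn-tile.cuh"

DECL_FATTN_TILE_CASE(512, 512);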