git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: faster FA for GQA > 1 but not power of 2 (#19092)
author Johannes Gäßler <redacted>
Sun, 25 Jan 2026 20:19:47 +0000 (21:19 +0100)
committer GitHub <redacted>
Sun, 25 Jan 2026 20:19:47 +0000 (21:19 +0100)
commit 0c21677e43044d27f6f7a7f9f95c67f7c4b3fdb4
tree a6be275854daa48b013783255348ebdbec0aa8ae
parent 0440bfd1605333726ea0fb7a836942660bf2f9a6
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_32.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_32.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/generate_cu_files.py