git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: faster FA for GQA > 1 but not power of 2 (llama/19092)
author Johannes Gäßler <redacted>
Sun, 25 Jan 2026 20:19:47 +0000 (21:19 +0100)
committer Georgi Gerganov <redacted>
Fri, 30 Jan 2026 11:49:29 +0000 (13:49 +0200)
commit a82d61d27660f6ce7d25aa12484de1847379bec2
tree 11d3d895febaf38cdf1888c42a5811eb200a8130
parent 1a7ad53e548784d4ef44898b4ea48c3f46d6aa53
src/ggml-cuda/fattn-common.cuh
src/ggml-cuda/fattn-mma-f16.cuh
src/ggml-cuda/fattn.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_32.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_32.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/generate_cu_files.py