git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: faster FA for GQA > 1 but not power of 2 (llama/19092)
author     Johannes Gäßler <redacted>
           Sun, 25 Jan 2026 20:19:47 +0000 (21:19 +0100)
committer  Georgi Gerganov <redacted>
           Fri, 30 Jan 2026 13:56:40 +0000 (15:56 +0200)
commit     f63848eada9a8a1c1a0ab52c389a15e189e33c58
tree       4ddca3df3c61e2f641f427739a1b3276ab5b921a
parent     4372b87b8e7b941fdc0d0176963e166747169454
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_32.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_32.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
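The commit title refers to grouped-query attention (GQA) configurations in which the ratio of query heads to K/V heads is greater than 1 but is not a power of two. The following minimal C++ sketch is not taken from the commit and does not show how the CUDA kernels handle this case. It only illustrates what the ratio is and how a head configuration falls into that category. All function names are hypothetical, and the query-to-K/V head mapping assumes the common convention that consecutive groups of query heads share one K/V head.

```cpp
#include <string>

// Hypothetical helpers for illustration only; these are not functions
// from the commit or from ggml.

// Number of query heads that share one K/V head under GQA.
// Assumes n_head_q is an exact multiple of n_head_kv.
int gqa_ratio(int n_head_q, int n_head_kv) {
    return n_head_q / n_head_kv;
}

// True if x is a positive power of two (1, 2, 4, 8, ...).
bool is_pow2(int x) {
    return x > 0 && (x & (x - 1)) == 0;
}

// K/V head read by query head h, assuming consecutive groups of
// `ratio` query heads map onto the same K/V head.
int kv_head_for_q_head(int h, int ratio) {
    return h / ratio;
}

// Classify a head configuration in the terms used by the commit title.
std::string classify_gqa(int n_head_q, int n_head_kv) {
    const int r = gqa_ratio(n_head_q, n_head_kv);
    if (r == 1) {
        return "no GQA";
    }
    return is_pow2(r) ? "GQA, power of 2" : "GQA > 1, not a power of 2";
}
```

For example, with the illustrative head counts 40 query heads and 8 K/V heads, the ratio is 5, which is the case named in the title. With 32 and 8, the ratio is 4, a power of two.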