git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: add gqa_ratio 4 for GLM 4.7 flash (llama/18953)
author Aman Gupta <redacted>
Thu, 22 Jan 2026 10:51:53 +0000 (18:51 +0800)
committer Georgi Gerganov <redacted>
Fri, 30 Jan 2026 13:56:40 +0000 (15:56 +0200)
commit d4fafcfc6fa61434a40b2fb27cc025d84e3aae5b
tree f042ef8ec1eb48852a1f14f6fe02afd7b5b9bb6d
parent 167fec69d5208a80976f0ef5678a36ef4b3d1b62
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/fattn-tile.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
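The four new template-instance files share one naming pattern: `ncols2` is fixed at 4, which appears to be the GQA ratio named in the commit title, while `ncols1` takes the values 2, 4, 8 and 16. The sketch below is a hypothetical illustration, not the contents of generate_cu_files.py. It rebuilds those file names and shows the GQA ratio as query heads per KV head. The helpers `instance_filename` and `gqa_ratio` are assumed names introduced only for this example.

```python
# Hypothetical sketch: reproduces the naming pattern visible in the file list
# above. It is NOT the actual generate_cu_files.py script.

NCOLS2 = 4                     # assumed to match the gqa_ratio 4 added by this commit
NCOLS1_VALUES = (2, 4, 8, 16)  # ncols1 values taken from the listed file names


def instance_filename(ncols1: int, ncols2: int) -> str:
    """Build a template-instance file name following the listed pattern."""
    return f"fattn-mma-f16-instance-ncols1_{ncols1}-ncols2_{ncols2}.cu"


def gqa_ratio(n_head: int, n_head_kv: int) -> int:
    """Number of query heads sharing each KV head (grouped-query attention)."""
    if n_head_kv <= 0 or n_head % n_head_kv != 0:
        raise ValueError("n_head must be a positive multiple of n_head_kv")
    return n_head // n_head_kv


# Names of the instance files for ncols2 == 4, ordered by ncols1.
new_files = [instance_filename(n1, NCOLS2) for n1 in NCOLS1_VALUES]

if __name__ == "__main__":
    for name in new_files:
        print(name)
```

For example, a model with 32 query heads and 8 KV heads gives `gqa_ratio(32, 8) == 4`, the ratio these kernel instances target. The head counts here are illustrative only and are not GLM 4.7 flash's actual configuration.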