CUDA: add gqa_ratio 4 for GLM 4.7 flash (#18953)
author    Aman Gupta <redacted>
          Thu, 22 Jan 2026 10:51:53 +0000 (18:51 +0800)
committer GitHub <redacted>
          Thu, 22 Jan 2026 10:51:53 +0000 (18:51 +0800)
commit    b70d251076ac7c3ac1cd5d39dbb167f6ff3b6880
tree      ff3e5430f5b42f908379efb30a2babb5ff8576d9
parent    5516b9c16aed771b42403aca9a561af61a564c25
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/fattn-tile.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu
ggml/src/ggml-cuda/template-instances/generate_cu_files.py