git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: add gqa_ratio 4 for GLM 4.7 flash (llama/18953)
author Aman Gupta <redacted>
Thu, 22 Jan 2026 10:51:53 +0000 (18:51 +0800)
committer Georgi Gerganov <redacted>
Fri, 30 Jan 2026 11:49:29 +0000 (13:49 +0200)
commit e5c8629f115096ac1a347a952c55ef616e5167c8
tree 2d7a6185198ddfbd58e296e6db7b9bc06fb39cb8
parent 7751c92aedccc12cf4c51629418b833318c25a72
src/ggml-cuda/fattn-mma-f16.cuh
src/ggml-cuda/fattn-tile.cuh
src/ggml-cuda/fattn.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu
src/ggml-cuda/template-instances/generate_cu_files.py