git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
ggml: CUDA: add head size 72 for flash-attn (llama/16962)
author theo77186 <redacted>
Mon, 3 Nov 2025 13:29:11 +0000 (14:29 +0100)
committer Georgi Gerganov <redacted>
Sun, 9 Nov 2025 21:38:03 +0000 (23:38 +0200)
commit 82ede64cd0e31deab22d2d84b57e0088c42c0e1b
tree 59f35bfe2f4fea1cddcf88b7fd7306b6028fc9f0
parent 79801188f769ef2a701a7474c20c06d2bc5b7769
ggml: CUDA: add head size 72 for flash-attn (llama/16962)
ggml/src/ggml-cuda/fattn-tile.cu
ggml/src/ggml-cuda/fattn-tile.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq72-dv72.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
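The changed files follow the usual ggml-cuda layout: the tiled flash-attention kernel is templated on the head size, and each supported size gets its own explicit-instantiation file under template-instances/, emitted by generate_cu_files.py, so adding head size 72 means emitting one more instance (fattn-tile-instance-dkq72-dv72.cu) and registering the case in fattn.cu. Below is a minimal, self-contained sketch of that instantiation pattern for the new 72/72 case; the kernel name fattn_tile_stub and its signature are placeholders for illustration, not the actual ggml code.

```cuda
// Hypothetical sketch (not the actual ggml sources): per-head-size kernels
// are compiled as explicit template instantiations, one .cu file per
// (DKQ, DV) pair, so supporting a new head size is mostly a matter of
// emitting one more instance file.

#include <cuda_runtime.h>

// Simplified stand-in for a tiled flash-attention kernel templated on the
// K/Q head size (DKQ) and the V head size (DV).
template <int DKQ, int DV>
__global__ void fattn_tile_stub(const float * Q, const float * K,
                                const float * V, float * dst, int n_kv) {
    // Real kernel body omitted; this stub only demonstrates the
    // instantiation pattern used by the template-instances/ files.
    (void) Q; (void) K; (void) V; (void) dst; (void) n_kv;
}

// What a generated fattn-tile-instance-dkq72-dv72.cu boils down to:
// force the compiler to emit the 72/72 variant in this translation unit.
template __global__ void fattn_tile_stub<72, 72>(const float *, const float *,
                                                 const float *, float *, int);
```

Splitting instances across translation units like this keeps per-file compile times and register pressure manageable, which is presumably why the generator script, rather than a single monolithic .cu file, is updated when a new head size is added.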