ggml: CUDA: add head size 72 for flash-attn (llama/16962)
author	theo77186 <redacted>
	Mon, 3 Nov 2025 13:29:11 +0000 (14:29 +0100)
committer	Georgi Gerganov <redacted>
	Sun, 9 Nov 2025 16:30:22 +0000 (18:30 +0200)
commit	0f6227f4facd92e6b52ad5de248082834a68832d
tree	6e6d00273bb96e0d467ea0df651e6d397845e7cf
parent	91c1ecc7f41e555f12dc90b793c824b60054a8ec
src/ggml-cuda/fattn-tile.cu
src/ggml-cuda/fattn-tile.cuh
src/ggml-cuda/fattn.cu
src/ggml-cuda/template-instances/fattn-tile-instance-dkq72-dv72.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/generate_cu_files.py
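The new file `fattn-tile-instance-dkq72-dv72.cu` is autogenerated by `generate_cu_files.py`, so supporting a new head size amounts to adding it to the list of generated (DKQ, DV) combinations. The sketch below is illustrative only: the head-size list, macro name `DECL_FATTN_TILE_CASE`, and file layout are assumptions for illustration, not taken verbatim from the actual script.

```python
# Illustrative sketch of a generator like generate_cu_files.py.
# Emits one template-instance .cu file per supported head size;
# the macro name and head-size list are assumptions for illustration.

HEAD_SIZES = [64, 72, 80, 96, 112, 128, 256]  # 72 is the newly added size

def instance_source(dkq: int, dv: int) -> str:
    """Return the source text of one fattn-tile template instance."""
    return (
        "// This file has been autogenerated, do not edit manually.\n\n"
        '#include "../fattn-tile.cuh"\n\n'
        f"DECL_FATTN_TILE_CASE({dkq}, {dv});\n"
    )

def instance_filename(dkq: int, dv: int) -> str:
    return f"fattn-tile-instance-dkq{dkq}-dv{dv}.cu"

# Generate the per-head-size instance files (here kept in memory;
# the real script writes them into template-instances/).
files = {instance_filename(d, d): instance_source(d, d) for d in HEAD_SIZES}
```

Splitting the instantiations into one translation unit per head size keeps each `.cu` file small and lets them compile in parallel, which is why adding head size 72 shows up as a new generated file rather than an edit to a monolithic source.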