git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
ggml: CUDA: add head size 72 for flash-attn (#16962)
author  theo77186 <redacted>
Mon, 3 Nov 2025 13:29:11 +0000 (14:29 +0100)
committer  GitHub <redacted>
Mon, 3 Nov 2025 13:29:11 +0000 (14:29 +0100)
commit  622cd010ff4a65bef67edbf2f9bf4707c01f98f7
tree    415f3251a8e8d1c9507f71f3a060fa9e21070770
parent  070ff4d5356083d60b807bb34d36b31c3653a29e
ggml/src/ggml-cuda/fattn-tile.cu
ggml/src/ggml-cuda/fattn-tile.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/template-instances/fattn-tile-instance-dkq72-dv72.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
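The file list above shows the pattern this commit follows: flash-attention kernels in llama.cpp's CUDA backend are compiled as per-head-size template instances, and `generate_cu_files.py` emits one small `.cu` stub per supported size (here adding `fattn-tile-instance-dkq72-dv72.cu` for head size 72). The sketch below is a hypothetical, simplified generator illustrating that mechanism; the head-size list and the `DECL_FATTN_TILE_CASE` macro name are assumptions for illustration, not the script's actual contents.

```python
# Hypothetical sketch of a per-head-size template-instance generator,
# modeled on the naming seen in this commit's file list.
# The head-size list and macro name are illustrative assumptions.

HEAD_SIZES = [64, 72, 80, 96, 112, 128, 256]  # 72 is the size added here

SOURCE_TEMPLATE = """\
// This file has been autogenerated, do not edit manually.

#include "../fattn-tile.cuh"

DECL_FATTN_TILE_CASE({dkq}, {dv});
"""

def instance_filename(dkq: int, dv: int) -> str:
    # Matches the commit's new file: fattn-tile-instance-dkq72-dv72.cu
    return f"fattn-tile-instance-dkq{dkq}-dv{dv}.cu"

def generate_sources() -> dict[str, str]:
    # One instance per head size, with DKQ == DV (as in dkq72-dv72),
    # so each kernel variant compiles in its own translation unit.
    return {instance_filename(d, d): SOURCE_TEMPLATE.format(dkq=d, dv=d)
            for d in HEAD_SIZES}

if __name__ == "__main__":
    for name in sorted(generate_sources()):
        print(name)
```

Splitting instances into separate translation units like this keeps build parallelism high and avoids recompiling every kernel variant when one head size is added.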