git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
cpu: introduce chunking for flash attention (llama/16829)
author Max Krasnyansky <redacted>
Thu, 30 Oct 2025 12:26:05 +0000 (05:26 -0700)
committer Georgi Gerganov <redacted>
Sun, 9 Nov 2025 21:38:03 +0000 (23:38 +0200)
commit f1fdb91e95f9941fedbdb718dfa2e233716639b0
tree db6814c9a26748296c2945bd2b4eaaada7aa7c7e
parent f7dfa39104dbb756fc0d839698edaffaf3c7ddaa
cpu: introduce chunking for flash attention (llama/16829)

Factor out the core FA loop into flash_atten_f16_one_chunk and add an outer loop
on top that handles the chunks.
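
The refactoring pattern described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: process_one_chunk and process_chunked are hypothetical names, the per-row work is a trivial row sum standing in for the real flash attention inner loop, and the sketch is single-threaded (how chunks are distributed across threads is not described in the commit message and is left out).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-chunk kernel: handles rows [ir0, ir1).
// Each "row" is simply summed here; the real kernel would run the core FA loop.
static void process_one_chunk(const std::vector<std::vector<float>> & rows,
                              std::vector<float> & out,
                              int64_t ir0, int64_t ir1) {
    for (int64_t ir = ir0; ir < ir1; ++ir) {
        float acc = 0.0f;
        for (float v : rows[ir]) {
            acc += v;
        }
        out[ir] = acc;
    }
}

// Outer loop: split the row range into fixed-size chunks and pass each
// chunk to the kernel. Returns the number of chunks processed.
static int64_t process_chunked(const std::vector<std::vector<float>> & rows,
                               std::vector<float> & out,
                               int64_t chunk_size) {
    assert(chunk_size > 0);
    const int64_t nr     = static_cast<int64_t>(rows.size());
    const int64_t nchunk = (nr + chunk_size - 1) / chunk_size; // round up

    for (int64_t c = 0; c < nchunk; ++c) {
        const int64_t ir0 = c * chunk_size;
        const int64_t ir1 = std::min(ir0 + chunk_size, nr); // last chunk may be short
        process_one_chunk(rows, out, ir0, ir1);
    }
    return nchunk;
}
```

Keeping the kernel's interface as a half-open row range means the outer loop alone decides chunk size and scheduling, so the kernel itself does not change if the chunking policy does.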
ggml/src/ggml-cpu/ops.cpp