cpu: introduce chunking for flash attention (llama/16829)
author    Max Krasnyansky <redacted>
          Thu, 30 Oct 2025 12:26:05 +0000 (05:26 -0700)
committer Georgi Gerganov <redacted>
          Sat, 1 Nov 2025 07:41:35 +0000 (09:41 +0200)
commit d508a2bbd525edd31f5073688b93433fec900dd4
tree   3456914cc3d395d68fa50218ae0b2e9c34c09981
parent 5f7ee94439f9a9f3da981400120528073417752c

Factor out the core FA loop into flash_atten_f16_one_chunk and add an outer loop
on top that handles the chunks.
src/ggml-cpu/ops.cpp