cpu: introduce chunking for flash attention (#16829)
author Max Krasnyansky <redacted>
Thu, 30 Oct 2025 12:26:05 +0000 (05:26 -0700)
committer GitHub <redacted>
Thu, 30 Oct 2025 12:26:05 +0000 (14:26 +0200)
commit dcca0d3ab840ebe9b2ccd4719033d408eeb758d7
tree 14e1b91a4495c4f7044065fffd9c899f52c7f990
parent bacddc049a00786df44e682262f6e298742bfbc3
cpu: introduce chunking for flash attention (#16829)

Factor out the core FA loop into flash_atten_f16_one_chunk and add an outer loop
on top that handles the chunks.
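
The pattern described above, splitting the work into fixed-size chunks that a pool of threads claims via an atomic counter, can be sketched roughly as follows. This is not the actual ops.cpp code; the function names fa_one_chunk and fa_chunked, the doubling "kernel", and the atomic-counter dispatch are illustrative assumptions standing in for the real flash-attention inner loop:

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Core per-chunk work: process rows [ir0, ir1) of the input.
// Stands in for the factored-out flash-attention chunk body.
static void fa_one_chunk(const std::vector<float>& in, std::vector<float>& out,
                         int64_t ir0, int64_t ir1) {
    for (int64_t i = ir0; i < ir1; ++i) {
        out[i] = in[i] * 2.0f;  // placeholder for the real FA computation
    }
}

// Outer loop: divide the rows into chunks and let each thread
// repeatedly claim the next unprocessed chunk via an atomic counter,
// so faster threads naturally pick up more chunks (dynamic balancing).
static void fa_chunked(const std::vector<float>& in, std::vector<float>& out,
                       int64_t chunk_size, int nthreads) {
    const int64_t nrows   = (int64_t) in.size();
    const int64_t nchunks = (nrows + chunk_size - 1) / chunk_size;

    std::atomic<int64_t> next_chunk{0};

    auto worker = [&]() {
        for (;;) {
            const int64_t c = next_chunk.fetch_add(1, std::memory_order_relaxed);
            if (c >= nchunks) {
                break;  // no chunks left
            }
            const int64_t ir0 = c * chunk_size;
            const int64_t ir1 = std::min(ir0 + chunk_size, nrows);
            fa_one_chunk(in, out, ir0, ir1);
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back(worker);
    }
    for (auto& th : pool) {
        th.join();
    }
}
```

Because chunks are claimed dynamically rather than pre-assigned per thread, load stays balanced even when chunks take uneven time, which is the usual motivation for this kind of restructuring on CPU.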
ggml/src/ggml-cpu/ops.cpp