git.djapps.eu Git - pkg/ggml/sources/ggml/commit
llama : separate compute buffer reserve from fattn check (llama/15696)
author Diego Devesa <redacted>
Sun, 31 Aug 2025 13:49:03 +0000 (06:49 -0700)
committer Georgi Gerganov <redacted>
Fri, 5 Sep 2025 09:54:09 +0000 (12:54 +0300)
commit 319bf932319ab96617ffa7ea93c5be2574edbdb0
tree 9191ae0682c6cd482fee2a572422a334af6ce8b0
parent cd2cdfdad83421a9744e5518619b2c8b9bcd68d0

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.
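
The separation described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: every type and function below (`ToySched`, `toy_split_graph`, `toy_reserve`, `toy_fattn_ok`) is hypothetical and is not the ggml API, and the placement rule used for the Flash Attention check is an assumed stand-in for whatever the real check inspects. The point is only that the split step assigns nodes to backends without reserving memory, so a check can run on the split result before any buffer is sized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these types are hypothetical and are NOT the ggml API.
enum class Op { MatMul, FlashAttn, Add };

struct Node {
    Op          op;
    int         backend; // backend index the toy scheduler assigned to this node
    std::size_t bytes;   // scratch memory this node would need
};

struct Split {
    int         backend;
    std::size_t first; // index of the first node in the split
    std::size_t last;  // one past the last node in the split
};

struct ToySched {
    std::vector<Split>       splits;
    std::vector<std::size_t> reserved;          // bytes reserved per backend
    bool                     allocated = false; // set only by toy_reserve()
};

// Step 1: group consecutive nodes that share a backend into splits.
// No memory is reserved here.
void toy_split_graph(ToySched & s, const std::vector<Node> & g) {
    s.splits.clear();
    for (std::size_t i = 0; i < g.size(); ++i) {
        if (s.splits.empty() || s.splits.back().backend != g[i].backend) {
            s.splits.push_back({g[i].backend, i, i + 1});
        } else {
            s.splits.back().last = i + 1;
        }
    }
}

// Step 2 (separate): split the graph, then size one buffer per backend
// as the largest total its splits require.
void toy_reserve(ToySched & s, const std::vector<Node> & g, std::size_t n_backends) {
    toy_split_graph(s, g);
    s.reserved.assign(n_backends, std::size_t{0});
    for (const Split & sp : s.splits) {
        assert(sp.backend >= 0 && static_cast<std::size_t>(sp.backend) < n_backends);
        std::size_t total = 0;
        for (std::size_t i = sp.first; i < sp.last; ++i) {
            total += g[i].bytes;
        }
        s.reserved[sp.backend] = std::max(s.reserved[sp.backend], total);
    }
    s.allocated = true;
}

// Assumed check: every FlashAttn node must sit in a split on the expected
// backend. It reads only the split result, so it needs no reserved buffers.
bool toy_fattn_ok(const ToySched & s, const std::vector<Node> & g, int expected_backend) {
    for (const Split & sp : s.splits) {
        for (std::size_t i = sp.first; i < sp.last; ++i) {
            if (g[i].op == Op::FlashAttn && sp.backend != expected_backend) {
                return false;
            }
        }
    }
    return true;
}
```

In this toy, a caller would run `toy_split_graph()` and `toy_fattn_ok()` first, decide whether to keep Flash Attention enabled, and only then call `toy_reserve()` once on the final graph.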
include/ggml-backend.h
src/ggml-backend.cpp