git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
llama : separate compute buffer reserve from fattn check (#15696)
author Diego Devesa <redacted>
Sun, 31 Aug 2025 13:49:03 +0000 (06:49 -0700)
committer GitHub <redacted>
Sun, 31 Aug 2025 13:49:03 +0000 (15:49 +0200)
commit 9777032dccd67bdc7785aeab7497014a8be8dacc
tree 83ff6b6ce83f06362b49da89b8d32d17ad282db7
parent 7d3c9f2b217acf0ce5db81ae83d3f375f49ab2c7

Exposes ggml_backend_sched_split_graph() so that a graph can be split without allocating compute buffers, and uses it to split the graph for the automatic Flash Attention check.
ggml/include/ggml-backend.h
ggml/src/ggml-backend.cpp
src/llama-context.cpp
src/llama-context.h
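
The separation described in the commit message can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToySched`, `Node`, `Op`, `kCpuDevice`, and `fattn_usable` are hypothetical names invented for illustration and are not the ggml or llama.cpp API, and the fallback-to-CPU rule and buffer sizes are made up. The sketch shows the idea of assigning graph nodes to devices in a `split` step that allocates nothing, so that a Flash Attention check can inspect the assignment without reserving compute buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model only: every name here is hypothetical, not part of ggml.
enum class Op { MatMul, FlashAttn };

constexpr int kCpuDevice = 0; // assumed fallback device

struct Node {
    Op  op;
    int layer_device;    // device that holds the layer this node belongs to
    int assigned_device; // filled in by ToySched::split(), -1 before that
};

struct ToySched {
    std::set<int> fattn_devices;   // devices assumed to support FlashAttn
    std::size_t   allocated_bytes; // bytes reserved for compute buffers

    // Assign every node to a device. No memory is allocated here.
    void split(std::vector<Node> & graph) const {
        for (Node & n : graph) {
            const bool supported =
                n.op != Op::FlashAttn || fattn_devices.count(n.layer_device) > 0;
            n.assigned_device = supported ? n.layer_device : kCpuDevice;
        }
    }

    // Split the graph, then reserve compute buffers (sizes are made up).
    void reserve(std::vector<Node> & graph) {
        split(graph);
        allocated_bytes += graph.size() * 1024;
    }
};

// Flash Attention check: only needs the split result, never calls reserve().
// Returns false if any FlashAttn node ended up off its layer's device.
bool fattn_usable(const ToySched & sched, std::vector<Node> & graph) {
    sched.split(graph);
    for (const Node & n : graph) {
        assert(n.assigned_device >= 0); // split() assigned every node
        if (n.op == Op::FlashAttn && n.assigned_device != n.layer_device) {
            return false;
        }
    }
    return true;
}
```

In this model, calling `fattn_usable` leaves `allocated_bytes` unchanged, and buffers are reserved only by a later, separate `reserve` call once the decision about Flash Attention has been made.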