git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
llama : separate compute buffer reserve from fattn check (llama/15696)
author: Diego Devesa <redacted>
Sun, 31 Aug 2025 13:49:03 +0000 (06:49 -0700)
committer: Georgi Gerganov <redacted>
Sat, 20 Sep 2025 10:42:45 +0000 (13:42 +0300)
commit: b11c972b88cd7bb05a9660961fd17b338f1620aa
tree: c0dfe6b3cba5d7f2497857bc8d28bf24ef9339b8
parent: db7ecfb61dedf7bf0d2a6c79310e17edb89da586

Exposes ggml_backend_sched_split_graph() so that a graph can be split across backends without allocating compute buffers, and uses it to split the graph for the automatic Flash Attention check.
ggml/include/ggml-backend.h
ggml/src/ggml-backend.cpp
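The separation described above can be illustrated with a minimal, self-contained sketch. It assumes a toy scheduler in which splitting only assigns each graph node to a device, while reserving splits and then sizes compute buffers. Everything here is hypothetical: scheduler, node, split_graph(), reserve_buffers() and auto_fa_check() are illustrative stand-ins, not the ggml API (in the actual change, the split step is ggml_backend_sched_split_graph()). The check criterion used below (every Flash Attention node must land on the same device as the KV cache) and the buffer sizing rule are also assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT ggml types. */
typedef enum { OP_MUL_MAT, OP_SOFT_MAX, OP_FLASH_ATTN } op_kind;
typedef enum { DEV_CPU, DEV_GPU } device_kind;

typedef struct {
    op_kind op;
    device_kind device;      /* set by split_graph() */
} node;

typedef struct {
    bool gpu_supports_fa;    /* assumed capability of the GPU backend */
    size_t reserved_bytes;   /* compute buffer size; written only by reserve_buffers() */
} scheduler;

/* Split only: assign every node to a device, allocate no buffers. */
static void split_graph(scheduler *s, node *nodes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        bool gpu_ok = nodes[i].op != OP_FLASH_ATTN || s->gpu_supports_fa;
        nodes[i].device = gpu_ok ? DEV_GPU : DEV_CPU;
    }
}

/* Reserve: split, then size compute buffers for the graph that will run. */
static void reserve_buffers(scheduler *s, node *nodes, size_t n) {
    assert(n > 0);
    split_graph(s, nodes, n);
    s->reserved_bytes = n * 1024; /* placeholder sizing rule */
}

/* Automatic FA check: keep FA only if every FA node was assigned to the
   same device as the KV cache. It needs the split result, not the buffers. */
static bool auto_fa_check(scheduler *s, node *nodes, size_t n, device_kind kv_device) {
    split_graph(s, nodes, n);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].op == OP_FLASH_ATTN && nodes[i].device != kv_device) {
            return false;
        }
    }
    return true;
}
```

For example, with gpu_supports_fa set to false and the KV cache on DEV_GPU, auto_fa_check() returns false for a graph containing an OP_FLASH_ATTN node, and reserved_bytes is still 0 afterwards; buffers are sized only by the later reserve_buffers() call on the graph that will actually run.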