server: save and clear idle slots on new task (`--clear-idle`) (#20993)
author Yes You Can Have Your Own <redacted>
Fri, 3 Apr 2026 17:02:27 +0000 (20:02 +0300)
committer GitHub <redacted>
Fri, 3 Apr 2026 17:02:27 +0000 (19:02 +0200)
commit 50e0ad08fb6906fda9ac2e256e43a4bbf9c85639
tree af3fa10234d8995a2103bae6bc72ab604847cb38
parent f1f793ad0663a223d3f4f7f3d14875a009d59f5a

* server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)

* server: move idle slot KV clearing to slot release

The save "cost" is now paid by the finishing request.

* server: add --kv-clear-idle flag, enable by default

* server: skip clearing last idle slot, clear on launch

* server: test --no-kv-clear-idle flag

* server: simplify on-release clearing loop

* server: remove on-release KV clearing, keep launch-only

* cont : clean-up

* tests: update log strings after --clear-idle rename

* tests: use debug tags instead of log message matching

* test: fix Windows CI by dropping temp log file unlink

---------

Co-authored-by: Georgi Gerganov <redacted>
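The policy the bullets converge on (idle slots are saved and cleared only when a new task is launched, not when a slot is released) can be sketched as a small Python model. This is a minimal illustration under stated assumptions, not the llama.cpp implementation: `Slot`, `launch_task`, the `saved` dict and the `clear_idle` parameter are hypothetical stand-ins, the slot's KV cache is modeled as a plain token list, and "save" is assumed to mean copying an idle slot's state somewhere outside VRAM before clearing it. It does not model every intermediate detail from the commit history (for example, skipping the last idle slot).

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    # Hypothetical model of a server slot (not llama.cpp's real struct).
    id: int
    busy: bool = False
    # Stand-in for the slot's KV cache contents held in VRAM.
    kv_tokens: list[int] = field(default_factory=list)


def launch_task(slots, target, prompt, saved, clear_idle=True):
    """Assign `prompt` to slot `target`.

    When `clear_idle` is true (an assumed analogue of `--clear-idle`),
    every other idle slot that still holds KV data is first saved into
    `saved` (hypothetical off-VRAM storage) and then cleared.
    """
    if clear_idle:
        for s in slots:
            # Skip the slot being reused, busy slots, and slots already empty.
            if s is target or s.busy or not s.kv_tokens:
                continue
            saved[s.id] = list(s.kv_tokens)  # save: copy state out of VRAM
            s.kv_tokens.clear()              # clear: free the slot's KV
    target.busy = True
    target.kv_tokens = list(prompt)
    return target
```

In this sketch the save cost is paid by the request being launched. Per the commit message, an earlier iteration cleared on release instead, so the finishing request paid that cost; the merged version keeps launch-only clearing.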
common/arg.cpp
common/common.h
tools/cli/README.md
tools/completion/README.md
tools/server/README.md
tools/server/server-context.cpp
tools/server/server-task.cpp
tools/server/tests/unit/test_kv_keep_only_active.py [new file with mode: 0644]
tools/server/tests/utils.py