git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
kv-cache : pad the cache size to 256 for performance (#17046)
author Georgi Gerganov <redacted>
Fri, 7 Nov 2025 18:03:25 +0000 (20:03 +0200)
committer GitHub <redacted>
Fri, 7 Nov 2025 18:03:25 +0000 (20:03 +0200)
commit 16bcc1259d311d0fd37fe00fefcc7900324d38cb
tree 292229ca321d74433d55b51f0e825d879413f916
parent 9eb9a1331dec83098c858150cd0a8ad9f6d8f46c

* kv-cache : pad the size of the small SWA cache for performance

* context : pad the total context to 256

* cont : future-proof the swa pad

* server : adjust test params to new logic
include/llama.h
src/llama-context.cpp
src/llama-kv-cache-iswa.cpp
tools/server/tests/unit/test_speculative.py