git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Georgi Gerganov <redacted>
	Sun, 2 Nov 2025 16:14:04 +0000 (18:14 +0200)
committer	GitHub <redacted>
	Sun, 2 Nov 2025 16:14:04 +0000 (18:14 +0200)
commit	cd5e3b57541ecc52421130742f4d89acbcf77cd4
tree	09ab7ad5a96d11291eb7bfc329cc65fe1018c722
parent	87c9efc3b297b8a498716b1db3d061842e6fc85b

server : support unified cache across slots (#16736)

* server : support unified context across slots

* cont : fix speculative decoding initialization

* context : fix n_ctx_per_seq computation

* server : purge slots one by one

* tests : add unified cache server tests

* llama : update per-seq context computation

* test-thread-safety : handle tiny training context of the input model

* server : fix server_tokens clear()

* server : use 4 slots + unified KV by default

* llama : add note about context size queries

* cont : update todos [no ci]

* context : do not cap the size of the context

* tests : adjust parameters to be CI friendlier

* context : add warning
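
The per-sequence context items above can be illustrated with a minimal sketch. It is not code from this commit: the helper name `ctx_per_seq` and its parameters are hypothetical, and it assumes that a unified KV cache lets every slot address the full context, while a split cache divides the context evenly across slots.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from the commit): number of context tokens
// a single sequence (server slot) can use.
//   n_ctx      - total context size in tokens
//   n_seq_max  - number of parallel sequences (slots), must be > 0
//   kv_unified - true when all sequences share one KV buffer (assumed semantics)
inline uint32_t ctx_per_seq(uint32_t n_ctx, uint32_t n_seq_max, bool kv_unified) {
    assert(n_seq_max > 0);
    // Unified: any one sequence may grow up to the full context.
    // Split: each sequence gets an equal, fixed share (integer division).
    return kv_unified ? n_ctx : n_ctx / n_seq_max;
}
```

Under these assumptions, 4 slots with an 8192-token context give each slot up to 8192 tokens when the cache is unified and 2048 tokens when it is split. In the unified case the slots compete for the same buffer, so the per-slot figure is an upper bound, not a reservation.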

include/llama.h
src/llama-context.cpp
src/llama-context.h
src/llama-cparams.h
src/llama-model.cpp
tests/test-thread-safety.cpp
tools/server/server.cpp
tools/server/tests/unit/test_chat_completion.py
tools/server/tests/unit/test_completion.py
tools/server/tests/unit/test_infill.py
tools/server/tests/utils.py
tools/server/utils.hpp