git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Georgi Gerganov <redacted>
	Thu, 9 Oct 2025 15:54:51 +0000 (18:54 +0300)
committer	GitHub <redacted>
	Thu, 9 Oct 2025 15:54:51 +0000 (18:54 +0300)
commit	d00cbea63c671cd85a57adaa50abf60b3b87d86f
tree	d7fdb4b134a6540680b3c6549932f05cd89142cd
parent	8328fd4bae76fc027f8ca0e9a05acd3788dabe3f

server : host-memory prompt caching (#16391)

* minor : code style

* server : fix prompt similarity calculation

* server : initial host-memory prompt caching

* cont

* server : refactor

* cont

* cont : make the server task of the slot const

* cont : minor [no ci]

* server : cache prompts and checkpoints only for completion tasks

* server : improve prompt caching logic

* cont : fix check for number of cached prompts [no ci]

* server : improve caching logic, add -cram CLI arg

* server : print prompt mismatch info

* cont : better naming [no ci]

* server : improve prompt cache loading logic

* server : add option to debug the slot contents (#16482)

* server : add option to debug the slot contents

* Update tools/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <redacted>

* server : add option to disable prompt cache

---------

Co-authored-by: Xuan-Son Nguyen <redacted>

common/arg.cpp
common/chat.h
common/common.h
src/llama-kv-cache.cpp
tools/server/server.cpp
tools/server/tests/unit/test_basic.py
tools/server/tests/unit/test_chat_completion.py
tools/server/tests/unit/test_completion.py
tools/server/tests/unit/test_ctx_shift.py
tools/server/utils.hpp