git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Georgi Gerganov <redacted>
	Sat, 31 May 2025 07:24:04 +0000 (10:24 +0300)
committer	GitHub <redacted>
	Sat, 31 May 2025 07:24:04 +0000 (10:24 +0300)
commit	12d0188c0dc6146ffde6d277a93f232ccbe699f8
tree	5e3bcca3a623e028aac8abe35d99d66d46d7bf6e
parent	eb3949938e82a128855bc0676220bb2ce6e4228d

kv-cache : refactor + add llama_memory_state_i (#13746)

* kv-cache : simplify the "struct llama_kv_cache" interface

ggml-ci

* kv-cache : revert the (n_swa + n_ubatch) change (for next PR)

ggml-ci

* kv-cache : some comments

ggml-ci

* context : fix graph reserve for multiple sequences

ggml-ci

* kv-cache : fix typo [no ci]

* kv-cache : fix find_slot() logic for free slots

ggml-ci

* llama : add TODO for deprecating the defrag API in the future

* kv-cache : improve find_slot() using min/max seq pos info

ggml-ci

* llama : handle aborts and compute errors

ggml-ci

* memory : extract state into llama_memory_state

ggml-ci

* kv-cache : add comments

ggml-ci

* server : update batching logic to reset n_batch on successful decode

* server : upon full re-processing, remove the sequence from the cache

* kv-cache : add TODO for doing split_equal when split_simple fails

ggml-ci

examples/parallel/parallel.cpp
include/llama.h
src/llama-batch.cpp
src/llama-batch.h
src/llama-context.cpp
src/llama-context.h
src/llama-graph.cpp
src/llama-graph.h
src/llama-kv-cache.cpp
src/llama-kv-cache.h
src/llama-kv-cells.h
src/llama-memory.h
src/llama-model.cpp
tools/server/server.cpp