llama : propagate the results of `graph_compute` (#9525)
author    Michael Podvitskiy <redacted>
          Wed, 13 Nov 2024 18:00:35 +0000 (20:00 +0200)
committer GitHub <redacted>
          Wed, 13 Nov 2024 18:00:35 +0000 (20:00 +0200)
commit    fb4a0ec0833c71cff5a1a367ba375447ce6106eb
tree      5e2c92c69b9e571f459dbec34f4e3fb74e1e9957
parent    5ea926dad7f62ebccff7b24784bd1e01a06d13ae

* llama: propagate the results of `graph_compute` to the user interface

* llama: revert the kv_cache in case of a failed compute

* llama: remove `llama_kv_cache_state`; only the result of `llama_graph_compute` is returned now

* llama: restore the kv_cache in case of a failed computation (see the sketch after this message)

* llama: correctly revert the entire batch.
Also update `llama_kv_cache_find_slot` so that it correctly counts the number of `used` cells for recurrent models.

* llama: updated comments

* llama : add comments about KV cache state after error

---------

Co-authored-by: Georgi Gerganov <redacted>
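
The revert described in the message above follows a save-then-restore pattern: snapshot the KV cache bookkeeping before attempting the compute, and roll the snapshot back if the backend reports a failure, so the entire batch is undone. Below is a minimal sketch of that pattern; every name in it (`kv_cache`, `kv_cache_state`, `graph_compute`, `decode_with_revert`) is an illustrative stand-in, not the actual llama.cpp internals.

```cpp
#include <cstdint>

// stand-in for the mutable KV cache bookkeeping
struct kv_cache {
    uint32_t head = 0; // next write position
    uint32_t used = 0; // number of occupied cells
};

// snapshot taken before the compute attempt (hypothetical type,
// not the struct that the PR ultimately removed)
struct kv_cache_state {
    uint32_t head;
    uint32_t used;

    static kv_cache_state save(const kv_cache & kv) {
        return { kv.head, kv.used };
    }

    void restore(kv_cache & kv) const {
        kv.head = head;
        kv.used = used;
    }
};

enum compute_status { COMPUTE_OK, COMPUTE_FAILED };

// stub standing in for the backend's graph compute call
static compute_status graph_compute() { return COMPUTE_OK; }

// attempt the compute; on failure, revert the cache for the entire batch
static compute_status decode_with_revert(kv_cache & kv) {
    const kv_cache_state saved = kv_cache_state::save(kv);

    const compute_status status = graph_compute();
    if (status != COMPUTE_OK) {
        saved.restore(kv); // cache looks as it did before this call
    }
    return status;
}
```

Note that the actual change also has to account for per-cell state touched by `llama_kv_cache_find_slot`; the sketch tracks only two counters to keep the shape of the pattern visible.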
include/llama.h
src/llama.cpp
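
On the caller side, the practical effect of the change to include/llama.h is that a failure inside graph compute now surfaces through the return value of `llama_decode` instead of being lost. A hedged usage sketch, assuming the return convention documented in llama.h (0 = success, > 0 = recoverable warning such as no free KV slot, < 0 = error), plus the guarantee this commit documents that the KV cache is restored before an error is returned; `decode_checked` is a hypothetical helper, not part of the API:

```cpp
#include "llama.h"
#include <cstdio>

// decode one batch and react to the propagated status
static bool decode_checked(llama_context * ctx, llama_batch batch) {
    const int32_t status = llama_decode(ctx, batch);
    if (status == 0) {
        return true; // compute succeeded
    }
    if (status > 0) {
        // recoverable: e.g. no KV cache slot was found for the batch;
        // the caller may retry with a smaller batch
        fprintf(stderr, "llama_decode warning: %d\n", status);
    } else {
        // hard error propagated from graph compute; per this change the
        // KV cache has been restored to its state before the call
        fprintf(stderr, "llama_decode error: %d\n", status);
    }
    return false;
}
```

Returning the status instead of asserting lets the application decide whether to retry, shrink the batch, or fail gracefully, which is the point of propagating the result to the user interface.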