git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	David Friehs <redacted>
	Sat, 13 Jan 2024 16:29:43 +0000 (17:29 +0100)
committer	GitHub <redacted>
	Sat, 13 Jan 2024 16:29:43 +0000 (18:29 +0200)
commit	df845cc982e7e2ea7b9900e29d55b15338faa78d
tree	07c1eb5f5b9a3ac21fa70e499029907d9d90b008
parent	6b48ed089377330cdb362970a51c1c89b6d857a8

llama : minimize size used for state save/load (#4820)

* examples : save-load-state: save only required state

* llama : only reserve n_vocab * n_batch at most for logits

llama_decode asserts that at most n_batch tokens are passed per call,
and n_ctx is expected to be larger than n_batch.

* llama : always reserve n_vocab * n_batch for logits

llama_context deserialization breaks if the contexts have differing
capacities for logits, and llama_decode will resize to at most
n_vocab * n_batch.

* llama : only save and restore used logits

for a batch size of 512 this reduces the saved state by around 62 MB in
the best case, which adds up quickly if the state is saved on every
message to allow regenerating messages.
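
The 62 MB figure is consistent with a quick size estimate. The sketch below
assumes a 32000-token vocabulary and 4-byte float logits; the commit message
states neither, only the batch size. It also assumes that a single row of
logits is in use in the best case. The helper names `logits_bytes` and
`saved_mib` are hypothetical and are not part of llama.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed to store `rows` rows of float logits,
// one row holding n_vocab values.
static size_t logits_bytes(size_t n_vocab, size_t rows) {
    return n_vocab * rows * sizeof(float);
}

// Hypothetical helper: MiB saved by writing only `used_rows` rows instead
// of the full reserved capacity of `reserved_rows` rows.
static double saved_mib(size_t n_vocab, size_t reserved_rows, size_t used_rows) {
    assert(used_rows <= reserved_rows);
    const size_t reserved = logits_bytes(n_vocab, reserved_rows);
    const size_t used     = logits_bytes(n_vocab, used_rows);
    return static_cast<double>(reserved - used) / (1024.0 * 1024.0);
}

// Example (assumed values): n_vocab = 32000, n_batch = 512, one used row.
//   logits_bytes(32000, 512) is 65536000 bytes of reserved capacity
//   logits_bytes(32000, 1)   is   128000 bytes actually needed
//   saved_mib(32000, 512, 1) is about 62.4
```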

* llama : use ostringstream and istringstream for save and load

* llama : serialize rng into minimum amount of space required

* llama : break session version due to serialization changes
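
The serialization items above (only the used logits, string streams, and a
compactly stored rng) can be illustrated together. This is a minimal sketch
under stated assumptions, not the code from this commit: `toy_state`,
`toy_save` and `toy_load` are hypothetical names, and the layout (length
prefix, rng text, used-float count, floats) is chosen for illustration only.
The rng is written as its standard text representation preceded by its
length, so it takes only as much space as that text needs rather than a
fixed-size padded buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ios>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the parts of a context discussed above.
struct toy_state {
    std::mt19937       rng;             // sampling rng
    std::vector<float> logits;          // capacity reserved up front
    size_t             logits_used = 0; // floats actually in use
};

// Serialize: length-prefixed rng text, then only the used logits.
static std::string toy_save(const toy_state & st) {
    assert(st.logits_used <= st.logits.size());
    std::ostringstream out;

    std::ostringstream rng_ss;
    rng_ss << st.rng; // standard text form of the engine state
    const std::string rng_str  = rng_ss.str();
    const uint64_t    rng_size = rng_str.size();
    out.write(reinterpret_cast<const char *>(&rng_size), sizeof(rng_size));
    out.write(rng_str.data(), static_cast<std::streamsize>(rng_str.size()));

    const uint64_t n_used = st.logits_used;
    out.write(reinterpret_cast<const char *>(&n_used), sizeof(n_used));
    out.write(reinterpret_cast<const char *>(st.logits.data()),
              static_cast<std::streamsize>(n_used * sizeof(float)));
    return out.str();
}

// Deserialize into a state whose logits capacity is already reserved.
// Returns false on malformed input or if the logits would not fit.
static bool toy_load(toy_state & st, const std::string & blob) {
    std::istringstream in(blob);

    uint64_t rng_size = 0;
    if (!in.read(reinterpret_cast<char *>(&rng_size), sizeof(rng_size))) return false;
    if (rng_size > blob.size()) return false;
    std::string rng_str(static_cast<size_t>(rng_size), '\0');
    if (!in.read(&rng_str[0], static_cast<std::streamsize>(rng_size))) return false;
    std::istringstream rng_ss(rng_str);
    rng_ss >> st.rng;
    if (rng_ss.fail()) return false;

    uint64_t n_used = 0;
    if (!in.read(reinterpret_cast<char *>(&n_used), sizeof(n_used))) return false;
    if (n_used > st.logits.size()) return false; // differing capacity
    if (!in.read(reinterpret_cast<char *>(st.logits.data()),
                 static_cast<std::streamsize>(n_used * sizeof(float)))) return false;
    st.logits_used = static_cast<size_t>(n_used);
    return true;
}
```

In this sketch a blob written by `toy_save` can only be loaded into a state
whose logits capacity is at least the number of floats saved, which mirrors
why the commit message wants every context to reserve the same capacity.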

examples/save-load-state/save-load-state.cpp
llama.cpp
llama.h