git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Georgi Gerganov <redacted>
	Tue, 20 May 2025 05:05:46 +0000 (08:05 +0300)
committer	GitHub <redacted>
	Tue, 20 May 2025 05:05:46 +0000 (08:05 +0300)
commit	e298d2fbd082a52c0f6ed02729f94e9bf630cf17
tree	9fafda7f1caf7f532e9777cf07b8d987b837d471
parent	f0adb80bf7c2c0d80abb04f4533b5513622d9964

kv-cache : add SWA support (#13194)

* kv-cache : prepare for SWA

ggml-ci

* kv-cache : initial iSWA implementation

ggml-ci

* kv-cache : rework error recovery logic

ggml-ci

* models : fix Phi-3 SWA parameters

ggml-ci

* model : adjust Granite to rope factor changes

ggml-ci

* server : check if context can do shifts

ggml-ci

* iswa : for now, always enable shifts (experiment)

ggml-ci

* kv-cache : simplify SWA logic

ggml-ci

* kv-cache : apply defrag when we fail to find slots for the batch

ggml-ci

* llama : update docs about llama_decode

ggml-ci

* kv-cache : update warning logs when no space for the batch is available

ggml-ci

* llama : add llama_kv_self_seq_pos_min()

* kv-cache : keep track of partial SWA computes and print warnings

* server : disallow use cases involving partial SWA context

ggml-ci

* llama : add param to control SWA cache size

ggml-ci

* minor : clean-up

ggml-ci
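
For readers unfamiliar with the term, SWA (sliding-window attention) restricts each token to attending only to a recent window of positions, which is why the cache can discard older entries and why a minimum cached position per sequence becomes relevant. The sketch below illustrates that masking rule only. Both helper names are hypothetical and are not part of the llama.cpp API, and the window convention (the last `n_swa` positions, including the query itself) is an assumption that may differ from what this commit implements.

```cpp
#include <cstdint>

// Hypothetical helper (not a llama.cpp function): returns true when the key
// at position p_key must be hidden from the query at position p_query.
// Assumption: a causal window covering the last n_swa positions, query included.
constexpr bool is_masked_swa_sketch(int32_t n_swa, int32_t p_key, int32_t p_query) {
    if (p_key > p_query) {
        return true; // causal masking: future positions are never visible
    }
    return p_query - p_key >= n_swa; // key is older than the window
}

// Hypothetical helper: the smallest position a query at p_query can still
// attend to under the same assumption. Cache entries below this position
// are no longer needed by that query.
constexpr int32_t min_visible_pos_sketch(int32_t n_swa, int32_t p_query) {
    const int32_t p_min = p_query - n_swa + 1;
    return p_min < 0 ? 0 : p_min;
}
```

Under these assumptions, with `n_swa = 4` a query at position 10 sees keys 7 through 10, and everything at position 6 or earlier is masked.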

common/arg.cpp
common/common.cpp
common/common.h
include/llama.h
src/llama-context.cpp
src/llama-graph.cpp
src/llama-graph.h
src/llama-hparams.h
src/llama-kv-cache.cpp
src/llama-kv-cache.h
src/llama-memory.h
src/llama-model.cpp
src/llama-model.h
tools/llama-bench/llama-bench.cpp
tools/server/server.cpp