git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	g2mt <redacted>
	Thu, 31 Jul 2025 12:25:23 +0000 (05:25 -0700)
committer	GitHub <redacted>
	Thu, 31 Jul 2025 12:25:23 +0000 (14:25 +0200)
commit	94933c8c2eeaa9a7983e3f6c08af76bd86724094
tree	e5ef9e80253a3488d2c65ebdcaa25e6e692a819a	tree
parent	c1dacaa99b4ead6edbac928cd2da59436573f6b0	commit \| diff

server : implement universal assisted decoding (#12635)

* llama-server : implement universal assisted decoding

* Erase prompt tail for kv-cache

* set vocab_dft_compatible in common_speculative

* rename ctx_main to ctx_tgt

* move vocab_dft_compatible to spec struct

* clear mem_dft, remove mem

* detokenize id_last for incompatible models

* update comment

* add --spec-replace flag

* accept special tokens when translating between draft/main models

* Escape spec-replace

* clamp draft result to size to params.n_draft

* fix comment

* clean up code

* restore old example

* log common_speculative_are_compatible in speculative example

* fix

* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <redacted>
* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <redacted>
* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>

common/arg.cpp		diff \| blob \| history
common/common.h		diff \| blob \| history
common/speculative.cpp		diff \| blob \| history
common/speculative.h		diff \| blob \| history
examples/speculative-simple/speculative-simple.cpp		diff \| blob \| history
tools/server/server.cpp		diff \| blob \| history