git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
server : implement universal assisted decoding (#12635)
author: g2mt <redacted>
Thu, 31 Jul 2025 12:25:23 +0000 (05:25 -0700)
committer: GitHub <redacted>
Thu, 31 Jul 2025 12:25:23 +0000 (14:25 +0200)
commit: 94933c8c2eeaa9a7983e3f6c08af76bd86724094
tree: e5ef9e80253a3488d2c65ebdcaa25e6e692a819a
parent: c1dacaa99b4ead6edbac928cd2da59436573f6b0

* llama-server : implement universal assisted decoding

* Erase prompt tail for kv-cache

* set vocab_dft_compatible in common_speculative

* rename ctx_main to ctx_tgt

* move vocab_dft_compatible to spec struct

* clear mem_dft, remove mem

* detokenize id_last for incompatible models

* update comment

* add --spec-replace flag

* accept special tokens when translating between draft/main models

* Escape spec-replace

* clamp draft result size to params.n_draft

* fix comment

* clean up code

* restore old example

* log common_speculative_are_compatible in speculative example

* fix

* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <redacted>

* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <redacted>

* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <redacted>

---------

Co-authored-by: Georgi Gerganov <redacted>
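
Universal assisted decoding, as the bullets above describe it, lets the draft model use a vocabulary that differs from the target model's: tokens are converted between the two models by going through text. The sketch below is a minimal illustration under stated assumptions, not the code in `common/speculative.cpp`. `ToyVocab`, `replace_all` and `translate_tokens` are hypothetical names, the greedy longest-match tokenizer only stands in for a real tokenizer, and the replacement pairs assume that `--spec-replace` maps a string as one model sees it to the string the other model should see. The result is clamped to a maximum token count, mirroring the bullet about limiting the draft to `params.n_draft`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy vocabulary: a token id is an index into `pieces`.
// Greedy longest-match tokenization stands in for a real BPE/SPM tokenizer.
struct ToyVocab {
    std::vector<std::string> pieces;

    std::vector<int> tokenize(const std::string & text) const {
        std::vector<int> out;
        std::size_t pos = 0;
        while (pos < text.size()) {
            int best = -1;
            std::size_t best_len = 0;
            for (std::size_t id = 0; id < pieces.size(); ++id) {
                const std::string & p = pieces[id];
                // Prefer the longest piece that matches at `pos`.
                if (p.size() > best_len && text.compare(pos, p.size(), p) == 0) {
                    best = static_cast<int>(id);
                    best_len = p.size();
                }
            }
            if (best < 0) throw std::runtime_error("text not covered by vocabulary");
            out.push_back(best);
            pos += best_len;
        }
        return out;
    }

    std::string detokenize(const std::vector<int> & ids) const {
        std::string s;
        for (int id : ids) s += pieces.at(static_cast<std::size_t>(id));
        return s;
    }
};

// Replace every occurrence of `from` with `to`, scanning left to right.
std::string replace_all(std::string s, const std::string & from, const std::string & to) {
    assert(!from.empty() && "an empty pattern would never advance");
    if (from.empty()) return s;  // release builds: leave the text unchanged
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // skip the inserted text so it is not matched again
    }
    return s;
}

// Hypothetical helper: re-express tokens of vocabulary `src` as tokens of
// vocabulary `dst` via text, applying (from, to) replacements in order and
// keeping at most `n_max` tokens of the result.
std::vector<int> translate_tokens(const ToyVocab & src, const ToyVocab & dst,
                                  const std::vector<int> & ids,
                                  const std::vector<std::pair<std::string, std::string>> & replacements,
                                  std::size_t n_max) {
    std::string text = src.detokenize(ids);
    for (const auto & r : replacements) {
        text = replace_all(text, r.first, r.second);
    }
    std::vector<int> out = dst.tokenize(text);
    if (out.size() > n_max) out.resize(n_max);  // clamp, like limiting a draft to n_draft
    return out;
}
```

For example, with draft pieces `{a, b, c, ab, abc}` and target pieces `{a, b, c, bc}`, the text "abcab" is two draft tokens, `[4, 3]`, and `translate_tokens` turns them into the four target tokens `[0, 3, 0, 1]`, which detokenize to the same text. Going through text is what makes the scheme independent of either vocabulary; the cost is an extra detokenize/tokenize round trip per draft, and token boundaries need not line up between the two models.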

Files changed:
common/arg.cpp
common/common.h
common/speculative.cpp
common/speculative.h
examples/speculative-simple/speculative-simple.cpp
tools/server/server.cpp
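
The bullet "Erase prompt tail for kv-cache" is read here as the usual prefix-reuse pattern; the commit message does not spell out the details, so treat this as an assumption. A minimal sketch: keep the longest common prefix between what the draft model's cache already holds and the new prompt, erase the cached tail after it, and evaluate only the remaining suffix. `ToyCache`, `reusable_prefix` and `sync_cache` are hypothetical names; a real implementation would remove positions through llama.cpp's memory API rather than by resizing a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-sequence KV cache: one entry per cached position.
struct ToyCache {
    std::vector<int> tokens;  // token whose K/V would be stored at each position

    // Drop every cached position >= p0 (the stale "tail").
    void erase_from(std::size_t p0) {
        if (p0 < tokens.size()) tokens.resize(p0);
    }
};

// Length of the longest common prefix of the cached tokens and the new prompt.
std::size_t reusable_prefix(const std::vector<int> & cached, const std::vector<int> & prompt) {
    std::size_t n = 0;
    while (n < cached.size() && n < prompt.size() && cached[n] == prompt[n]) ++n;
    return n;
}

// Keep the matching prefix, erase the stale tail, then append the new suffix
// (a real engine would evaluate these tokens with the model at this point).
// Returns how many tokens had to be computed.
std::size_t sync_cache(ToyCache & cache, const std::vector<int> & prompt) {
    const std::size_t keep = reusable_prefix(cache.tokens, prompt);
    cache.erase_from(keep);
    cache.tokens.insert(cache.tokens.end(),
                        prompt.begin() + static_cast<std::ptrdiff_t>(keep), prompt.end());
    assert(cache.tokens == prompt);
    return prompt.size() - keep;
}
```

For instance, if the cache holds `[1, 2, 3, 4, 5]` and the new prompt is `[1, 2, 3, 9]`, three positions are reused, two are erased, and only one token is computed.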