git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
server: improve speed of speculative decoding (#17808)
author Xuan-Son Nguyen <redacted>
Mon, 8 Dec 2025 13:35:28 +0000 (14:35 +0100)
committer GitHub <redacted>
Mon, 8 Dec 2025 13:35:28 +0000 (14:35 +0100)
commit f896d2c34f7bb502c13986830b3ed7d85aac67d9
tree 15ac8a65596761fba6345ddf0f55cd0db949227a
parent e4e9c4329c088d3aa97b8c242e18ff79bfe66248

* server: improve speed of speculative decoding

* fix small draft case

* add link to the PR

* server : fix generation time measurement

* server : fix draft acceptance logs (add SRV_CNT, SLT_CNT macros)

* server : add comment

* add PR to docs

---------

Co-authored-by: Georgi Gerganov <redacted>
tools/server/README-dev.md
tools/server/server-common.h
tools/server/server-context.cpp