git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
server : include speculative decoding stats when timings_per_token is enabled (#12603)
author Benson Wong <redacted>
Fri, 28 Mar 2025 08:05:44 +0000 (01:05 -0700)
committer GitHub <redacted>
Fri, 28 Mar 2025 08:05:44 +0000 (10:05 +0200)
commit 5d01670266859444366e4f333ade5e0e5e2ae63d
tree db1c2d5e27c2e06dd46a765c0aca1943ed79c790
parent ef03229ff423dd1991f4f44ef1352f03334d86eb

* Include speculative decoding stats when timings_per_token is true

New fields added to the `timings` object:

  - draft_n           : number of draft tokens generated
  - draft_accepted_n  : number of draft tokens accepted
  - draft_accept_ratio: ratio of accepted/generated

* Remove redundant draft_accept_ratio var

* Add draft acceptance rate to server console output
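
As a rough illustration of how the acceptance ratio described in the message relates to the two counters, here is a minimal C++ sketch. The struct and function names (`draft_timings`, `draft_accept_ratio`) are hypothetical and are not taken from server.cpp; only the field names `draft_n` and `draft_accepted_n` come from the commit message, and the actual JSON keys or types in the server may differ. The zero guard is an assumption about how a request with no draft tokens might reasonably be handled.

```cpp
#include <cstdint>

// Hypothetical container for the draft counters named in the commit message.
// This is NOT the actual type used in examples/server/server.cpp.
struct draft_timings {
    int32_t draft_n          = 0; // number of draft tokens generated
    int32_t draft_accepted_n = 0; // number of draft tokens accepted
};

// Ratio of accepted to generated draft tokens.
// Returns 0.0 when no draft tokens were generated, to avoid dividing by zero
// (an assumption; the server may instead omit the value in that case).
inline double draft_accept_ratio(const draft_timings & t) {
    if (t.draft_n <= 0) {
        return 0.0;
    }
    return static_cast<double>(t.draft_accepted_n) / static_cast<double>(t.draft_n);
}
```

Because the ratio can be derived from the two counters, a client that receives only `draft_n` and `draft_accepted_n` could compute it locally in the same way.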
examples/server/server.cpp