server: benchmark: chat/completions scenario and other llm servers comparison (#5941)
author Pierrick Hymbert <redacted>
Sat, 9 Mar 2024 22:41:49 +0000 (23:41 +0100)
committer GitHub <redacted>
Sat, 9 Mar 2024 22:41:49 +0000 (23:41 +0100)
commit 621e86b331f8b0e71f79fd82a4ae1cd54c3e4396
tree e667aa693df722aafbb5452054de261839d0dac1
parent 77d1ac7e00bf049b9f2bba1b5a310a78318c49c4
server: benchmark: chat/completions scenario and other llm servers comparison (#5941)

* server: bench: Init a bench scenario with K6
See #5827
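
A minimal sketch of what such a k6 scenario could look like (endpoint default and the SERVER_BENCH_URL variable are illustrative assumptions, not necessarily the final script.js):

    import http from 'k6/http'
    import {check} from 'k6'

    export const options = {
        vus: 8,
        duration: '60s',
    }

    // assumed env variable name; falls back to a local server
    const server_url = __ENV.SERVER_BENCH_URL || 'http://localhost:8080/v1'

    export default function () {
        const payload = {
            messages: [
                {role: 'system', content: 'You are a helpful assistant.'},
                {role: 'user', content: 'Write a haiku about llamas.'},
            ],
            model: 'model',
            max_tokens: 128,
        }
        const res = http.post(`${server_url}/chat/completions`, JSON.stringify(payload), {
            headers: {'Content-Type': 'application/json'},
        })
        check(res, {'completion succeeded': (r) => r.status === 200})
    }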

* server: bench: EOL EOF

* server: bench: PR feedback and improved k6 script configuration

* server: bench: remove llamacpp_completions_tokens_seconds as it includes prompt processing time and is misleading

server: bench: add max_tokens from SERVER_BENCH_MAX_TOKENS

server: bench: increase the allowed truncated rate to 80% before failing
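
A sketch of how the env-driven max_tokens and the truncated-rate failure threshold could be wired up in k6 (the metric and variable names are assumptions):

    import {Rate} from 'k6/metrics'

    // maximum completion length, overridable via SERVER_BENCH_MAX_TOKENS
    const max_tokens = __ENV.SERVER_BENCH_MAX_TOKENS ? parseInt(__ENV.SERVER_BENCH_MAX_TOKENS) : 512

    // rate of completions that stopped on 'length', i.e. were truncated at max_tokens
    const llamacpp_completions_truncated_rate = new Rate('llamacpp_completions_truncated_rate')

    export const options = {
        thresholds: {
            // fail (and abort) the bench only if more than 80% of completions are truncated
            llamacpp_completions_truncated_rate: [
                {threshold: 'rate < 0.8', abortOnFail: true},
            ],
        },
    }

    export default function () {
        // ... send the request with max_tokens, then record per request:
        // llamacpp_completions_truncated_rate.add(finish_reason === 'length')
    }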

* server: bench: fix doc

* server: bench: change gauge custom metrics to trend
server: bench: add trend custom metrics for average total tokens per second
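
A gauge only keeps the last sample, while a k6 Trend aggregates min/avg/med/max and percentiles over the whole run, which is what a tokens-per-second average needs. A sketch (metric name assumed):

    import http from 'k6/http'
    import {Trend} from 'k6/metrics'

    const llamacpp_tokens_second = new Trend('llamacpp_tokens_second')

    export default function () {
        const res = http.post('http://localhost:8080/v1/chat/completions',
            JSON.stringify({messages: [{role: 'user', content: 'Hello'}], max_tokens: 64}),
            {headers: {'Content-Type': 'application/json'}})
        const usage = res.json().usage
        // total tokens per second for this request; the Trend reports the run-wide average
        llamacpp_tokens_second.add(usage.total_tokens / (res.timings.duration / 1e3))
    }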

* server: bench: doc: add an option to debug HTTP requests

* server: bench: filter out sequences that are too short or too long from the dataset

* server: bench: allow filtering out conversations in the dataset based on an env variable
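
Both filters could run once in the init context, before load starts; the dataset path, env variable name, and the rough 4-chars-per-token heuristic below are assumptions:

    import {SharedArray} from 'k6/data'

    const dataset_path = __ENV.SERVER_BENCH_DATASET || './ShareGPT_V3_unfiltered_cleaned_split.json'
    const min_prompt_tokens = 4
    const max_prompt_tokens = 1024

    const prompts = new SharedArray('prompts', function () {
        return JSON.parse(open(dataset_path))
            // keep only conversations that start with a user ('human') turn
            .filter((d) => d.conversations && d.conversations[0].from === 'human')
            .map((d) => d.conversations[0].value)
            // drop sequences that are too short or too long (~4 chars per token)
            .filter((p) => p.length / 4 >= min_prompt_tokens && p.length / 4 <= max_prompt_tokens)
    })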

* server: bench: fix assistant message sent instead of user message

* server : add defrag thold parameter

* server: bench: select prompts based on the current iteration id, not randomly, to make the bench more reproducible
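
Indexing the dataset with the iteration counter instead of Math.random() means two runs over the same dataset issue identical request sequences; a sketch using k6's execution API:

    import exec from 'k6/execution'

    // stands in for the filtered dataset built above
    const prompts = ['prompt a', 'prompt b', 'prompt c']

    export default function () {
        // iterationInTest is a unique, monotonically increasing id across all VUs,
        // so prompt selection is reproducible rather than random
        const prompt = prompts[exec.scenario.iterationInTest % prompts.length]
        // ... build the chat/completions request from `prompt`
    }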

---------

Co-authored-by: Georgi Gerganov <redacted>
examples/server/bench/README.md [new file with mode: 0644]
examples/server/bench/script.js [new file with mode: 0644]
examples/server/server.cpp