llama-bench : clarify benchmarked parts of the computation (#16823)
author     Georgi Gerganov <redacted>  Tue, 28 Oct 2025 17:41:43 +0000 (19:41 +0200)
committer  GitHub <redacted>           Tue, 28 Oct 2025 17:41:43 +0000 (19:41 +0200)
tools/llama-bench/README.md

index ead4da45e2957427507a794e484d1a2b896dba6f..87d9c0a219bd82878989e95c41ec10fa823ce842 100644 (file)
@@ -82,6 +82,9 @@ Using the `-d <n>` option, each test can be run at a specified context depth, pr
 
 For a description of the other options, see the [main example](../main/README.md).
 
+> [!NOTE]
+> The measurements reported by `llama-bench` do not include the time spent on tokenization and sampling.
+
 ## Examples
 
 ### Text generation with different models
@@ -131,7 +134,7 @@ $ ./llama-bench -n 0 -n 16 -p 64 -t 1,2,4,8,16,32
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         16 | pp 64      |     33.52 ± 0.03 |
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         16 | tg 16      |     15.32 ± 0.05 |
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | pp 64      |     59.00 ± 1.11 |
-| llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | tg 16      |     16.41 ± 0.79 ||
+| llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | tg 16      |     16.41 ± 0.79 |
 
 ### Different numbers of layers offloaded to the GPU
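
As a worked reading of the note introduced above (an editor's illustration, not part of the commit): since the reported rates exclude tokenization and sampling, the decode time implied by a table row is simply the token count divided by the rate. Taking the `tg 16` row at 32 threads from the example table:

$$
t_{\text{decode}} \approx \frac{n_{\text{tokens}}}{\text{rate}} = \frac{16\ \text{tokens}}{16.41\ \text{t/s}} \approx 0.98\ \text{s}
$$

An end-to-end wall-clock measurement of the same run would be somewhat longer, because it additionally includes the tokenization and sampling time that `llama-bench` leaves out.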