server: fix time_ms calculation in prompt_progress (#17093)

author Aidan <redacted>

Sat, 8 Nov 2025 13:12:11 +0000 (13:12 +0000)

committer GitHub <redacted>

Sat, 8 Nov 2025 13:12:11 +0000 (15:12 +0200)
diff --git a/tools/server/README.md b/tools/server/README.md
index 6828ef73824ceaccb468d1a4add88c3e97a3ac6e..8fd478eb328a4a6c21b61976d21c59bf935845e7 100644
--- a/tools/server/README.md
+++ b/tools/server/README.md
@@ -512,7 +512,7 @@ These words will not be included in the completion, so make sure to add them to
  
  `timings_per_token`: Include prompt processing and text generation speed information in each response.  Default: `false`
  
-`return_progress`: Include prompt processing progress in `stream` mode. The progress will be contained inside `prompt_progress` with 3 values: `total`, `cache` and `processed`. The overall progress is `processed/total`, while the actual timed progress is `(processed-cache)/(total-cache)`. Default: `false`
+`return_progress`: Include prompt processing progress in `stream` mode. The progress will be contained inside `prompt_progress` with 4 values: `total`, `cache`, `processed`, and `time_ms`. The overall progress is `processed/total`, while the actual timed progress is `(processed-cache)/(total-cache)`. The `time_ms` field contains the elapsed time in milliseconds since prompt processing started. Default: `false`
  
  `post_sampling_probs`: Returns the probabilities of top `n_probs` tokens after applying sampling chain.
  
diff --git a/tools/server/server.cpp b/tools/server/server.cpp
index 164e8cf4e70848bee8babd48b225c5e6e2b07cac..9d91e32d1fbfbc3e4b901463fa9a5fbcc88ff2c9 100644
--- a/tools/server/server.cpp
+++ b/tools/server/server.cpp
@@ -3078,7 +3078,7 @@ struct server_context {
              res->progress.total     = slot.task->n_tokens();
              res->progress.cache     = slot.n_prompt_tokens_cache;
              res->progress.processed = slot.prompt.tokens.size();
-            res->progress.time_ms   = (ggml_time_us() - slot.t_start_process_prompt / 1000);
+            res->progress.time_ms   = (ggml_time_us() - slot.t_start_process_prompt) / 1000;
          } else {
              res->content = tkn.text_to_send;
              res->tokens  = { tkn.tok };
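
Note on the change above: in the removed line, `/` binds tighter than `-`, so only `slot.t_start_process_prompt` was divided by 1000 and the result mixed microseconds with milliseconds. The sketch below reproduces both expressions in isolation and adds the two progress ratios described in the README hunk. It is an illustration, not the server's code: the `prompt_progress` struct, the helper names, and the handling of zero denominators are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the progress payload; the field names follow the
// README text, but this struct is not the server's actual type.
struct prompt_progress {
    int32_t total;      // tokens in the prompt
    int32_t cache;      // tokens reused from the cache
    int32_t processed;  // tokens processed so far (cached ones included)
    int64_t time_ms;    // elapsed milliseconds since prompt processing started
};

// Expression before the fix: division applies to t_start_us only, so the
// result is (microseconds - milliseconds), which is not a meaningful duration.
static int64_t elapsed_ms_buggy(int64_t now_us, int64_t t_start_us) {
    return (now_us - t_start_us / 1000);
}

// Expression after the fix: subtract in microseconds, then convert to ms.
static int64_t elapsed_ms_fixed(int64_t now_us, int64_t t_start_us) {
    return (now_us - t_start_us) / 1000;
}

// Overall progress as described in the README: processed / total.
// Returning 0.0 for an empty prompt is an assumption of this sketch.
static double overall_progress(const prompt_progress & p) {
    return p.total > 0 ? (double) p.processed / p.total : 0.0;
}

// Timed progress as described in the README: (processed-cache)/(total-cache).
// Returning 1.0 when everything was cached is an assumption of this sketch.
static double timed_progress(const prompt_progress & p) {
    const int32_t remaining = p.total - p.cache;
    return remaining > 0 ? (double) (p.processed - p.cache) / remaining : 1.0;
}
```

For instance, with a start timestamp of 2,000,000 µs and a current timestamp of 5,000,000 µs, `elapsed_ms_fixed` returns 3000, whereas `elapsed_ms_buggy` returns 4,998,000. With `total = 100`, `cache = 20`, and `processed = 60`, the overall progress is 0.6 and the timed progress is 0.5.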