https://github.com/ggml-org/llama.cpp/pull/15361 added a new exported metric, but I missed updating this doc.
- `llamacpp:kv_cache_tokens`: Number of tokens in the KV cache.
- `llamacpp:requests_processing`: Number of requests processing.
- `llamacpp:requests_deferred`: Number of requests deferred.
- `llamacpp:n_past_max`: High watermark of the context size observed.
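The metrics above are served in the Prometheus text exposition format. As a minimal sketch, the snippet below parses such a payload and reads out the `llamacpp:*` gauges; the sample text is illustrative, not real server output:

```python
# Minimal sketch: parse a Prometheus text exposition (as returned by the
# server's /metrics endpoint) and pick out the llamacpp:* gauge values.
# The sample payload below is illustrative, not captured from a real server.

sample = """\
# HELP llamacpp:kv_cache_tokens KV-cache tokens.
# TYPE llamacpp:kv_cache_tokens gauge
llamacpp:kv_cache_tokens 512
# HELP llamacpp:n_past_max High watermark of the context size observed.
# TYPE llamacpp:n_past_max gauge
llamacpp:n_past_max 1024
"""

def parse_metrics(text: str) -> dict:
    """Return {metric_name: value} for the non-comment exposition lines."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

m = parse_metrics(sample)
print(m["llamacpp:n_past_max"])  # high watermark of the context size
```

In practice the payload would come from an HTTP GET against the server's `/metrics` endpoint (enabled with `--metrics`), with this parsing applied to the response body.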
### POST `/slots/{id_slot}?action=save`: Save the prompt cache of the specified slot to a file.