server: update cache_prompt documentation [no ci] (#7745)

author Johannes Gäßler <redacted>

Fri, 7 Jun 2024 09:15:49 +0000 (11:15 +0200)

committer GitHub <redacted>

Fri, 7 Jun 2024 09:15:49 +0000 (11:15 +0200)
diff --git a/examples/server/README.md b/examples/server/README.md

index 0c3db8c84c69d0ffbc766b76bee88d338c943ba5..ccbdcdbdb2ddb2cdbb3f78e86057ac40d64b1c00 100644 (file)
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -279,7 +279,7 @@ node index.js
  
      `id_slot`: Assign the completion task to an specific slot. If is -1 the task will be assigned to a Idle slot.  Default: `-1`
  
-    `cache_prompt`: Re-use previously cached prompt from the last request if possible. This may prevent re-caching the prompt from scratch.  Default: `false`
+    `cache_prompt`: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are **not** guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation) enabling this option can cause nondeterministic results. Default: `false`
  
      `system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)
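
The new `cache_prompt` wording describes prefix reuse: only the part of the prompt that differs from the previous request has to be processed again. A minimal Python sketch of that idea follows. It is an illustration, not the server's implementation: the helper names `split_cached_prefix` and `completion_payload` are hypothetical, the token IDs are made up, and the `prompt` and `n_predict` request fields are assumed to match the server's completion request format.

```python
import json


def split_cached_prefix(prev_tokens, new_tokens):
    """Return (n_reused, suffix) for two token sequences.

    n_reused is the length of the common prefix, i.e. the part whose
    KV cache entries could be kept. suffix is the remainder of
    new_tokens that would still need prompt processing.
    """
    n_reused = 0
    for old, new in zip(prev_tokens, new_tokens):
        if old != new:
            break
        n_reused += 1
    return n_reused, new_tokens[n_reused:]


def completion_payload(prompt, n_predict=64, cache_prompt=True):
    """Build a JSON request body that opts in to prompt caching.

    Field names other than cache_prompt are assumptions about the
    request format; no network call is made here.
    """
    return json.dumps(
        {"prompt": prompt, "n_predict": n_predict, "cache_prompt": cache_prompt}
    )


if __name__ == "__main__":
    previous = [1, 15, 27, 300, 42]      # hypothetical token IDs
    current = [1, 15, 27, 999, 7, 8]
    reused, suffix = split_cached_prefix(previous, current)
    print(reused, suffix)
    print(completion_payload("Hello, my name is"))
```

Because the reused prefix was originally evaluated with a different batch size than the one a fresh request would use, the documentation's caveat applies: with some backends the resulting logits may differ slightly, so leave `cache_prompt` at `false` when reproducible output matters.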