From: Johannes Gäßler
Date: Fri, 7 Jun 2024 09:15:49 +0000 (+0200)
Subject: server: update cache_prompt documentation [no ci] (#7745)
X-Git-Tag: upstream/0.0.4488~1382
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=7027b27d765db95d4ac6b569d976e387a8715881;p=pkg%2Fggml%2Fsources%2Fllama.cpp

server: update cache_prompt documentation [no ci] (#7745)
---

diff --git a/examples/server/README.md b/examples/server/README.md
index 0c3db8c8..ccbdcdbd 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -279,7 +279,7 @@ node index.js

     `id_slot`: Assign the completion task to an specific slot. If is -1 the task will be assigned to a Idle slot. Default: `-1`

-    `cache_prompt`: Re-use previously cached prompt from the last request if possible. This may prevent re-caching the prompt from scratch. Default: `false`
+    `cache_prompt`: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are **not** guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation) enabling this option can cause nondeterministic results. Default: `false`

     `system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)
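
The mechanism the updated documentation describes, reusing the KV cache for the common token prefix and only evaluating the differing suffix, can be illustrated with a short sketch. This is a hypothetical illustration, not code from the patch: the function name `common_prefix_len` and the token IDs are made up, and the real server operates on its internal slot state rather than plain lists.

```python
def common_prefix_len(prev_tokens: list[int], new_tokens: list[int]) -> int:
    """Number of leading tokens shared by two tokenized prompts.

    With cache_prompt enabled, the server can keep the KV cache entries
    for this shared prefix and only re-process the remaining suffix of
    the new prompt.
    """
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


# Made-up token IDs: the two prompts share a 3-token prefix,
# so only the last two tokens of the new prompt need processing.
prev = [1, 15043, 3186, 29991]
new = [1, 15043, 3186, 29892, 920]
reused = common_prefix_len(prev, new)
to_process = len(new) - reused
print(reused, to_process)  # prints "3 2"
```

In an actual request this is simply enabled via the JSON body of the completion endpoint, e.g. `"cache_prompt": true` alongside `"prompt"`; as the new text notes, the speedup comes at the cost of possible nondeterminism across batch sizes.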