server : update doc to clarify n_keep when there is bos token (#8619)

author Jan Boon <redacted>

Mon, 22 Jul 2024 08:02:09 +0000 (16:02 +0800)

committer GitHub <redacted>

Mon, 22 Jul 2024 08:02:09 +0000 (11:02 +0300)
diff --git a/examples/server/README.md b/examples/server/README.md

index e477d1501f976f67549eefff30a4ce90122f4726..ff4074517f9f551cdfe9de827d5ae783892ddf88 100644 (file)
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -444,7 +444,7 @@ node index.js
  
      `n_predict`: Set the maximum number of tokens to predict when generating text. **Note:** May exceed the set limit slightly if the last token is a partial multibyte character. When 0, no tokens will be generated but the prompt is evaluated into the cache. Default: `-1`, where `-1` is infinity.
  
-    `n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded.
+    `n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. The number excludes the BOS token.
      By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to retain all tokens from the prompt.
  
      `stream`: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to `true`.
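
A minimal sketch of how the clarified `n_keep` wording can be read, not the server's actual implementation. It assumes that a BOS token, when present, is always retained in addition to the `n_keep` prompt tokens; the helper names `build_request_body` and `retained_prefix_len` are hypothetical and exist only for this illustration.

```python
import json


def build_request_body(prompt: str, n_keep: int = 0, n_predict: int = -1) -> str:
    """Serialize the options documented above into a JSON request body."""
    return json.dumps({"prompt": prompt, "n_keep": n_keep, "n_predict": n_predict})


def retained_prefix_len(n_keep: int, n_prompt_tokens: int, has_bos: bool) -> int:
    """Count the leading prompt tokens that survive when context is discarded.

    n_prompt_tokens is the full tokenized prompt length, BOS included if present.
    Assumption (not confirmed by the source): BOS is kept on top of n_keep.
    """
    bos = 1 if has_bos else 0
    n_content = n_prompt_tokens - bos  # prompt tokens other than BOS
    if n_keep < 0:
        # -1 means "retain all tokens from the prompt"
        n_keep = n_content
    # n_keep cannot exceed the number of non-BOS prompt tokens
    n_keep = min(n_keep, n_content)
    return n_keep + bos


if __name__ == "__main__":
    print(build_request_body("Once upon a time", n_keep=4))
    print(retained_prefix_len(n_keep=4, n_prompt_tokens=10, has_bos=True))
```

Under this reading, a request with `n_keep` set to `4` against a prompt that starts with BOS retains five leading tokens: BOS plus the four that `n_keep` counts.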