server : add cache reuse card link to help (#13230)

author Georgi Gerganov <redacted>

Fri, 2 May 2025 06:48:31 +0000 (09:48 +0300)

committer GitHub <redacted>

Fri, 2 May 2025 06:48:31 +0000 (09:48 +0300)
diff --git a/common/arg.cpp b/common/arg.cpp
index e35417de7eb98b351f9c8db13f2df8bc46bc6101..aface844c931998d9b2ddac3ed5b65dce28f06ce 100644
--- a/common/arg.cpp
+++ b/common/arg.cpp
@@ -2783,7 +2783,10 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
      ).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_THREADS_HTTP"));
      add_opt(common_arg(
          {"--cache-reuse"}, "N",
-        string_format("min chunk size to attempt reusing from the cache via KV shifting (default: %d)", params.n_cache_reuse),
+        string_format(
+            "min chunk size to attempt reusing from the cache via KV shifting (default: %d)\n"
+            "[(card)](https://ggml.ai/f0.png)", params.n_cache_reuse
+        ),
          [](common_params & params, int value) {
              params.n_cache_reuse = value;
          }
diff --git a/examples/server/README.md b/examples/server/README.md
index a2a0903261e31fc2d554e8d5221f72796299c75d..61446a0ba2a078c7f442349352cd663e9b1a93bc 100644
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -154,7 +154,7 @@ The project is under active development, and we are [looking for feedback and co
  | `--ssl-cert-file FNAME` | path to file a PEM-encoded SSL certificate<br/>(env: LLAMA_ARG_SSL_CERT_FILE) |
  | `-to, --timeout N` | server read/write timeout in seconds (default: 600)<br/>(env: LLAMA_ARG_TIMEOUT) |
  | `--threads-http N` | number of threads used to process HTTP requests (default: -1)<br/>(env: LLAMA_ARG_THREADS_HTTP) |
-| `--cache-reuse N` | min chunk size to attempt reusing from the cache via KV shifting (default: 0)<br/>(env: LLAMA_ARG_CACHE_REUSE) |
+| `--cache-reuse N` | min chunk size to attempt reusing from the cache via KV shifting (default: 0)<br/>[(card)](https://ggml.ai/f0.png)<br/>(env: LLAMA_ARG_CACHE_REUSE) |
  | `--metrics` | enable prometheus compatible metrics endpoint (default: disabled)<br/>(env: LLAMA_ARG_ENDPOINT_METRICS) |
  | `--slots` | enable slots monitoring endpoint (default: disabled)<br/>(env: LLAMA_ARG_ENDPOINT_SLOTS) |
  | `--props` | enable changing global properties via POST /props (default: disabled)<br/>(env: LLAMA_ARG_ENDPOINT_PROPS) |
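
As a reading aid for the `common/arg.cpp` hunk: the two adjacent string literals are joined by the compiler into a single format string, so the help text for `--cache-reuse` becomes two lines, with the card link on the second. The sketch below reproduces that behavior in isolation. It is an illustration under assumptions, not the project's code: `cache_reuse_help` is a hypothetical helper, and `std::snprintf` stands in for llama.cpp's `string_format`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of llama.cpp): builds the --cache-reuse help
// text the way the patched call does, with std::snprintf standing in for the
// project's string_format.
std::string cache_reuse_help(int n_cache_reuse) {
    // Adjacent string literals are concatenated at compile time, so this is
    // one format string containing an embedded newline before the link.
    const char * fmt =
        "min chunk size to attempt reusing from the cache via KV shifting (default: %d)\n"
        "[(card)](https://ggml.ai/f0.png)";

    // First call only measures the formatted length (excluding the terminator).
    const int len = std::snprintf(nullptr, 0, fmt, n_cache_reuse);
    if (len < 0) {
        return {};  // formatting error: return an empty string
    }

    // Second call writes the text; +1 leaves room for the terminating '\0'.
    std::string out(static_cast<size_t>(len), '\0');
    std::snprintf(&out[0], out.size() + 1, fmt, n_cache_reuse);
    return out;
}
```

Calling `cache_reuse_help(0)` yields the description with `(default: 0)` on the first line and `[(card)](https://ggml.ai/f0.png)` on the second, which matches the `<br/>`-separated cell added to the README table.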