From: Radoslav Gerganov
Date: Fri, 28 Mar 2025 07:44:13 +0000 (+0200)
Subject: rpc : update README for cache usage (#12620)
X-Git-Tag: upstream/0.0.5028~45
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=ef03229ff423dd1991f4f44ef1352f03334d86eb;p=pkg%2Fggml%2Fsources%2Fllama.cpp

rpc : update README for cache usage (#12620)
---

diff --git a/examples/rpc/README.md b/examples/rpc/README.md
index 312bb634..561f19fd 100644
--- a/examples/rpc/README.md
+++ b/examples/rpc/README.md
@@ -72,3 +72,14 @@ $ bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf -p "Hello, my name
 This way you can offload model layers to both local and remote devices.
 
+### Local cache
+
+The RPC server can use a local cache to store large tensors and avoid transferring them over the network.
+This can speed up model loading significantly, especially when using large models.
+To enable the cache, use the `-c` option:
+
+```bash
+$ bin/rpc-server -c
+```
+
+By default, the cache is stored in the `$HOME/.cache/llama.cpp/rpc` directory and can be controlled via the `LLAMA_CACHE` environment variable.
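
As a usage sketch building on the patch above: the `-c` flag can be combined with the `LLAMA_CACHE` environment variable mentioned in the new README text to place the cache somewhere other than `$HOME/.cache/llama.cpp/rpc`. The path below is purely illustrative, and the exact subdirectory layout under the overridden base directory is an assumption, not confirmed by the patch.

```bash
# Start the RPC server with the local tensor cache enabled (-c),
# overriding the default cache base directory via LLAMA_CACHE.
# Cached tensors would then be kept under the directory LLAMA_CACHE points to
# (the /mnt/fast-ssd/llama-cache path is a hypothetical example).
$ LLAMA_CACHE=/mnt/fast-ssd/llama-cache bin/rpc-server -c
```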