server : export max observed n_past value (#15361)
author Oleksandr Kuvshynov <redacted>
Sun, 17 Aug 2025 22:28:58 +0000 (18:28 -0400)
committer GitHub <redacted>
Sun, 17 Aug 2025 22:28:58 +0000 (00:28 +0200)
commit e5155e698645242d4f019267ecc40ea9bad81b09
tree e483220d69f49c76ba7255b19e83f4d4019c138a
parent 21c17b5befc5f6be5992bc87fc1ba99d388561df
Track the high watermark of cache usage (the largest n_past value observed) and expose it through the /metrics endpoint.

Use case: measure the largest cache usage actually needed under a realistic
workload, to better understand memory requirements and to adjust the
cache size or quantization for the model and cache accordingly.
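
The high-watermark idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from tools/server/server.cpp: the struct `cache_metrics`, its methods, and the metric name `example_n_past_max` are hypothetical names chosen for this example, and the output is assumed to follow the Prometheus text exposition format.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical metrics holder; names are illustrative, not taken from server.cpp.
struct cache_metrics {
    int32_t n_past_max = 0; // largest n_past value observed so far

    // Record the current n_past of a slot; only ever raises the watermark.
    void observe(int32_t n_past) {
        n_past_max = std::max(n_past_max, n_past);
    }

    // Render the watermark as a gauge in Prometheus text exposition format.
    // The metric name here is an assumption made for this sketch.
    std::string to_prometheus() const {
        return "# HELP example_n_past_max Largest observed n_past.\n"
               "# TYPE example_n_past_max gauge\n"
               "example_n_past_max " + std::to_string(n_past_max) + "\n";
    }
};
```

Calling `observe()` each time a slot's n_past advances keeps the watermark current, and a metrics handler would append the result of `to_prometheus()` to its response body. Because the value only increases, it reflects the peak over the lifetime of the process rather than the current usage.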
tools/server/server.cpp