server: Add cached_tokens info to oaicompat responses (#19361)
author    Ryan Goulden <redacted>
          Thu, 19 Mar 2026 18:09:33 +0000 (11:09 -0700)
committer GitHub <redacted>
          Thu, 19 Mar 2026 18:09:33 +0000 (19:09 +0100)
commit    26c9ce128825ba53a91baf75b5b817a1373b46bf
tree      4fd83148e0acc7da62a233713e423410e937a249
parent    76f2dc70c360d6506c588d68b58ff14d0120ce8b

* tests : fix fetch_server_test_models.py

* server: to_json_oaicompat cached_tokens

Adds OpenAI- and Anthropic-compatible information about the
number of cached prompt tokens used in a response.
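
For illustration, a client consuming these responses might read the cached-token counts as below. This is a hedged sketch, not the server's implementation: the field paths follow the upstream OpenAI (`usage.prompt_tokens_details.cached_tokens`) and Anthropic (`usage.cache_read_input_tokens`) API shapes that this commit mirrors, and the example payloads are invented, not actual llama.cpp server output.

```python
# Sketch: extracting cached prompt-token counts from OpenAI- and
# Anthropic-style chat responses. Field paths are assumptions based
# on the upstream API conventions, not verified server output.

def cached_tokens_openai(response: dict) -> int:
    """OpenAI-style: usage.prompt_tokens_details.cached_tokens."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0)

def cached_tokens_anthropic(response: dict) -> int:
    """Anthropic-style: usage.cache_read_input_tokens."""
    usage = response.get("usage") or {}
    return usage.get("cache_read_input_tokens", 0)

# Invented example payloads for illustration only:
oai = {"usage": {"prompt_tokens": 128,
                 "prompt_tokens_details": {"cached_tokens": 96}}}
ant = {"usage": {"input_tokens": 32,
                 "cache_read_input_tokens": 96}}

print(cached_tokens_openai(oai))     # 96
print(cached_tokens_anthropic(ant))  # 96
```

Defaulting to 0 when the details object is absent keeps the helpers safe against servers that predate this change.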
scripts/fetch_server_test_models.py
tools/server/server-context.cpp
tools/server/server-task.cpp
tools/server/server-task.h
tools/server/tests/unit/test_chat_completion.py
tools/server/tests/unit/test_compat_anthropic.py