git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
author Olivier Chafik <redacted>
Sun, 25 May 2025 00:48:08 +0000 (01:48 +0100)
committer GitHub <redacted>
Sun, 25 May 2025 00:48:08 +0000 (01:48 +0100)
commit f5cd27b71da3ac375a04a41643d14fc779a8057b
tree aa9137a0060559eac27cb8232b644cf089aeaa3c
parent a2d02d5793fd9af7a7224773456501691b95fd02

* add common_json w/ support for truncated json healing

* add common_chat_msg_diff

* partial common_chat_parse

* refactor parser w/ optionals

* server: wire chat diffs in stream mode

* fix trigger of thinking models (must happen after thoughts are closed)

* fix functionary v3.2 raw python!

* rename: common_chat_syntax (now contains format)

* rm common_regex.at_start

* don't return empty <think></think>

* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)

* fix QwQ 32B tool call parsing after thoughts (hermes2)

* better logs for grammar triggers

* consume spaces after parse_json_tool_calls

* fix required tool calls w/ thinking models that have pre-opened thinking tags

* fix thinking model's initial trigger + test qwq's template

* run most test_tool_call tests in stream + non-stream modes

* make functionary v3.2 parsing more strict (differentiate first match from others)

* send final diff from server, to close off raw python arguments

* support partial content streaming in Generic mode

* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)

* Update function-calling.md

* Update tool_bench.py

* chat-parser: remove input from exception (llm output may contain PII)
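
To illustrate the "truncated json healing" item above, here is a minimal sketch of the general idea: while a tool call is still streaming, its JSON arguments are an incomplete prefix, and closing whatever is still open makes that prefix parseable. This is not the code in `common/json-partial.cpp`; `close_truncated_json` is a hypothetical helper, and it assumes the input is cut either inside a string or right after a complete value. It does not repair a dangling key, a trailing comma, a partial literal such as `tru`, or a truncated `\u` escape.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the commit's common_json API): close whatever
// strings, arrays and objects are still open in a truncated JSON prefix.
std::string close_truncated_json(const std::string & partial) {
    std::vector<char> closers;   // pending '}' / ']' in the order they were opened
    bool in_string = false;
    bool escaped   = false;      // previous char was a backslash inside a string

    for (char c : partial) {
        if (in_string) {
            if (escaped)        { escaped = false; }
            else if (c == '\\') { escaped = true; }
            else if (c == '"')  { in_string = false; }
            continue;
        }
        switch (c) {
            case '"': in_string = true;       break;
            case '{': closers.push_back('}'); break;
            case '[': closers.push_back(']'); break;
            case '}':
            case ']': if (!closers.empty()) { closers.pop_back(); } break;
            default:  break;
        }
    }

    std::string healed = partial;
    if (in_string) {
        if (escaped) { healed.pop_back(); }  // drop a dangling backslash
        healed += '"';
    }
    // Close containers innermost-first.
    for (auto it = closers.rbegin(); it != closers.rend(); ++it) {
        healed += *it;
    }
    return healed;
}
```

For example, the prefix `{"name": "get_weather", "arguments": {"city": "Par` becomes `{"name": "get_weather", "arguments": {"city": "Par"}}`, which a regular JSON parser can then read.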

---------

Co-authored-by: ochafik <redacted>
Co-authored-by: Olivier Chafik <redacted>
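
The `common_chat_msg_diff` and "wire chat diffs in stream mode" items describe sending only what changed between two successive parses of the partial output. The sketch below shows that idea for plain text fields only. It is an assumption-laden illustration, not the commit's implementation: `msg_sketch`, `msg_delta_sketch`, `suffix_after` and `diff_messages` are hypothetical names, tool-call deltas are left out, and it assumes each new parse only ever appends to the previous one.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for a parsed chat message and its streamed delta.
struct msg_sketch {
    std::string reasoning;  // text parsed from the model's thoughts so far
    std::string content;    // regular assistant text so far
};

struct msg_delta_sketch {
    std::string reasoning_delta;
    std::string content_delta;
};

// Return the part of `curr` that follows `prev`; throw if `curr` does not
// extend `prev` (this sketch assumes append-only growth).
static std::string suffix_after(const std::string & prev, const std::string & curr) {
    if (curr.compare(0, prev.size(), prev) != 0) {
        throw std::runtime_error("current text does not extend previous text");
    }
    return curr.substr(prev.size());
}

// Compute what to send to the client for this streaming step.
msg_delta_sketch diff_messages(const msg_sketch & prev, const msg_sketch & curr) {
    return {
        suffix_after(prev.reasoning, curr.reasoning),
        suffix_after(prev.content,   curr.content),
    };
}
```

A server loop following this pattern would re-parse the accumulated output after each token, call `diff_messages` with the previous and current parse, and emit a chunk only when one of the deltas is non-empty.
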
23 files changed:
common/CMakeLists.txt
common/chat-parser.cpp [new file with mode: 0644]
common/chat-parser.h [new file with mode: 0644]
common/chat.cpp
common/chat.h
common/common.h
common/json-partial.cpp [new file with mode: 0644]
common/json-partial.h [new file with mode: 0644]
common/sampling.cpp
docs/function-calling.md
models/templates/Qwen-QwQ-32B.jinja [new file with mode: 0644]
models/templates/README.md
scripts/tool_bench.py
src/llama-grammar.cpp
tests/CMakeLists.txt
tests/test-chat-parser.cpp [new file with mode: 0644]
tests/test-chat.cpp
tests/test-json-partial.cpp [new file with mode: 0644]
tools/server/server.cpp
tools/server/tests/unit/test_chat_completion.py
tools/server/tests/unit/test_tool_call.py
tools/server/tests/utils.py
tools/server/utils.hpp