]> git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from...
authormatteo <redacted>
Sun, 29 Jun 2025 18:02:53 +0000 (20:02 +0200)
committerGitHub <redacted>
Sun, 29 Jun 2025 18:02:53 +0000 (20:02 +0200)
commitcaf5681fcb47dfe9bafee94ef9aa8f669ac986c7
tree803f1dac071f96e86a7625bfd615be009d5dd346
parent83790b0e7e09ab17238b16452a33053a71dbdfad
server : support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196)

* initial commit for handling extra template kwargs

* enable_thinking and assistant prefill cannot be enabled at the same time

* can set chat_template_kwargs in command line

* added doc

* fixed formatting

* add support for extra context in generic template init

* coding standard: common/chat.cpp

Co-authored-by: Georgi Gerganov <redacted>
* coding standard:  common/chat.cpp

Co-authored-by: Georgi Gerganov <redacted>
* Apply suggestions from code review

coding standard: cosmetic changes

Co-authored-by: Georgi Gerganov <redacted>
* fix merge conflict

* chat.cpp: simplify calls to apply to ensure systematic propagation of extra_context (+ the odd existing additional_context)

* normalize environment variable name

* simplify code

* prefill cannot be used with thinking models

* compatibility with the new reasoning-budget parameter

* fix prefill for non thinking models

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Olivier Chafik <redacted>
common/arg.cpp
common/chat.cpp
common/chat.h
common/common.h
tools/server/README.md
tools/server/server.cpp
tools/server/utils.hpp