llama-cli: prevent spurious assistant token (#16202)
author    Vinkal <redacted>
          Mon, 29 Sep 2025 07:03:12 +0000 (12:33 +0530)
committer GitHub <redacted>
          Mon, 29 Sep 2025 07:03:12 +0000 (10:03 +0300)
commit    2f61c0f5bf8a620ca4c3872408803ab38cfb9613
tree      3503d93413dd3cc47e99a2f03fac8516135abc26
parent    3ffd0fae473c954bb3e67526b31262048fb508d4
llama-cli: prevent spurious assistant token (#16202)

* tools/main: llama-cli: prevent spurious assistant token (#13402)

During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was still a prompt-side token (e.g., an input prefix), so the assistant chat message began with a spurious leading piece.

Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat-message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout and the sampling order/logits are unchanged.
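
For illustration, here is a minimal, self-contained C++ sketch of the failure mode and of the fix. It models the sampler history and the chat-message buffer with plain standard-library types; `History`, `token_to_piece`, `is_eog`, and the toy token ids are hypothetical stand-ins, not the llama.cpp API or the verbatim patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the sampler history (tokens accepted
// for repetition penalties, including prompt tokens).
struct History {
    std::vector<int> tokens;
    void accept(int id) { tokens.push_back(id); }
    int  last() const   { return tokens.back(); }
};

// Toy detokenizer: token 0 models an input-prefix (prompt-side) token.
static std::string token_to_piece(int id) {
    return id == 0 ? "<|prefix|>" : "Hi";
}

// Toy end-of-generation check.
static bool is_eog(int id) { return id < 0; }

int main() {
    History smpl;
    std::string assistant_buggy, assistant_fixed;

    // Prompt ingestion: the prompt token is accepted into the history,
    // so "last" is now a prompt-side token.
    smpl.accept(0);

    // Buggy order: read back the history's last token before any new
    // token is sampled, so the assistant message starts with the
    // prompt-side piece.
    assistant_buggy += token_to_piece(smpl.last());

    // Now a token is actually sampled and accepted.
    int id = 1;
    smpl.accept(id);
    assistant_buggy += token_to_piece(id);

    // Fixed order: append only the newly sampled, non-EOG token.
    if (!is_eog(id)) {
        assistant_fixed += token_to_piece(id);
    }

    assert(assistant_buggy == "<|prefix|>Hi");  // spurious leading piece
    assert(assistant_fixed == "Hi");            // clean assistant message
    return 0;
}
```

In the real code path, the equivalent change is to gate the `assistant_ss` append on the token sampled in the current iteration (and on it not being EOG), rather than reading back `common_sampler_last(smpl)` right after prompt ingestion.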

Fixes #13402.

Signed-off-by: Vinkal Chudgar <redacted>
* Update tools/main/main.cpp

Co-authored-by: Sigbjørn Skjæret <redacted>
* tools/main: remove outdated comment

Signed-off-by: Vinkal Chudgar <redacted>
---------

Signed-off-by: Vinkal Chudgar <redacted>
Co-authored-by: Sigbjørn Skjæret <redacted>
tools/main/main.cpp