From: Vinkal
Date: Mon, 29 Sep 2025 07:03:12 +0000 (+0530)
Subject: llama-cli: prevent spurious assistant token (#16202)
X-Git-Tag: upstream/0.0.6641~17
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=2f61c0f5bf8a620ca4c3872408803ab38cfb9613;p=pkg%2Fggml%2Fsources%2Fllama.cpp

llama-cli: prevent spurious assistant token (#16202)

* tools/main: llama-cli: prevent spurious assistant token (#13402)

During prompt ingestion, prompt tokens are accepted into the sampler
history (for repetition penalties). The conversation-mode path then
appended `common_sampler_last(smpl)` to `assistant_ss` before any new
token was sampled. At that point, "last" was a prompt-side token (e.g.,
an input prefix), so the assistant chat message began with an extra
piece.

Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token.

This affects only chat message assembly (`assistant_ss` / `chat_msgs` /
`common_chat_format_single`); terminal stdout is unchanged. Sampling
order/logits are unchanged.

Fixes #13402.
Signed-off-by: Vinkal Chudgar

* Update tools/main/main.cpp

Co-authored-by: Sigbjørn Skjæret

* tools/main: remove outdated comment

Signed-off-by: Vinkal Chudgar

---------

Signed-off-by: Vinkal Chudgar
Co-authored-by: Sigbjørn Skjæret
---

diff --git a/tools/main/main.cpp b/tools/main/main.cpp
index 083fc0cf..498e00e3 100644
--- a/tools/main/main.cpp
+++ b/tools/main/main.cpp
@@ -707,6 +707,10 @@ int main(int argc, char ** argv) {
 
             embd.push_back(id);
 
+            if (params.conversation_mode && !waiting_for_first_input && !llama_vocab_is_eog(vocab, id)) {
+                assistant_ss << common_token_to_piece(ctx, id, false);
+            }
+
             // echo this to console
             input_echo = true;
 
@@ -824,11 +828,7 @@ int main(int argc, char ** argv) {
 
                 }
             }
-            // if current token is not EOG, we add it to current assistant message
             if (params.conversation_mode && !waiting_for_first_input) {
-                const auto id = common_sampler_last(smpl);
-                assistant_ss << common_token_to_piece(ctx, id, false);
-
                 if (!prompt.empty()) {
                     prompt.clear();
                     is_interacting = false;