author	Daniel Bevenius <redacted>
	Wed, 20 Aug 2025 12:26:01 +0000 (14:26 +0200)
committer	GitHub <redacted>
	Wed, 20 Aug 2025 12:26:01 +0000 (14:26 +0200)
commit	657b8a77bd01854f99d37a47318fa24f2e7e298f
tree	0817ab9e34ad487c67a5134d3492bdcbc687f20d
parent	ec5ab1a36c11dd3efcf4ec8d1ac89a13a8117bc3

chat: handle gpt-oss return/end token inconsistency (#15421)

This commit addresses an inconsistency during inference by adding a new
member to the `templates_params` struct to indicate whether the chat is
in inference mode. This allows the gpt-oss specific function
`common_chat_params_init_gpt_oss` to check this flag and the
`add_generation_prompt` flag to determine if it should replace the
`<|return|>` token with the `<|end|>` token in the prompt.
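
The replacement described above can be sketched roughly as follows. This
is an illustration, not the code from the commit: the helper name
`normalize_end_token` and the parameter name `is_inference` are
hypothetical, and replacing only the last occurrence of `<|return|>` when
in inference mode without a generation prompt is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of llama.cpp): swaps the trailing
// <|return|> for <|end|> when rendering for inference without a
// generation prompt. Otherwise the prompt is returned unchanged.
std::string normalize_end_token(std::string prompt,
                                bool is_inference,
                                bool add_generation_prompt) {
    static const std::string return_token = "<|return|>";
    static const std::string end_token    = "<|end|>";

    if (is_inference && !add_generation_prompt) {
        // Assumption: only the last occurrence is replaced.
        const std::size_t pos = prompt.rfind(return_token);
        if (pos != std::string::npos) {
            prompt.replace(pos, return_token.size(), end_token);
        }
    }
    return prompt;
}
```

With this sketch, `normalize_end_token("hello<|return|>", true, false)`
yields `hello<|end|>`, while any other flag combination leaves the input
untouched.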

The motivation for this change is to ensure that the formatted prompt of
past messages in `common_chat_format_single` matches the output of the
formatted new message. The issue is that the gpt-oss template returns
different end tags: `<|return|>` when `add_generation_prompt` is false,
and `<|end|>` when `add_generation_prompt` is true. Because `<|return|>`
is three characters longer than `<|end|>`, the substring function starts
three characters too late, so tokenization starts with 'tart|>' instead
of '<|start|>'.
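
The mismatch can be reproduced with a small toy model. Everything below
is a simplified stand-in written for illustration: `toy_msg`,
`toy_render`, and `toy_format_single` are hypothetical names, the
renderer only mimics the end-tag behavior described above and is not the
real gpt-oss template, and the assumption is that the new message is
obtained by cutting the full rendering at the length of the rendered past
messages.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct toy_msg {
    std::string role;
    std::string content;
};

// Toy renderer: the last message is closed with <|return|> when no
// generation prompt is requested, and with <|end|> otherwise.
std::string toy_render(const std::vector<toy_msg> & msgs,
                       bool add_generation_prompt) {
    std::string out;
    for (std::size_t i = 0; i < msgs.size(); ++i) {
        const bool last = (i + 1 == msgs.size());
        out += "<|start|>" + msgs[i].role + "<|message|>" + msgs[i].content;
        out += (last && !add_generation_prompt) ? "<|return|>" : "<|end|>";
    }
    if (add_generation_prompt) {
        out += "<|start|>assistant";
    }
    return out;
}

// Toy version of the "format past, format past + new, take the
// difference" approach, without the fix from this commit.
std::string toy_format_single(const std::vector<toy_msg> & past,
                              const toy_msg & new_msg) {
    const std::string fmt_past = toy_render(past, false);

    std::vector<toy_msg> all = past;
    all.push_back(new_msg);
    const std::string fmt_new = toy_render(all, true);

    // fmt_past ends in <|return|> (10 chars), but the same message ends
    // in <|end|> (7 chars) inside fmt_new, so this offset is 3 too large.
    return fmt_new.substr(fmt_past.size());
}
```

Calling `toy_format_single` with a past conversation of one user and one
assistant message plus a new user message returns a string that begins
with `tart|>user` rather than `<|start|>user`.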

Resolves: https://github.com/ggml-org/llama.cpp/issues/15417