git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Daniel Bevenius <redacted>
	Mon, 24 Nov 2025 20:06:17 +0000 (21:06 +0100)
committer	GitHub <redacted>
	Mon, 24 Nov 2025 20:06:17 +0000 (21:06 +0100)
commit	134e6940caf5c64071b7f3b7bc6c2f32f1b3a5a4
tree	26449105175e4cf01ebbd15713fa57cb5c94c342
parent	0543f928a3ae576e6e16d3bbf02c0bf9fddba688

llama : skip output reordering for single token batches (#17466)

This commit adds a check to skip the output reordering logic when
n_outputs == 1. With a single output token, the data is trivially
sorted and the reordering code is currently doing unnecessary work
(resetting and rebuilding output_ids to the same values).

The motivation for this change is improved code clarity and avoiding
confusion when debugging. While the performance impact is probably
negligible, this unnecessary work happens on every decode call in
llama-server when processing batches with single-token outputs.
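
As a rough illustration of the check described above, here is a minimal
sketch of an early-out in front of an output reordering pass. It is not the
llama.cpp code: the output_state struct, the reorder_outputs function and
the one-float-per-row layout are hypothetical stand-ins, and only the names
n_outputs and output_ids come from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the context's output bookkeeping (assumption,
// not the real llama.cpp types).
struct output_state {
    std::vector<int32_t> out_ids;    // batch index of each output row, in production order
    std::vector<int32_t> output_ids; // batch index -> output row, or -1 if no output
    std::vector<float>   rows;       // one value per output row (stands in for a logits row)
};

// Sorts the output rows by batch index and rebuilds output_ids.
// Returns true if the reordering pass actually ran.
bool reorder_outputs(output_state & st, size_t n_batch) {
    const size_t n_outputs = st.out_ids.size();

    // With zero or one output the rows are trivially sorted, and rebuilding
    // output_ids would only write back the values it already holds.
    if (n_outputs <= 1) {
        return false;
    }

    // Selection sort: at most n_outputs - 1 row swaps.
    for (size_t i = 0; i + 1 < n_outputs; ++i) {
        size_t j_min = i;
        for (size_t j = i + 1; j < n_outputs; ++j) {
            if (st.out_ids[j] < st.out_ids[j_min]) {
                j_min = j;
            }
        }
        if (j_min != i) {
            std::swap(st.out_ids[i], st.out_ids[j_min]);
            std::swap(st.rows[i],    st.rows[j_min]);
        }
    }

    // Reset and rebuild the batch-index -> row mapping.
    st.output_ids.assign(n_batch, -1);
    for (size_t i = 0; i < n_outputs; ++i) {
        st.output_ids[st.out_ids[i]] = static_cast<int32_t>(i);
    }
    return true;
}
```

In this sketch a single-output state returns early with output_ids left
untouched, while a state with two out-of-order outputs goes through the sort
and the rebuild.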