When two beam candidates have the same sum_logprobs_all, break the tie
by preferring the lower decoder_idx. All else being equal, this encourages
beam candidate selection to re-use the same decoder, which slightly
reduces the cache size.
I wouldn't expect it to make much of a performance difference,
but it makes debug printouts of the cache and beam easier to follow.
Added as part of understanding #1941.
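
For reference, here is a minimal standalone sketch of the tie-break, not the actual code in the tree. The `sequence_stub` and `beam_candidate_stub` types and the `sort_candidates` helper are hypothetical stand-ins that keep only the two fields the comparator reads; the real structs carry more state.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, trimmed-down stand-ins for the real structs:
// only the fields the comparator touches are kept.
struct sequence_stub {
    double sum_logprobs_all;
};

struct beam_candidate_stub {
    int           decoder_idx;
    sequence_stub sequence;
};

// Sort best-first by score. On an exact score tie, the candidate
// with the lower decoder index comes first, so the resulting order
// does not depend on how std::sort happens to arrange equal elements.
inline void sort_candidates(std::vector<beam_candidate_stub> & candidates) {
    std::sort(
        candidates.begin(),
        candidates.end(),
        [](const beam_candidate_stub & a, const beam_candidate_stub & b) {
            if (a.sequence.sum_logprobs_all != b.sequence.sum_logprobs_all) {
                return a.sequence.sum_logprobs_all > b.sequence.sum_logprobs_all;
            }
            return a.decoder_idx < b.decoder_idx;
        });
}
```

Without the second comparison, candidates with equal scores compare as equivalent, and `std::sort` makes no guarantee about their relative order.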
beam_candidates.begin(),
beam_candidates.end(),
[](const beam_candidate & a, const beam_candidate & b) {
- return a.sequence.sum_logprobs_all > b.sequence.sum_logprobs_all;
+ if (a.sequence.sum_logprobs_all != b.sequence.sum_logprobs_all) {
+ return a.sequence.sum_logprobs_all > b.sequence.sum_logprobs_all;
+ }
+ return a.decoder_idx < b.decoder_idx;
});
uint32_t cur_c = 0;