common : preallocate sampling token data vector (#8363)
author    Kevin Wang <redacted>
          Mon, 8 Jul 2024 07:26:53 +0000 (03:26 -0400)
committer GitHub <redacted>
          Mon, 8 Jul 2024 07:26:53 +0000 (10:26 +0300)
commit    470939d483d1c89b7292f78bac1fd27c42c171ce
tree      ff372b23b8c68deb41fac67f26e7bb4af60a6a63
parent    6f0dbf6ab087bcd286fb78560099ca0458316735

Calling `emplace_back` repeatedly is slower than preallocating the vector to the vocab size and writing the data directly. Some rudimentary profiling with `chrono` shows this change improves the performance of this block of code from ~500us/op to ~40us/op.

Overall, this slightly improves sampling performance, with a more substantial impact on the `examples/lookahead` implementation -- I am able to see a ~10% performance boost in lookahead inference.
common/sampling.cpp