git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
speculative : implement stochastic speculative sampling (#5625)
author Minsoo Cheong <redacted>
Mon, 4 Mar 2024 18:24:00 +0000 (03:24 +0900)
committer GitHub <redacted>
Mon, 4 Mar 2024 18:24:00 +0000 (20:24 +0200)
commit 6d341ab6c53cd51f2921d986d0090cc8b049b39a
tree f212b497e210c8c73fe52369f6bc81297c7b1dab
parent 4ffcdce2ff877ebb683cd217ea38faf20faa5ffe

* (WIP) Implement stochastic speculative decoding

* sample from residual distribution on draft accept failure

* fix #5657: force greedy sampling with probs when temp is 0

* remove p_accept parameter

* fix style

* remove unused variables

* add srand() in speculative.cpp

* replace use of rand() with mt19937 sampling

* fixes based on review (@JohannesGaessler)

* fix generation of the random number r

* randomly select next sequence to verify + fix bug in memory freeing

* fix bug in active_seqs sync

* fix uniform int distribution initialization

* remove warnings from comparison between int and size_t

* check grammar in `llama_sample_probability_distribution_impl`

* remove malloc code by utilizing vectors

* add PR link to README
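
Two of the items above ("sample from residual distribution on draft accept failure" and "replace use of rand() with mt19937 sampling") describe the accept/reject step of stochastic speculative sampling. The sketch below illustrates that step for a single drafted token. It is not the code from this commit: `verify_result` and `verify_draft_token` are hypothetical names, and it assumes both inputs are normalized probability vectors over the same vocabulary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type (not from the commit).
struct verify_result {
    int  token;    // token to emit
    bool accepted; // true if the drafted token was kept
};

// Hypothetical helper (not from the commit): verify one drafted token.
// p_tgt / p_dft: target and draft model probabilities (assumed normalized).
verify_result verify_draft_token(
        const std::vector<float> & p_tgt,
        const std::vector<float> & p_dft,
        int                        drafted,
        std::mt19937             & rng) {
    assert(p_tgt.size() == p_dft.size());
    assert(drafted >= 0 && (std::size_t) drafted < p_tgt.size());

    std::uniform_real_distribution<double> u_dist(0.0, 1.0);
    const double r = u_dist(rng);
    const double p = p_tgt[drafted];
    const double q = p_dft[drafted];

    // accept with probability min(1, p/q); written without a division
    // so that q == 0 needs no special case
    if (p > 0.0 && r * q <= p) {
        return { drafted, true };
    }

    // rejected: build the residual distribution max(0, p_tgt - p_dft)
    std::vector<double> residual(p_tgt.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < p_tgt.size(); ++i) {
        residual[i] = std::max(0.0, (double) p_tgt[i] - (double) p_dft[i]);
        sum += residual[i];
    }

    // degenerate input (empty residual): fall back to the target distribution
    if (sum <= 0.0) {
        std::discrete_distribution<int> dist(p_tgt.begin(), p_tgt.end());
        return { dist(rng), false };
    }

    // std::discrete_distribution normalizes the weights internally
    std::discrete_distribution<int> dist(residual.begin(), residual.end());
    return { dist(rng), false };
}
```

Seeding one `std::mt19937` and passing it by reference keeps a run reproducible, which `rand()` with global state makes harder. A caller would typically loop over the drafted tokens and stop at the first rejection, emitting the resampled token in its place.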
common/common.cpp
common/common.h
common/sampling.cpp
common/sampling.h
examples/speculative/README.md
examples/speculative/speculative.cpp