git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
server : refactor slot input data, move tokenizer to HTTP thread (#10023)
author Xuan Son Nguyen <redacted>
Thu, 24 Oct 2024 19:51:22 +0000 (21:51 +0200)
committer GitHub <redacted>
Thu, 24 Oct 2024 19:51:22 +0000 (21:51 +0200)
commit 958367bf530d943a902afa1ce1c342476098576b
tree 2388735e8c1c8db054ccfa4a3f27dfee79b74852
parent 40f2555797f97314de749873cdc29dc102be66e2

* server : refactor slot input data, move tokenizer to HTTP thread

* move prompt_tokens.empty() check

* fix incorrect if branch

* fix infinite generation loop

* bring back infill validation

* add infill test

* try fixing format_infill

* fix test

* remove redundant code

* rename completion to inference

* update docs

* use llama_tokens everywhere
examples/server/README.md
examples/server/server.cpp
examples/server/tests/features/infill.feature [new file with mode: 0644]
examples/server/tests/features/steps/steps.py
examples/server/utils.hpp