git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit


author	Georgi Gerganov <redacted>
	Sun, 27 Aug 2023 11:19:19 +0000 (14:19 +0300)
committer	GitHub <redacted>
	Sun, 27 Aug 2023 11:19:19 +0000 (14:19 +0300)
commit	edd4c1481708fcd788b0e423268304fd26e2b125
tree	2e7db62ea4816dc18f2518a08c36b6ea480eff05
parent	1591e2e590762011b43b10a9b6e04f13f98f2aa5

llama : more tokenizer fixes (#2810)

* tests : write a Python tokenizer test (wip)

* llama : prefix input text for tokenization with whitespace

* llama : distinguish pieces from decoded text + fix detokenization

* common : add comments

* examples : no longer manually add leading space when tokenizing

* tests : use Python to generate tokenizer tests for C++

* tests : add option to tokenize text files

ggml-ci

* tests : add test-tokenizer-1.py

* llama.cpp : fix LF token

* hellaswag : move the concat space for clarity

* tests : add falcon tests (py + cpp, currently do not pass Unicode)

ggml-ci

* common : temporary separate llama_detokenize calls for SPM and BPE

---------

Co-authored-by: klosax <redacted>
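
The whitespace-prefix and piece-vs-text items above can be illustrated with a minimal sketch. It assumes the SentencePiece convention of marking spaces with U+2581; the function names `escape_for_spm` and `pieces_to_text` are hypothetical and are not taken from llama.cpp or its tests.

```python
from typing import List

# Assumption: SentencePiece-style vocabularies mark a space with U+2581.
SPM_SPACE = "\u2581"


def escape_for_spm(text: str) -> str:
    """Prefix the input with one space, then replace every space with the marker."""
    return (" " + text).replace(" ", SPM_SPACE)


def pieces_to_text(pieces: List[str]) -> str:
    """Join raw pieces, map markers back to spaces, and drop the one prefixed space."""
    text = "".join(pieces).replace(SPM_SPACE, " ")
    return text[1:] if text.startswith(" ") else text


if __name__ == "__main__":
    escaped = escape_for_spm("Hello world")
    print(escaped)
    # Hypothetical pieces a tokenizer might return for the escaped string.
    print(pieces_to_text([SPM_SPACE + "Hello", SPM_SPACE + "world"]))
```

A piece here is the raw vocabulary string, marker included, while the decoded text is what a caller would display. Because the prefix is applied inside the escaping step, a caller in this sketch does not add a leading space by hand.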

20 files changed:

common/common.cpp
common/common.h
examples/beam_search/beam_search.cpp
examples/embd-input/embd-input-lib.cpp
examples/embedding/embedding.cpp
examples/main/main.cpp
examples/perplexity/perplexity.cpp
examples/save-load-state/save-load-state.cpp
examples/server/server.cpp
examples/simple/simple.cpp
examples/train-text-from-scratch/train-text-from-scratch.cpp
llama.cpp
llama.h
tests/CMakeLists.txt
tests/test-tokenizer-0-falcon.cpp	[new file with mode: 0644]
tests/test-tokenizer-0-falcon.py	[new file with mode: 0644]
tests/test-tokenizer-0-llama.cpp	[new file with mode: 0644]
tests/test-tokenizer-0-llama.py	[new file with mode: 0644]
tests/test-tokenizer-0.cpp	[deleted file]
tests/test-tokenizer-1.cpp

Packaging of ggml-org/llama.cpp
