git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
Add more tokenizer tests (#3742)
author Galunid <redacted>
Tue, 24 Oct 2023 07:17:17 +0000 (09:17 +0200)
committer GitHub <redacted>
Tue, 24 Oct 2023 07:17:17 +0000 (09:17 +0200)
commit daab3d7f45832e10773c99f3484b0d5b14d86c0c
tree 432092c5aec7c775ab6e33b968564bd0a1e4a187
parent 469c9addef75893e6be12edda852d12e840bf064
Add more tokenizer tests (#3742)

* Add more tokenizer tests

* Add starcoder

* Update test vocab files

* Restrict bpe tokenizer tests to unicode planes

* Update comment

* Comment cosmetics

* Remove bloom vocab/test
models/ggml-vocab-baichuan.gguf [new file with mode: 0644]
models/ggml-vocab-gpt-neox.gguf [new file with mode: 0644]
models/ggml-vocab-refact.gguf [new file with mode: 0644]
models/ggml-vocab-starcoder.gguf [new file with mode: 0644]
tests/CMakeLists.txt
tests/test-tokenizer-1-bpe.cpp
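
The "Restrict bpe tokenizer tests to unicode planes" item can be pictured with a small round-trip sketch. This is not the code from tests/test-tokenizer-1-bpe.cpp, whose diff is not shown on this page. The helper names `codepoint_to_utf8`, `round_trip` and `count_round_trip_failures` are hypothetical, the code point ranges in `k_ranges` are an illustrative assumption (surrogates skipped, only some planes covered), and `round_trip` is an identity stand-in where a real test would tokenize and detokenize with the loaded vocab.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Encode one Unicode scalar value as UTF-8 (hypothetical helper for this sketch).
std::string codepoint_to_utf8(uint32_t cp) {
    assert(cp <= 0x10ffff);
    std::string out;
    if (cp <= 0x7f) {
        out += static_cast<char>(cp);
    } else if (cp <= 0x7ff) {
        out += static_cast<char>(0xc0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    } else if (cp <= 0xffff) {
        out += static_cast<char>(0xe0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    } else {
        out += static_cast<char>(0xf0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3f));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    }
    return out;
}

// Stand-in for "tokenize, then detokenize". ASSUMPTION: a real test would call
// the tokenizer under test here; this identity function only keeps the sketch
// self-contained.
std::string round_trip(const std::string & text) {
    return text;
}

// Half-open [first, last) code point ranges to test. ASSUMPTION: these values
// are illustrative, not the ranges chosen in the commit. The surrogate block
// U+D800..U+DFFF is skipped because surrogates are not valid in UTF-8.
const std::vector<std::pair<uint32_t, uint32_t>> k_ranges = {
    {0x0000,  0xd800},   // plane 0 below the surrogates
    {0xe000,  0x10000},  // plane 0 above the surrogates
    {0x10000, 0x40000},  // planes 1 to 3
    {0xe0000, 0xf0000},  // plane 14
};

// Count code points whose UTF-8 form does not survive the round trip.
std::size_t count_round_trip_failures() {
    std::size_t failures = 0;
    for (const auto & range : k_ranges) {
        for (uint32_t cp = range.first; cp < range.second; ++cp) {
            const std::string text = codepoint_to_utf8(cp);
            if (round_trip(text) != text) {
                ++failures;
            }
        }
    }
    return failures;
}
```

Calling `count_round_trip_failures()` from a test's `main` and returning a non-zero exit code when the count is non-zero would let a CTest entry report the result. Keeping the ranges in one table makes it easy to narrow or widen the planes under test without touching the loop.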