author    Ronsor <redacted>
          Wed, 15 Mar 2023 19:37:50 +0000 (12:37 -0700)
committer GitHub <redacted>
          Wed, 15 Mar 2023 19:37:50 +0000 (21:37 +0200)
commit 956dfda8ad8cea7961e22e0384bbc315bf79aed2
tree   57210ba963ca22ecab007fe2841f02100ad423a8
parent 113e685d18ac4edb20f647fd34b000941556f6a6
Use `tokenizer.vocab_size()` instead of hardcoding 32000 in convert-pth-to-ggml.py (#142)

Special tokens or other new tokens can be added to the tokenizer, so it is best not to assume the vocabulary is exactly 32000 tokens.
convert-pth-to-ggml.py
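A minimal sketch of the change, using a hypothetical stand-in tokenizer for illustration (the real script loads a SentencePiece model, whose `SentencePieceProcessor` exposes the same `vocab_size()` call):

```python
class StubTokenizer:
    """Hypothetical stand-in for sentencepiece.SentencePieceProcessor."""
    def __init__(self, base=32000, extra_tokens=()):
        # Added special tokens grow the vocabulary past the base size.
        self._size = base + len(extra_tokens)

    def vocab_size(self):
        return self._size


def build_hparams(tokenizer, dim=4096, n_heads=32, n_layers=32):
    # Before the fix the script wrote "vocab_size": 32000 unconditionally,
    # which breaks for tokenizers with added tokens.
    return {
        "vocab_size": tokenizer.vocab_size(),  # after the fix: ask the tokenizer
        "dim": dim,
        "n_heads": n_heads,
        "n_layers": n_layers,
    }


tok = StubTokenizer(extra_tokens=["<extra_0>", "<extra_1>"])
print(build_hparams(tok)["vocab_size"])  # 32002, not a hardcoded 32000
```

With this, a model whose tokenizer carries extra special tokens converts with the correct tensor shapes instead of a mismatched 32000-row embedding.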