git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit


author	jaime-m-p <redacted>
	Tue, 28 May 2024 19:46:34 +0000 (21:46 +0200)
committer	GitHub <redacted>
	Tue, 28 May 2024 19:46:34 +0000 (21:46 +0200)
commit	02c1ecad07f0e2d2febe8196271bcc64bdc9c006
tree	2208298e9ac6bd0743787d02f35b527f7db47d0b
parent	6bd12ce409f949012935b7d1b15a21ffa473a565

Tokenizer WPM fixes (#7500)

* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing.
- Fix unicode edge case combinations.
- Split by whitespace in the same pass.
* Discard all tokens when no match is found.
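
The last two items can be illustrated with a minimal Python sketch. It is not the code from llama.cpp: the function names, the toy vocabulary, the "##" continuation prefix (the BERT convention) and the "[UNK]" fallback are all assumptions made for illustration. It shows a single preprocessing pass that lowercases, splits on whitespace and isolates punctuation, followed by greedy longest-match WordPiece in which a word's partial pieces are discarded as soon as one position has no match.

```python
import unicodedata


def preprocess(text):
    """Single pass over the text: lowercase, split on whitespace,
    and emit each punctuation character as its own word."""
    words, current = [], []
    for ch in text.lower():
        if ch.isspace():
            if current:
                words.append("".join(current))
                current = []
        elif unicodedata.category(ch).startswith("P"):
            if current:
                words.append("".join(current))
                current = []
            words.append(ch)
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match WordPiece for one word.
    Assumption: continuation pieces carry a "##" prefix."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches here: discard everything collected
            # for this word and fall back to the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Preprocess the text, then tokenize word by word."""
    output = []
    for word in preprocess(text):
        output.extend(wordpiece(word, vocab))
    return output


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"hello", ",", "un", "##aff", "##able"}
print(tokenize("Hello, unaffable", VOCAB))
print(tokenize("unaffxyz", VOCAB))
```

With this toy vocabulary, "unaffxyz" matches "un" and "##aff" before failing on "xyz", so the whole word collapses to a single "[UNK]" instead of leaving a partial tokenization behind.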

llama.cpp
tests/test-tokenizer-random.py

Packaging of ggml-org/llama.cpp
