git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Christian Fillion <redacted>
	Fri, 7 Feb 2025 13:55:47 +0000 (08:55 -0500)
committer	GitHub <redacted>
	Fri, 7 Feb 2025 13:55:47 +0000 (15:55 +0200)
commit	2d219b389e8c8c40bce547b08c8aa7add60fde1f
tree	b6a4bd015886666f1f455391b16750fb26292c65
parent	333820d7491cd31c707a340ff23b984a84e40154

vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729)

Silently insert one or more U+FFFD (the Unicode replacement character) in
place of the invalid bytes until the next valid codepoint can be found.
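
A minimal sketch of this replacement strategy, for illustration only. The
function name `decode_utf8_lossy` is hypothetical and this is not the code
from src/unicode.cpp; it assumes one U+FFFD is emitted per rejected byte and
it does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not the llama.cpp implementation): decode UTF-8 into
// codepoints, emitting U+FFFD for each byte that cannot start or complete
// a well-formed sequence, then resuming at the following byte.
std::vector<uint32_t> decode_utf8_lossy(const std::string & s) {
    std::vector<uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const uint8_t b0 = static_cast<uint8_t>(s[i]);
        std::size_t len = 0; // 0 means "invalid lead byte"
        uint32_t cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }

        // The sequence must fit in the input and every trailing byte
        // must have the form 10xxxxxx.
        bool ok = len != 0 && i + len <= s.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const uint8_t b = static_cast<uint8_t>(s[i + k]);
            if ((b & 0xC0) != 0x80) {
                ok = false;
            } else {
                cp = (cp << 6) | (b & 0x3F);
            }
        }

        if (ok) {
            out.push_back(cp);
            i += len;
        } else {
            out.push_back(0xFFFD); // replacement character, no exception
            i += 1;                // skip one byte and try again
        }
    }
    return out;
}
```

With this sketch, the input `"a\xFFz"` decodes to U+0061, U+FFFD, U+007A, and
a truncated sequence such as `"\xE2\x82"` yields two U+FFFD instead of
throwing.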

This fixes `llama_tokenize` throwing an exception across the C API boundary
or libllama's module boundary (the caller's runtime might be incompatible!).

Returning a proper error code might be desirable; however, the signature
of `llama_tokenize` doesn't allow it, as all return values already have
an existing meaning.

src/unicode.cpp

Packaging of ggml-org/llama.cpp