git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
llama : implement Unigram tokenizer needed by T5 and FLAN-T5 model families (#5763)
author fairydreaming <redacted>
Tue, 25 Jun 2024 19:14:35 +0000 (21:14 +0200)
committer GitHub <redacted>
Tue, 25 Jun 2024 19:14:35 +0000 (21:14 +0200)
commit 6fcbf6823553efabe52ed83e3c2a3329aa3387d1
tree 33c314dc90d54a39f0f0883bb84410c28b7272a8
parent e6bf007744eb06336a231ef39cf08146dd16d2ce
llama : implement Unigram tokenizer needed by T5 and FLAN-T5 model families (#5763)

* llama : add T5 model architecture, tensors and model header parameters

* llama : add implementation of Unigram tokenizer with SentencePiece-like text normalization using precompiled charsmap
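
For readers unfamiliar with Unigram tokenization, the core idea can be sketched as a Viterbi search for the highest-scoring segmentation of the input into vocabulary pieces. The sketch below is illustrative only and is not the code added by this commit: the function name `unigram_tokenize_sketch`, its parameters, and the toy vocabulary are hypothetical, it works on raw bytes rather than UTF-8 code points, and it skips the charsmap-based normalization step entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not the llama.cpp API): split `text` into the sequence
// of vocabulary pieces with the highest total log-probability.
// `vocab` maps a piece to its log-probability. A byte that matches no piece
// is emitted as a single-byte token scored with `unk_score`.
std::vector<std::string> unigram_tokenize_sketch(
        const std::string & text,
        const std::unordered_map<std::string, double> & vocab,
        double unk_score = -100.0,
        std::size_t max_piece_len = 16) {
    const std::size_t n = text.size();

    // best[i]: best score of any segmentation of text[0, i)
    // prev[i]: start offset of the last piece in that segmentation
    // reached[i]: whether any segmentation of text[0, i) was found yet
    std::vector<double>      best(n + 1, 0.0);
    std::vector<std::size_t> prev(n + 1, 0);
    std::vector<bool>        reached(n + 1, false);
    reached[0] = true;

    for (std::size_t i = 0; i < n; ++i) {
        // Every offset is reachable thanks to the single-byte fallback.
        for (std::size_t len = 1; len <= max_piece_len && i + len <= n; ++len) {
            const auto it = vocab.find(text.substr(i, len));
            double score;
            if (it != vocab.end()) {
                score = it->second;
            } else if (len == 1) {
                score = unk_score; // unknown byte fallback
            } else {
                continue;          // longer substring that is not a piece
            }
            const double cand = best[i] + score;
            if (!reached[i + len] || cand > best[i + len]) {
                best[i + len]    = cand;
                prev[i + len]    = i;
                reached[i + len] = true;
            }
        }
    }

    // Walk the back-pointers from the end, then restore left-to-right order.
    std::vector<std::string> pieces;
    for (std::size_t end = n; end > 0; end = prev[end]) {
        pieces.push_back(text.substr(prev[end], end - prev[end]));
    }
    std::reverse(pieces.begin(), pieces.end());
    return pieces;
}
```

With a toy vocabulary in which "he" and "llo" each score -3.0 and "hell" scores -4.0 (single letters at -5.0), the input "hello" is split into "he" and "llo", because that path has a higher total score than "hell" followed by "o".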

---------

Co-authored-by: Stanisław Szymczyk <redacted>
llama.cpp
llama.h
unicode.cpp
unicode.h