Add support for ArcticForCausalLM (#7020)
author     fairydreaming <redacted>
           Fri, 24 May 2024 12:31:13 +0000 (14:31 +0200)
committer  GitHub <redacted>
           Fri, 24 May 2024 12:31:13 +0000 (14:31 +0200)
commit     fbca2f27fc7fa9aa4a8ad0357478fdb908472908
tree       9226fa114f6e0f6578c6946f5a23c7ab76ef0854
parent     0df0aa8e43c3378975269a51f9b876c8692e70da
Add support for ArcticForCausalLM (#7020)

* common : increase max number of experts to 128

* common : add tensor LLM_TENSOR_FFN_NORM_EXPS for the normalization applied before the MoE block, which runs in parallel to the attention + FFN path (see sketch 1 after this list)

* gguf-py : add architecture-specific block mappings that override selected general block mappings (see sketch 2 after this list)

* convert-hf : add model conversion support for ArcticForCausalLM (see sketch 3 after this list)

* convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from the SentencePiece model (ArcticForCausalLM only; see sketch 4 after this list)

* llama : add inference support for LLM_ARCH_ARCTIC
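
Sketch 1, for the LLM_TENSOR_FFN_NORM_EXPS item above: a minimal Python sketch of how the matching tensor could be declared on the gguf-py side (gguf-py/gguf/constants.py). The enum member name and the "blk.{bid}.ffn_norm_exps" name template are assumptions for illustration, not copied from the patch.

    # sketch only -- enum member and name template are assumptions
    from enum import IntEnum, auto

    class MODEL_TENSOR(IntEnum):
        FFN_NORM = auto()       # existing per-block FFN norm
        FFN_NORM_EXP = auto()   # new: norm before the MoE branch that runs in
                                # parallel to the attention + FFN path (Arctic)

    TENSOR_NAMES = {
        MODEL_TENSOR.FFN_NORM:     "blk.{bid}.ffn_norm",
        MODEL_TENSOR.FFN_NORM_EXP: "blk.{bid}.ffn_norm_exps",
    }

    # TENSOR_NAMES[MODEL_TENSOR.FFN_NORM_EXP].format(bid=0) -> "blk.0.ffn_norm_exps",
    # the GGUF name that LLM_TENSOR_FFN_NORM_EXPS would resolve to in llama.cpp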
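
Sketch 2, for the architecture-specific block mappings item: a simplified view of how per-architecture mappings can be layered over the general ones in gguf-py/gguf/tensor_mapping.py. The dictionary names and the Arctic source-tensor names (residual_layernorm, post_attention_layernorm) are assumptions used for illustration.

    # sketch only -- general mappings plus per-architecture overrides
    block_mappings_cfg = {
        # general rule: the FFN norm comes from post_attention_layernorm
        "FFN_NORM": ("model.layers.{bid}.post_attention_layernorm",),
    }

    arch_block_mappings_cfg = {
        "ARCTIC": {
            # Arctic feeds post_attention_layernorm into the parallel MoE branch,
            # so the general FFN_NORM rule is overridden for this arch only
            "FFN_NORM":     ("model.layers.{bid}.residual_layernorm",),
            "FFN_NORM_EXP": ("model.layers.{bid}.post_attention_layernorm",),
        },
    }

    def block_mappings_for(arch: str) -> dict:
        mappings = dict(block_mappings_cfg)                      # start from the general rules
        mappings.update(arch_block_mappings_cfg.get(arch, {}))   # arch-specific entries win
        return mappings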
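
Sketch 3, for the ArcticForCausalLM conversion item: the registration pattern used by convert-hf-to-gguf.py, reduced to a self-contained example. The registry shown here is a stand-in; in the real script the class would set model_arch to the Arctic entry of gguf.MODEL_ARCH and override vocab/tensor handling as needed.

    # sketch only -- minimal stand-in for the Model.register() pattern
    _model_classes: dict[str, type] = {}

    class Model:
        @classmethod
        def register(cls, *names):
            def wrapper(model_cls):
                for name in names:
                    _model_classes[name] = model_cls  # key: HF "architectures" entry
                return model_cls
            return wrapper

    @Model.register("ArcticForCausalLM")
    class ArcticModel(Model):
        model_arch = "ARCTIC"  # stand-in for gguf.MODEL_ARCH.ARCTIC

    # the converter picks the class from config.json's "architectures" field
    assert _model_classes["ArcticForCausalLM"] is ArcticModel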
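
Sketch 4, for the added_tokens_decoder item: roughly how entries from tokenizer_config.json can overwrite the piece text and token type taken from the SentencePiece model. The helper name and exact field handling are assumptions; only the added_tokens_decoder structure itself comes from the tokenizer_config.json format.

    # sketch only -- override SentencePiece-derived tokens with added_tokens_decoder
    import json
    from pathlib import Path

    def apply_added_tokens_decoder(cfg_path: Path, tokens: list, toktypes: list) -> None:
        if not cfg_path.is_file():
            return
        cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
        for token_id, info in cfg.get("added_tokens_decoder", {}).items():
            token_id = int(token_id)
            if token_id >= len(tokens):
                continue  # ignore ids outside the SentencePiece vocab
            tokens[token_id] = info["content"].encode("utf-8")
            # token type values as in gguf/llama.cpp: 3 = CONTROL, 4 = USER_DEFINED
            toktypes[token_id] = 3 if info.get("special") else 4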

---------

Co-authored-by: Stanisław Szymczyk <redacted>
convert-hf-to-gguf.py
gguf-py/gguf/constants.py
gguf-py/gguf/tensor_mapping.py
llama.cpp