llama : add qwen2moe (#6074)
author    Shijie <redacted>
Tue, 16 Apr 2024 15:40:48 +0000 (23:40 +0800)
committer GitHub <redacted>
Tue, 16 Apr 2024 15:40:48 +0000 (18:40 +0300)
commit    f4dea7da1841a92d2788b0535063abf2f0e28461
tree      c7a729d974e4315c71c78eea84fa08dda920b649
parent    8a56075b07a8b571bf95a912ffdce4c928c2b414
llama : add qwen2moe (#6074)

* support qwen2moe

* fix-review

* metal : support unary ops for nelements % 4 != 0

* metal : require contiguousness for float4 unary kernels

* metal : require contiguousness for float4 unary kernels (cont)

* fix-review

* names : for brevity "SHARED_EXP" -> "SHEXP"

* llama : reuse build_moe_ffn()

* llama : add model type name

---------

Co-authored-by: Georgi Gerganov <redacted>
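
The commit's core change is wiring Qwen2MoE through the shared build_moe_ffn() path and adding a shared expert (abbreviated "SHEXP") that runs on every token alongside the top-k routed experts. The sketch below is a minimal NumPy illustration of that structure, not llama.cpp's actual ggml graph code; all names here are hypothetical:

```python
import numpy as np

def moe_ffn(x, gate_w, expert_ws, shared_w, top_k=2):
    """Hypothetical sketch of an MoE FFN with a shared expert.

    x:         (d,) token activation
    gate_w:    (n_expert, d) router weights
    expert_ws: list of (d, d) per-expert weights (FFN collapsed to one matmul)
    shared_w:  (d, d) shared-expert weights, applied to every token
    """
    logits = gate_w @ x                       # router score per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over experts
    top = np.argsort(probs)[-top_k:]          # indices of the top-k experts
    w = probs[top] / probs[top].sum()         # renormalize selected weights
    # weighted sum over the routed experts
    out = sum(wi * (expert_ws[i].T @ x) for wi, i in zip(w, top))
    out += shared_w.T @ x                     # shared expert runs unconditionally
    return out
```

In the real model each expert is a full gated FFN and routing is batched over tokens; the point here is only the top-k selection plus the always-on shared-expert branch.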
convert-hf-to-gguf.py
ggml-metal.m
ggml-metal.metal
gguf-py/gguf/constants.py
gguf-py/gguf/tensor_mapping.py
llama.cpp
tests/test-backend-ops.cpp
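
The Metal-related commits above address unary ops on tensors whose element count is not a multiple of 4: the vectorized float4 kernels require a contiguous tensor with nelements % 4 == 0, and other shapes fall back to a scalar kernel. A rough CPU-side illustration of that dispatch split (hypothetical Python, not the actual Metal source):

```python
import numpy as np

def gelu_ref(v):
    # tanh-approximation GELU, applied elementwise
    return 0.5 * v * (1.0 + np.tanh(0.7978845608 * (v + 0.044715 * v**3)))

def unary_dispatch(x):
    """Illustrative dispatch mirroring the vec4-vs-scalar split:
    the 4-wide path is only taken for a contiguous buffer whose
    element count is divisible by 4."""
    n = x.size
    if x.flags['C_CONTIGUOUS'] and n % 4 == 0:
        # "float4" path: process elements in groups of 4
        out = gelu_ref(x.ravel().reshape(-1, 4)).ravel()
    else:
        # scalar fallback for non-contiguous or n % 4 != 0 cases
        out = gelu_ref(np.ascontiguousarray(x).ravel())
    return out.reshape(x.shape)
```

Both paths produce identical results; on the GPU the float4 path simply loads and stores four elements per thread, which is only safe under the contiguousness and size constraints the commits enforce.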