Fix ffn_down quantization mix for MoE models (#4927)
author Kawrakow <redacted>
Sun, 14 Jan 2024 08:53:39 +0000 (10:53 +0200)
committer GitHub <redacted>
Sun, 14 Jan 2024 08:53:39 +0000 (10:53 +0200)
commit a128c38de862431f1aae9ccc40b792fbc1b8b682
tree 2946ef20e083b883c325fed2bc0a11d1ca84166d
parent 5f5fe1bd608fa2ed42af97b5f2ea31be6625fc48
Fix ffn_down quantization mix for MoE models (#4927)

* Fix ffn_down quantization mix for MoE models

In #4872 I did not consider the part where every third
tensor is quantized with more bits. For MoE models this leads to tensors
of the same layer being quantized with a different number of bits,
which the inference implementation does not account for
(it assumes all experts use the same quantization).
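
A sketch only, not the actual patch: the idea is to key the "more bits"
decision on the layer index instead of the running tensor index, so every
expert of a MoE layer receives the same type. The helper name, its
parameters, and the exact "every third layer" pattern below are assumptions
made for illustration.

    #include "ggml.h"   // for enum ggml_type (sketch assumes the public ggml header)

    // Hypothetical helper: pick the ffn_down quantization type per layer.
    // i_layer is shared by all experts of a layer, so the choice is uniform
    // across experts, which is what the inference code expects.
    static enum ggml_type ffn_down_type(int i_layer, enum ggml_type base, enum ggml_type bumped) {
        // bump every third layer to more bits (illustrative pattern only)
        return (i_layer % 3 == 0) ? bumped : base;
    }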

* Fix the fix

* Review suggestion

---------

Co-authored-by: Iwan Kawrakow <redacted>
llama.cpp