git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
ggml : use 8-bit precision for Q4_1 intermediate results (#1047)
author Georgi Gerganov <redacted>
Wed, 19 Apr 2023 17:10:08 +0000 (20:10 +0300)
committer GitHub <redacted>
Wed, 19 Apr 2023 17:10:08 +0000 (20:10 +0300)
commit 884e7d7a2bfd7325b107442d6758983f5886ed3d
tree 9b3bcda080b127f069092cfc04db151421746754
parent 7cd5c4a3e9106151d48f328bb3c94c298a211f18

* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)

* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmlaq_n_f32

56 ms/token with Q4_1!

* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051)

* gitignore : ignore ppl-*.txt files

---------

Co-authored-by: slaren <redacted>
.gitignore
ggml.c
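
For readers unfamiliar with the technique named in the commit title, the following is a minimal scalar sketch of a Q4_1 x Q8_0 dot product. It is not the code from this commit (which uses ARM NEON and AVX2 intrinsics). The struct layouts, the block size QK = 32, the nibble order, and all sketch_* names are assumptions made for illustration. The idea is that a Q4_1 value decodes as x = d_x * q_x + m_x and a Q8_0 value as y = d_y * q_y, so the per-block dot product splits into two integer sums that are scaled once at the end.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32  /* values per block (assumed for this sketch) */

/* Hypothetical layouts modelled on Q4_1 and Q8_0 blocks. */
typedef struct {
    float   d;           /* scale */
    float   m;           /* minimum (offset) */
    uint8_t qs[QK / 2];  /* 4-bit quants, two per byte */
} sketch_block_q4_1;

typedef struct {
    float  d;       /* scale */
    int8_t qs[QK];  /* 8-bit quants */
} sketch_block_q8_0;

/*
 * With x[i] = d_x * qx[i] + m_x and y[i] = d_y * qy[i], per block:
 *   sum(x[i] * y[i]) = d_x * d_y * sum(qx[i] * qy[i]) + m_x * d_y * sum(qy[i])
 * Both inner sums are integer-only; floats are touched once per block.
 */
static float sketch_vec_dot_q4_1_q8_0(int n,
                                      const sketch_block_q4_1 *x,
                                      const sketch_block_q8_0 *y) {
    assert(n % QK == 0);
    const int nb = n / QK;

    float sumf = 0.0f;
    for (int i = 0; i < nb; i++) {
        int32_t sum_xy = 0;  /* sum of qx * qy */
        int32_t sum_y  = 0;  /* sum of qy, needed for the offset term */

        for (int j = 0; j < QK / 2; j++) {
            /* Assumed nibble order: low nibble is element 2j, high is 2j+1. */
            const int x0 = x[i].qs[j] & 0x0F;
            const int x1 = x[i].qs[j] >> 4;
            const int y0 = y[i].qs[2 * j];
            const int y1 = y[i].qs[2 * j + 1];

            sum_xy += x0 * y0 + x1 * y1;
            sum_y  += y0 + y1;
        }

        sumf += x[i].d * y[i].d * (float)sum_xy
              + x[i].m * y[i].d * (float)sum_y;
    }
    return sumf;
}
```

A vectorized version would keep the same two accumulators and replace the inner loop with widening integer multiply-adds, leaving only the final scaling in floating point.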