git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
ggml : use full range for Q4_0 and Q4_2 quantization (#729)
author unbounded <redacted>
Tue, 25 Apr 2023 17:20:46 +0000 (19:20 +0200)
committer GitHub <redacted>
Tue, 25 Apr 2023 17:20:46 +0000 (20:20 +0300)
commit dd0eabc049fb1efc631cab8eb0a646808d704e18
tree 23a35354481ec346c4501937b95612a19fff9d21
parent 54bb60e26858be251a0eb3cb70f80322aff804a0

* Use full range for q4_0 quantization

By keeping the sign of the highest-magnitude value when computing the
scale, we can make sure that value maps to -8, which is currently unused.
This is a bit of a freebie since it is fully backwards compatible with
the current format.

* Update quantize_row_q4_0 for AVX/AVX2

* Update quantize_row_q4_0 for WASM

Untested

* Update quantize_row_q4_0 for Arm NEON

* Update quantize_row_q4_0 for PowerPC

Untested

* Use full range for q4_2 quantization
ggml.c