iq2_xxs: tune quantization (#5320)
author    Kawrakow <redacted>
          Mon, 5 Feb 2024 08:46:06 +0000 (10:46 +0200)
committer GitHub <redacted>
          Mon, 5 Feb 2024 08:46:06 +0000 (10:46 +0200)
commit    6fdfa2ecc684000a25a4ad91823bc82a6652b645
tree      c98969391003efff3b83b4ede0a50759b80fa3ab
parent    a2d60c9158435ae9a6f14632f07f1acf7a3becef
We get slightly better PPL, and we cut quantization time nearly
in half.

The trick is to first quantize without forcing points onto the E8 lattice.
We can then use a narrower search range around the block scale
obtained that way.

Co-authored-by: Iwan Kawrakow <redacted>
ggml-quants.c