git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
iq2_xxs: tune quantization (llama/5320)
author     Kawrakow <redacted>, Mon, 5 Feb 2024 08:46:06 +0000 (10:46 +0200)
committer  Georgi Gerganov <redacted>, Sat, 10 Feb 2024 07:55:46 +0000 (09:55 +0200)
commit     0ed762d691cb6a211b7af6496b3ebaa70e1b848a
tree       5558d0f1cc14e83ad79aed59ddc4657b0b2c627d
parent     1b5bb7792e9fea541dec1e3430a559f8de28f3c8
iq2_xxs: tune quantization (llama/5320)

We get slightly better perplexity (PPL), and we cut quantization time
nearly in half.

The trick is to first quantize without forcing points onto the E8 lattice.
We can then use a narrower search range around the block scale obtained
that way.
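To illustrate the two-pass idea, here is a toy C sketch. It is not the code in ggml-quants.c. It makes several assumptions: a hypothetical 16-value block, a per-value rounding onto the odd levels {±1, ±3, ±5} that stands in for the E8-lattice constraint, and a least-squares refit of the scale after each trial. All names (`snap_free`, `snap_grid`, `two_pass`, `wide_search`) and the step sizes are made up for illustration.

```c
#include <assert.h>
#include <math.h>   /* INFINITY */

#define N 16          /* hypothetical block size */
#define MAX_LEVEL 5   /* hypothetical largest quantized magnitude */

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Pass-1 rounding: any integer level in [-MAX_LEVEL, MAX_LEVEL], no grid constraint. */
static float snap_free(float v) {
    float a = absf(v);
    if (a > MAX_LEVEL) a = MAX_LEVEL;
    float q = (float)(int)(a + 0.5f);            /* round half away from zero */
    return v < 0.0f ? -q : q;
}

/* Pass-2 rounding: |q| forced onto the odd levels {1, 3, 5}.
   Hypothetical stand-in for forcing points onto the E8 lattice. */
static float snap_grid(float v) {
    float a = absf(v);
    float q = a < 2.0f ? 1.0f : (a < 4.0f ? 3.0f : 5.0f);
    return v < 0.0f ? -q : q;
}

/* Quantize x at trial scale d, refit the scale by least squares
   (d* = sum(q*x) / sum(q*q)) and return the squared reconstruction error. */
static float eval_scale(const float *x, float d, float (*snap)(float), float *d_fit) {
    assert(d > 0.0f && "trial scale must be positive");
    float q[N], sumqx = 0.0f, sumq2 = 0.0f;
    for (int i = 0; i < N; ++i) {
        q[i] = snap(x[i] / d);
        sumqx += q[i] * x[i];
        sumq2 += q[i] * q[i];
    }
    *d_fit = sumq2 > 0.0f ? sumqx / sumq2 : d;
    float err = 0.0f;
    for (int i = 0; i < N; ++i) {
        float r = x[i] - *d_fit * q[i];
        err += r * r;
    }
    return err;
}

/* Try scales center * (1 + k*step) for k = -nsteps..nsteps; keep the best refit scale. */
static float search(const float *x, float center, float step, int nsteps,
                    float (*snap)(float), float *best_d, int *evals) {
    float best_err = INFINITY;
    for (int k = -nsteps; k <= nsteps; ++k) {
        float d = center * (1.0f + (float)k * step);
        if (d <= 0.0f) continue;
        float d_fit;
        float err = eval_scale(x, d, snap, &d_fit);
        ++*evals;
        if (err < best_err) { best_err = err; *best_d = d_fit; }
    }
    return best_err;
}

static float amax_of(const float *x) {
    float m = 0.0f;
    for (int i = 0; i < N; ++i) if (absf(x[i]) > m) m = absf(x[i]);
    return m;
}

/* Two passes: a wide scan with unconstrained rounding, then a narrow
   scan with grid-constrained rounding around the scale it found. */
static float two_pass(const float *x, float *best_d, int *free_evals, int *grid_evals) {
    float d0 = amax_of(x) / MAX_LEVEL;             /* map max |x| to the top level */
    float d_free = d0;
    search(x, d0, 0.05f, 10, snap_free, &d_free, free_evals);          /* 21 trials */
    return search(x, d_free, 0.01f, 3, snap_grid, best_d, grid_evals); /* 7 trials */
}

/* Baseline: a single wide scan that uses the constrained rounding at every trial. */
static float wide_search(const float *x, float *best_d, int *grid_evals) {
    float d0 = amax_of(x) / MAX_LEVEL;
    return search(x, d0, 0.01f, 30, snap_grid, best_d, grid_evals);    /* 61 trials */
}
```

In this toy, `two_pass` calls the constrained rounder for 7 trial scales, while `wide_search` calls it for 61. The commit reports that quantization time drops by nearly half. If that saving comes from the constrained step dominating the cost, then narrowing its search range is the right place to save work.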

Co-authored-by: Iwan Kawrakow <redacted>
ggml-quants.c