iq2_xxs: tune quantization (#5320)
author    Kawrakow <redacted>
          Mon, 5 Feb 2024 08:46:06 +0000 (10:46 +0200)
committer GitHub <redacted>
          Mon, 5 Feb 2024 08:46:06 +0000 (10:46 +0200)
commit    6fdfa2ecc684000a25a4ad91823bc82a6652b645
tree      c98969391003efff3b83b4ede0a50759b80fa3ab
parent    a2d60c9158435ae9a6f14632f07f1acf7a3becef
We get slightly better PPL, and we cut quantization time nearly
in half.

The trick is to first quantize without forcing points onto the E8 lattice.
We can then use a narrower search range around the block scale
obtained that way.

Co-authored-by: Iwan Kawrakow <redacted>
ggml-quants.c