llama : add Q3_K_XS (#5060)
authorKawrakow <redacted>
Mon, 22 Jan 2024 10:43:33 +0000 (12:43 +0200)
committerGitHub <redacted>
Mon, 22 Jan 2024 10:43:33 +0000 (12:43 +0200)
commit66d575c45c5a370d668f9c3283cdf348e2329fa2
tree035e052b116f301508225f897f1943e6eb1b3e19
parent57744932c64266359ee905518de7e096c0295d8c
llama : add Q3_K_XS (#5060)

* Add Q3_K_XS - intermediate size between Q2_K and Q3_K_S

* Q3_K_XS: quantize first 1/8 of ffn_down layers with Q4_K

Together with an importance matrix, this brings perplexity
for LLaMA-v2-70B below the perplexity of the former Q2_K
with an 800 MB smaller quantized model size.

---------

Co-authored-by: Iwan Kawrakow <redacted>
examples/quantize/quantize.cpp
llama.cpp
llama.h