author	Kawrakow <redacted>
	Wed, 28 Feb 2024 08:37:02 +0000 (10:37 +0200)
committer	Georgi Gerganov <redacted>
	Wed, 28 Feb 2024 09:18:32 +0000 (11:18 +0200)
commit	a04cf9e6812539fc49d59c6dfb4c231f333e56ef
tree	6597619a0b097c0999e56627f8703b0041835079
parent	962a11591e05aa8e85c16938d34f7df665ff756c

ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (llama/5760)

* WIP: make i-quants work for QK_K = 64

* iq2_xs: attempt to fix AVX dot product for QK_K = 64

Tests pass, but I get gibberish.

* QK_K = 64 tests pass on ARM_NEON and Metal

Sadly, that does not mean it actually works.

* Make CUDA compile with QK_K = 64

Tests don't pass, and we get misaligned access.

* Q2_K: fixed bug in imatrix quantization for QK_K = 64

* iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work)
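QK_K is the number of weights in one k-quant/i-quant super-block. ggml fixes it at compile time: 256 by default, and 64 when built with `GGML_QKK_64`. The sketch below illustrates the two ideas in this commit message: row lengths are counted in super-blocks of QK_K, and a SIMD kernel written for 256-wide super-blocks can be guarded so that other sizes fall back to a scalar path (as done here for iq1_s). It is a hedged illustration, not the code from this commit: `n_super_blocks`, `dot_scalar`, and `dot_dispatch` are hypothetical names, and the vectorized kernel is only a placeholder.

```c
#include <assert.h>

/* Compile-time super-block size, following the ggml convention
   (GGML_QKK_64 -> 64, otherwise 256). */
#ifdef GGML_QKK_64
#define QK_K 64
#else
#define QK_K 256
#endif

/* Hypothetical helper: number of super-blocks in a row of n weights.
   The row length must be a multiple of QK_K. */
static int n_super_blocks(int n) {
    assert(n % QK_K == 0);
    return n / QK_K;
}

/* Scalar reference dot product: valid for any QK_K. */
static float dot_scalar(const float *x, const float *y, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Hypothetical dispatcher: a SIMD kernel that assumes 256-wide
   super-blocks is compiled in only when QK_K == 256; any other
   super-block size uses the scalar path instead. */
static float dot_dispatch(const float *x, const float *y, int n) {
#if (defined(__AVX2__) || defined(__ARM_NEON)) && QK_K == 256
    /* Placeholder: a real build would call the vectorized kernel here. */
    return dot_scalar(x, y, n);
#else
    return dot_scalar(x, y, n);
#endif
}
```

Guarding on `QK_K == 256` at the preprocessor level keeps a kernel with baked-in 256-element assumptions from being compiled at all for other sizes, which is safer than letting it run and produce wrong results.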

---------

Co-authored-by: Iwan Kawrakow <redacted>

src/ggml-cuda.cu
src/ggml-metal.metal
src/ggml-quants.c
src/ggml-quants.h
src/ggml.c