]>
git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (llama/5760)
* WIP: make i-quants work for QK_K = 64
* iq2_xs: attempt to fix AVX dot product for QK_K = 64
Tests pass, but I get gibberish.
* QK_K = 64 tests pass on ARM_NEON and Metal
Sadly, that does not mean it actually works.
* Make CUDA compile with QK_K = 64
Tests don't pass, plus we get misaligned access
* Q2_K: fixed bug in imatrix quantization for QK_K = 64
* iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work)
---------
Co-authored-by: Iwan Kawrakow <redacted>