1.5 bit quantization (llama/5453)
author    Kawrakow <redacted>  Sun, 18 Feb 2024 16:16:55 +0000 (18:16 +0200)
committer Georgi Gerganov <redacted>  Mon, 19 Feb 2024 13:52:51 +0000 (15:52 +0200)
commit    fb921eba49cfeb6ac44af94e51270c64cd293920
tree      8488c5952ace1fb2c203a77504f17a863e350327
parent    2f836b6771c5d024c61a98b90d2b441f06f4c3a5
1.5 bit quantization (llama/5453)

* iq1_s: WIP basics

* iq1_s: CUDA is working

* iq1_s: scalar CPU dot product

* iq1_s: WIP AVX2 dot product - something is not right

* Fix tests

* Fix shadow warnings

* Fix after merge with latest master

* iq1_s: AVX2 finally works

* iq1_s: ARM_NEON dot product. Works, but not very fast

* iq1_s: better grid

* iq1_s: use IQ2_XXS for attn_output

At a cost of 0.04 extra bpw (bits per weight), this gives a big improvement in perplexity (PPL).

* iq1_s: Metal basics

Dequantization works, but not the dot product

* iq1_s: Metal works, but quite slow

As usual, Apple Silicon does not like the code I write.

* iq1_s: Tests

* iq1_s: slightly faster dot product

---------

Co-authored-by: Iwan Kawrakow <redacted>
include/ggml/ggml.h
src/ggml-backend.c
src/ggml-cuda.cu
src/ggml-metal.m
src/ggml-metal.metal
src/ggml-quants.c
src/ggml-quants.h
src/ggml.c
tests/test-backend-ops.cpp