ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm)...
author Alberto Cabrera Pérez <redacted>
Fri, 23 Jan 2026 07:55:08 +0000 (07:55 +0000)
committer Georgi Gerganov <redacted>
Fri, 30 Jan 2026 11:49:29 +0000 (13:49 +0200)
commit 6a7ff597f9c8d24035f6b09e571a5e2a647bca25
tree 2ad616fe9082792ff13055c28603adb5001fa235
parent 91d417c5954c1cb45d83795f23c8b458730731c1
ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (llama/18860)

* Boilerplate for q5_Kx8 REPACK on ARM and fallback

Signed-off-by: Alberto Cabrera <redacted>

* Implements make_block_q5_Kx8 by extending make_block_q4_Kx8

Signed-off-by: Alberto Cabrera <redacted>

* q5_K repack gemm and gemv generics

* Gemm and Gemv ARM implementations (i8mm)

* Improved qh manipulation looking at non-repack vec_dot implementation

* Full unroll

* Apply Q5_K Gemv vand and vshl optimizations to gemm. Improve comments.

Signed-off-by: Alberto Cabrera <redacted>

* Fix wrong fallback definitions of Q5_K

Signed-off-by: Alberto Cabrera <redacted>

* Fixed comments. Reverted unnecessary formatting

Signed-off-by: Alberto Cabrera <redacted>

* Fixed typo in generic definitions

* Switching AND + Shift with Shift Insert. Better op interleaving.

* Vectorize + unroll the block scales

* Apply gemm optimizations to gemv

* Improve bias calculation

---------

Signed-off-by: Alberto Cabrera <redacted>
src/ggml-cpu/arch-fallback.h
src/ggml-cpu/arch/arm/repack.cpp
src/ggml-cpu/repack.cpp
src/ggml-cpu/repack.h
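
The "Switching AND + Shift with Shift Insert" item can be illustrated with a small scalar sketch. This is not the code from this commit: the helper names (q5_and_shift, shift_left_insert, q5_shift_insert, q5_variants_agree) are hypothetical, the byte layout is simplified, and the real kernels operate on NEON vectors over the repacked q5_Kx8 blocks. The sketch assumes a 5-bit quant is formed from a 4-bit low nibble stored in a qs byte plus one high bit taken from qh, and it emulates a "shift left and insert" operation that shifts the source left by n and keeps the low n bits of the destination. Under those assumptions the insert form needs no separate mask on the qs byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build a 5-bit value with an explicit mask (AND),
// a shift of the high bit, and an OR. `hbit` is assumed to be 0 or 1.
inline uint8_t q5_and_shift(uint8_t qs, uint8_t hbit) {
    const uint8_t lo = static_cast<uint8_t>(qs & 0x0Fu);   // keep the low nibble
    const uint8_t hi = static_cast<uint8_t>(hbit << 4);    // move the high bit to bit 4
    return static_cast<uint8_t>(lo | hi);
}

// Scalar emulation of a shift-left-and-insert step: the result takes
// (src << n) in its upper bits and preserves the low n bits of dst.
inline uint8_t shift_left_insert(uint8_t dst, uint8_t src, unsigned n) {
    const uint8_t keep = static_cast<uint8_t>((1u << n) - 1u);
    const uint8_t shifted = static_cast<uint8_t>(src << n);
    return static_cast<uint8_t>((dst & keep) | shifted);
}

// Hypothetical helper: the same 5-bit value through one insert step.
// The upper nibble of `qs` is overwritten, so no mask on `qs` is needed.
inline uint8_t q5_shift_insert(uint8_t qs, uint8_t hbit) {
    return shift_left_insert(qs, hbit, 4);
}

// Exhaustively checks that both variants agree for every byte and high bit.
inline bool q5_variants_agree() {
    for (unsigned qs = 0; qs < 256; ++qs) {
        for (unsigned hbit = 0; hbit < 2; ++hbit) {
            const uint8_t a = q5_and_shift(static_cast<uint8_t>(qs), static_cast<uint8_t>(hbit));
            const uint8_t b = q5_shift_insert(static_cast<uint8_t>(qs), static_cast<uint8_t>(hbit));
            if (a != b) {
                return false;
            }
        }
    }
    return true;
}
```

Calling q5_variants_agree() from a test confirms the two forms are interchangeable when the high bit is already isolated as 0 or 1. Whether the fused form is faster depends on the target core and on how the surrounding operations are interleaved.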