]> git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm)...
authorAlberto Cabrera Pérez <redacted>
Fri, 23 Jan 2026 07:55:08 +0000 (07:55 +0000)
committerGeorgi Gerganov <redacted>
Fri, 30 Jan 2026 13:56:40 +0000 (15:56 +0200)
commite090d91f5edd636f42c0a09f155ca8a341af50d4
tree96bd84cc91bbb1e9e6522eb490a8730a776c1606
parent3f96a1da0e89e538f508fb641422f5de83bb4dc4
ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (llama/18860)

* Boilerplate for q5_Kx8 REPACK on ARM and fallback

Signed-off-by: Alberto Cabrera <redacted>
* Implements make_block_q5_Kx8 by extending make_block_q4_Kx8

Signed-off-by: Alberto Cabrera <redacted>
* q5_K repack gemm and gemv generics

* Gemm and Gemv ARM implementations (i8mm)

* Improved qh manipulation looking at non-repack vec_dot implementation

* Full unroll

* Apply Q5_K Gemv vand and vshl optimizations to gemm. Improve comments.

Signed-off-by: Alberto Cabrera <redacted>
* Fix wrong fallback definitions of Q5_K

Signed-off-by: Alberto Cabrera <redacted>
* Fixed comments. Reverted unnecessary formatting

Signed-off-by: Alberto Cabrera <redacted>
* Fixed typo in generic definitions

* Switching AND + Shift with Shift Insert. Better op interleaving.

* Vectorize + unroll the block scales

* Apply gemm optimizations to gemv

* Improve bias calculation

---------

Signed-off-by: Alberto Cabrera <redacted>
ggml/src/ggml-cpu/arch-fallback.h
ggml/src/ggml-cpu/arch/arm/repack.cpp
ggml/src/ggml-cpu/repack.cpp
ggml/src/ggml-cpu/repack.h