]> git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm)...
authorAlberto Cabrera Pérez <redacted>
Tue, 27 Jan 2026 09:08:10 +0000 (09:08 +0000)
committerGeorgi Gerganov <redacted>
Fri, 30 Jan 2026 13:56:40 +0000 (15:56 +0200)
commit00885e08e2fbb52eb8190f2554d5b2179e967733
tree0b7c56b312672a6a17afe15a6d0250ca631712e9
parent5fcbbdc0ddda79214fe40d828fc338a4f28c29ff
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (llama/18888)

* Boilerplate for q6_K repack

* q6_K repack to q6_Kx8 implementation

Signed-off-by: Alberto Cabrera <redacted>
* q6_K generic gemv and gemm

* wip, gemm_q6_K 8x8

* Still WIP: loading of q8s, q6h and q6l

* first working version of q6_K gemm

* Moved q6 loads outside of sb block, Unrolled inner loop

* Replaced modulo with mask

* First implementation of GEMV

* ggml_vdotq_s32 -> vdotq_s32

* Reduce width of accumulators in q6_K gemv

* Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.

* Reuse scales in GEMM (same GEMV opt)

* Added todos for bsum and different qh repack

* Arch fallback

* VSLIQ for merging qh adn ql

* Removed TODO, already tested

* Apply suggestions

Co-authored-by: Georgi Gerganov <redacted>
* Removed unused import

---------

Signed-off-by: Alberto Cabrera <redacted>
Co-authored-by: Georgi Gerganov <redacted>
ggml/src/ggml-cpu/arch-fallback.h
ggml/src/ggml-cpu/arch/arm/repack.cpp
ggml/src/ggml-cpu/repack.cpp
ggml/src/ggml-cpu/repack.h