git.djapps.eu Git - pkg/ggml/sources/ggml/commit
author     Alberto Cabrera Pérez <redacted>
           Tue, 27 Jan 2026 09:08:10 +0000 (09:08 +0000)
committer  Georgi Gerganov <redacted>
           Fri, 30 Jan 2026 11:49:29 +0000 (13:49 +0200)
commit     8dd386b0d91905fad71a142a4f078cb67f87d668
tree       40746dea2c98d3fb8ea7a6f2940c8cb72cfc7ed9
parent     2a860ed8256d874bcb5ff1c54fad234c722706fe
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (llama/18888)

* Boilerplate for q6_K repack

* q6_K repack to q6_Kx8 implementation

Signed-off-by: Alberto Cabrera <redacted>
* q6_K generic gemv and gemm

* wip, gemm_q6_K 8x8

* Still WIP: loading of q8s, q6h and q6l

* first working version of q6_K gemm

* Moved q6 loads outside of sb block, unrolled inner loop

* Replaced modulo with mask

* First implementation of GEMV

* ggml_vdotq_s32 -> vdotq_s32

* Reduce width of accumulators in q6_K gemv

* Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.

* Reuse scales in GEMM (same GEMV opt)

* Added todos for bsum and different qh repack

* Arch fallback

* VSLIQ for merging qh and ql

* Removed TODO, already tested

* Apply suggestions

Co-authored-by: Georgi Gerganov <redacted>
* Removed unused import

---------

Signed-off-by: Alberto Cabrera <redacted>
Co-authored-by: Georgi Gerganov <redacted>
src/ggml-cpu/arch-fallback.h
src/ggml-cpu/arch/arm/repack.cpp
src/ggml-cpu/repack.cpp
src/ggml-cpu/repack.h