]> git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm)...
authorAlberto Cabrera Pérez <redacted>
Tue, 27 Jan 2026 09:08:10 +0000 (09:08 +0000)
committerGitHub <redacted>
Tue, 27 Jan 2026 09:08:10 +0000 (11:08 +0200)
commitbe8890e7217982fc02da568bc3955e9416d0e5d0
tree900cec06fe84c62c1f0657186b76026056811af6
parenta83c73a18aaffba253ffd01e7cd3af41feaf8179
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888)

* Boilerplate for q6_K repack

* q6_K repack to q6_Kx8 implementation

Signed-off-by: Alberto Cabrera <redacted>
* q6_K generic gemv and gemm

* wip, gemm_q6_K 8x8

* Still WIP: loading of q8s, q6h and q6l

* first working version of q6_K gemm

* Moved q6 loads outside of sb block, Unrolled inner loop

* Replaced modulo with mask

* First implementation of GEMV

* ggml_vdotq_s32 -> vdotq_s32

* Reduce width of accumulators in q6_K gemv

* Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.

* Reuse scales in GEMM (same GEMV opt)

* Added todos for bsum and different qh repack

* Arch fallback

* VSLIQ for merging qh adn ql

* Removed TODO, already tested

* Apply suggestions

Co-authored-by: Georgi Gerganov <redacted>
* Removed unused import

---------

Signed-off-by: Alberto Cabrera <redacted>
Co-authored-by: Georgi Gerganov <redacted>
ggml/src/ggml-cpu/arch-fallback.h
ggml/src/ggml-cpu/arch/arm/repack.cpp
ggml/src/ggml-cpu/repack.cpp
ggml/src/ggml-cpu/repack.h