]>
git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (llama/18888)
* Boilerplate for q6_K repack
* q6_K repack to q6_Kx8 implementation
Signed-off-by: Alberto Cabrera <redacted>
* q6_K generic gemv and gemm
* wip, gemm_q6_K 8x8
* Still WIP: loading of q8s, q6h and q6l
* first working version of q6_K gemm
* Moved q6 loads outside of sb block, Unrolled inner loop
* Replaced modulo with mask
* First implementation of GEMV
* ggml_vdotq_s32 -> vdotq_s32
* Reduce width of accumulators in q6_K gemv
* Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.
* Reuse scales in GEMM (same GEMV opt)
* Added todos for bsum and different qh repack
* Arch fallback
* VSLIQ for merging qh adn ql
* Removed TODO, already tested
* Apply suggestions
Co-authored-by: Georgi Gerganov <redacted>
* Removed unused import
---------
Signed-off-by: Alberto Cabrera <redacted>
Co-authored-by: Georgi Gerganov <redacted>