llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)
author     amritahs-ibm <redacted>
           Thu, 27 Mar 2025 06:51:47 +0000 (12:21 +0530)
committer  Georgi Gerganov <redacted>
           Thu, 27 Mar 2025 07:35:24 +0000 (09:35 +0200)
commit     32aca4608b0519f4c10ff71b5c3bcc2c19374a29
tree       3390339eb728f876c4d5e79d5958134f28b7cda1
parent     d3f81d0d7b804c8831d0e276cd065bc3ca22b802
llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)

This change upstreams llamafile's CPU matrix
multiplication kernels for the ppc64le ISA using
MMA builtins. The patch handles matrix
multiplication between the quantised datatypes
block_q4_0 and block_q8_0.
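
For reference, below is a minimal sketch of the
quantised block layouts involved and of a
representative POWER10 MMA accumulate step. The
struct definitions mirror ggml's block_q4_0 and
block_q8_0, and the intrinsics are the standard
GCC/Clang MMA builtins; this illustrates the
technique only and is not the kernel code added
in sgemm.cpp.

    // Minimal sketch, assuming ggml's block layouts and the
    // GCC/Clang POWER10 MMA builtins (-mcpu=power10 -mmma).
    #include <cstdint>

    #define QK4_0 32
    #define QK8_0 32

    typedef uint16_t ggml_half;        // fp16 stored as raw 16-bit value

    struct block_q4_0 {                // 32 weights, 4 bits each
        ggml_half d;                   // per-block scale
        uint8_t   qs[QK4_0 / 2];       // two 4-bit values packed per byte
    };

    struct block_q8_0 {                // 32 activations, 8 bits each
        ggml_half d;                   // per-block scale
        int8_t    qs[QK8_0];           // signed 8-bit values
    };

    #if defined(__MMA__)               // POWER10 target with MMA enabled
    #include <altivec.h>

    // One MMA outer-product step: multiply 8-bit elements from two 16-byte
    // vectors and accumulate the rank-4 update into a 4x4 int32 accumulator
    // held in a __vector_quad register (zeroed beforehand with
    // __builtin_mma_xxsetaccz).
    static inline void acc_step(__vector_quad *acc,
                                vector unsigned char a,
                                vector unsigned char b) {
        __builtin_mma_xvi8ger4pp(acc, a, b);
    }

    // Copy the accumulator back into four ordinary vector registers so the
    // int32 partial sums can be scaled by d(q4_0) * d(q8_0) and stored.
    static inline void acc_read(__vector_quad *acc, vector signed int out[4]) {
        __builtin_mma_disassemble_acc(out, acc);
    }
    #endif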

This change results in a 5% - 50% improvement
in total speed (i.e. all tokens / total time)
across various batch sizes.

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <redacted>
src/ggml-cpu/llamafile/sgemm.cpp