author	snadampal <redacted>
	Sun, 11 Feb 2024 13:22:33 +0000 (07:22 -0600)
committer	GitHub <redacted>
	Sun, 11 Feb 2024 13:22:33 +0000 (15:22 +0200)
commit	a07d0fee1f05c5c1dc49948ae1a3293db017275f
tree	06614ff1364269493e4853333ced56802abd7284
parent	e4640d8fdf56f14a6db3d092bcd3d2d315cb5d04

ggml : add mmla kernels for quantized GEMM (#4966)

* ggml: aarch64: implement smmla kernel for q8_0_q8_0 quantized gemm

Armv8.2-A and above support MMLA instructions, which have higher
throughput than DOT. This commit adds an MMLA kernel for
q8_0_q8_0 GEMM. The feature is enabled if the compiler defines
"__ARM_FEATURE_MATMUL_INT8".

On AWS Graviton3 processors this kernel yielded up to a 1.5x
improvement in prompt evaluation throughput compared to the
default sdot kernel.
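
As a rough illustration of why MMLA can outperform DOT, the sketch below computes one 2x2 tile of int32 dot products from two 2x8 int8 operands. With "__ARM_FEATURE_MATMUL_INT8" defined it uses the NEON intrinsic vmmlaq_s32 (one SMMLA instruction); otherwise it falls back to scalar code with the same result. The helper name mmla_tile_2x2 is hypothetical and this is not the kernel from the commit, which also handles block scales and loops over whole rows.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_MATMUL_INT8)
#include <arm_neon.h>
#endif

/* Hypothetical helper, not part of ggml.
 * a and b each hold two rows of 8 int8 values (row-major).
 * Accumulates out[2*i + j] += dot(a row i, b row j) for i, j in {0, 1}. */
static void mmla_tile_2x2(int32_t out[4], const int8_t a[16], const int8_t b[16]) {
#if defined(__ARM_FEATURE_MATMUL_INT8)
    /* One SMMLA instruction produces all four dot products. */
    int32x4_t acc = vld1q_s32(out);
    acc = vmmlaq_s32(acc, vld1q_s8(a), vld1q_s8(b));
    vst1q_s32(out, acc);
#else
    /* Scalar reference path for targets without the extension. */
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            int32_t sum = 0;
            for (int k = 0; k < 8; ++k) {
                sum += (int32_t)a[8 * i + k] * (int32_t)b[8 * j + k];
            }
            out[2 * i + j] += sum;
        }
    }
#endif
}
```

Because each call consumes two rows from each operand, a GEMM built on this tile processes rows in pairs, which is presumably why the commit also touches the vec_dot interface.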

* ggml: aarch64: implement smmla kernel for q4_0_q8_0 quantized gemm

Armv8.2-A and above support MMLA instructions, which have higher
throughput than DOT. This commit adds an MMLA kernel for
q4_0_q8_0 GEMM. The feature is enabled if the compiler defines
"__ARM_FEATURE_MATMUL_INT8".

On AWS Graviton3 processors this kernel yielded up to a 1.5x
improvement in prompt evaluation throughput compared to the
default sdot kernel.
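
For q4_0_q8_0, the 4-bit weights have to be widened to signed 8-bit values before any int8 dot or matrix instruction can consume them. The scalar sketch below shows that arithmetic for a single 32-value block. The struct names are simplified stand-ins (ggml stores the scale as fp16; a float is assumed here for brevity), and the function dot_q4_0_q8_0 is hypothetical rather than the code added by the commit.

```c
#include <stdint.h>

#define QK 32 /* values per block */

/* Simplified stand-ins for the ggml block layouts (scale kept as float here). */
typedef struct { float d; uint8_t qs[QK / 2]; } blk_q4_0; /* two 4-bit values per byte */
typedef struct { float d; int8_t  qs[QK];     } blk_q8_0;

/* Dot product of one q4_0 block with one q8_0 block. */
static float dot_q4_0_q8_0(const blk_q4_0 *x, const blk_q8_0 *y) {
    int32_t sumi = 0;
    for (int j = 0; j < QK / 2; ++j) {
        /* Low nibble is element j, high nibble is element j + QK/2;
         * subtracting 8 maps the stored range 0..15 to -8..7. */
        const int lo = (x->qs[j] & 0x0F) - 8;
        const int hi = (x->qs[j] >> 4) - 8;
        sumi += lo * y->qs[j] + hi * y->qs[j + QK / 2];
    }
    /* Apply both block scales once, after the integer accumulation. */
    return x->d * y->d * (float)sumi;
}
```

A vectorized kernel would perform the same mask, shift and subtract on whole registers and hand the resulting int8 vectors to the matrix instruction.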

* ggml: aarch64: implement smmla kernel for q4_1_q8_1 quantized gemm

Armv8.2-A and above support MMLA instructions, which have higher
throughput than DOT. This commit adds an MMLA kernel for
q4_1_q8_1 GEMM. The feature is enabled if the compiler defines
"__ARM_FEATURE_MATMUL_INT8".

On AWS Graviton3 processors this kernel yielded up to a 1.5x
improvement in prompt evaluation throughput compared to the
default sdot kernel.
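
The q4_1_q8_1 case differs from q4_0 in that each weight block carries a minimum m in addition to the scale d, so the result has an extra term. Assuming weights x_i = d_x * q_i + m_x and activations y_i = d_y * p_i, with s_y = d_y * sum(p_i) stored in the q8_1 block, the block dot product is d_x * d_y * sum(q_i * p_i) + m_x * s_y. The scalar sketch below works this out for one block; the struct and function names are hypothetical, and float fields are assumed in place of ggml's actual storage types.

```c
#include <stdint.h>

#define QK 32 /* values per block */

/* Simplified stand-ins for the ggml block layouts (float fields assumed). */
typedef struct { float d; float m; uint8_t qs[QK / 2]; } blk_q4_1; /* x_i = d*q_i + m */
typedef struct { float d; float s; int8_t  qs[QK];     } blk_q8_1; /* s = d * sum(qs) */

/* Dot product of one q4_1 block with one q8_1 block. */
static float dot_q4_1_q8_1(const blk_q4_1 *x, const blk_q8_1 *y) {
    int32_t sumi = 0;
    for (int j = 0; j < QK / 2; ++j) {
        /* q4_1 nibbles are unsigned (0..15); the offset lives in m instead. */
        const int lo = x->qs[j] & 0x0F;
        const int hi = x->qs[j] >> 4;
        sumi += lo * y->qs[j] + hi * y->qs[j + QK / 2];
    }
    /* Integer part scaled by both d values, plus the minimum times the
     * precomputed activation sum. */
    return x->d * y->d * (float)sumi + x->m * y->s;
}
```

Precomputing s in the activation block means the minimum contributes one multiply per block instead of one per element.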

* ggml: update unit tests for the new vec_dot interface

* llama.cpp: add MATMUL_INT8 capability to system_info
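
A capability flag of this kind can be derived from the same macro that gates the kernels. The sketch below is a minimal illustration under that assumption: has_matmul_int8 and format_matmul_int8 are hypothetical names, and the "NAME = value | " output format is assumed rather than taken from the commit. Note that this reports what the binary was compiled for, not what the CPU it runs on supports.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the build targets the int8 matmul extension. */
static int has_matmul_int8(void) {
#if defined(__ARM_FEATURE_MATMUL_INT8)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: writes e.g. "MATMUL_INT8 = 0 | " into buf and
 * returns the number of characters snprintf would have written. */
static int format_matmul_int8(char *buf, size_t len) {
    return snprintf(buf, len, "MATMUL_INT8 = %d | ", has_matmul_int8());
}
```

Printing this entry next to the other CPU feature flags makes it easy to confirm from a log whether a given build can take the MMLA path.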

common/common.cpp
ggml-quants.c
ggml-quants.h
ggml.c
ggml.h
llama.cpp
pocs/vdot/q8dot.cpp
pocs/vdot/vdot.cpp
tests/test-quantize-fns.cpp
tests/test-quantize-perf.cpp