git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Georgi Gerganov <redacted>
	Fri, 28 Mar 2025 18:21:59 +0000 (20:21 +0200)
committer	GitHub <redacted>
	Fri, 28 Mar 2025 18:21:59 +0000 (20:21 +0200)
commit	b4ae50810e4304d052e630784c14bde7e79e4132
tree	79c7765d85d83e231ae4dd8c1461df5c937a2536
parent	b86f6007234da4bff51a3ebef2bdb952b52059c6

metal : improve FA + improve MoE (#12612)

* ggml : FA with different K, V head sizes (CPU)

ggml-ci

* metal : add FA with HS=192

* metal : extend FA to support different K and V head sizes

ggml-ci

* metal : add FA vector kernels for heads K 192 and V 128

ggml-ci

* ggml : restrict op on other backends to equal head sizes

ggml-ci

* metal : optimize FA-vec kernel

ggml-ci

* metal : FA remove mq registers

* metal : improve MoE mul_mat_id condition

ggml-ci

* metal : fix comments + remove unnecessary addition

ggml-ci

* metal : avoid too much shared memory usage with mul_mat_id

ggml-ci
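
The first items above allow flash attention (FA) to use a K head size that differs from the V head size (for example K 192 and V 128). As a rough illustration only, and not the code from this commit, the sketch below computes attention for a single query row with a streaming softmax. The function name `attend_one_query` and its parameters are hypothetical. The point it shows is that the score dot product runs over the K head size, while the accumulator and the output have the V head size.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ggml): attention for one query row using
// an online (streaming) softmax, in the style of flash-attention kernels.
// K rows have dk elements, V rows have dv elements, and dk may differ from dv.
std::vector<float> attend_one_query(const std::vector<float> & q, // [dk]
                                    const std::vector<float> & K, // [n_kv * dk], row-major
                                    const std::vector<float> & V, // [n_kv * dv], row-major
                                    std::size_t n_kv, std::size_t dk, std::size_t dv,
                                    float scale) {
    assert(q.size() == dk);
    assert(K.size() == n_kv * dk);
    assert(V.size() == n_kv * dv);

    std::vector<float> acc(dv, 0.0f); // unnormalized weighted sum of V rows
    float m = -INFINITY;              // running maximum of the scores
    float s = 0.0f;                   // running sum of exp(score - m)

    for (std::size_t i = 0; i < n_kv; ++i) {
        // The score depends only on the K head size.
        float score = 0.0f;
        for (std::size_t d = 0; d < dk; ++d) {
            score += q[d] * K[i*dk + d];
        }
        score *= scale;

        // Rescale the previous accumulators when a new maximum appears.
        const float m_new = std::max(m, score);
        const float corr  = std::exp(m - m_new);     // 0 on the first iteration
        const float p     = std::exp(score - m_new);

        // The accumulator depends only on the V head size.
        for (std::size_t d = 0; d < dv; ++d) {
            acc[d] = acc[d]*corr + p*V[i*dv + d];
        }
        s = s*corr + p;
        m = m_new;
    }

    // Normalize; with no KV rows the result stays all zeros.
    if (s > 0.0f) {
        for (std::size_t d = 0; d < dv; ++d) {
            acc[d] /= s;
        }
    }
    return acc;
}
```

Calling this with `dk = 192` and `dv = 128` returns 128 values per query row, so any buffer that holds the attention output has to be sized from the V head size and not from the K head size.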

ggml/include/ggml.h
ggml/src/ggml-cpu/ggml-cpu.c
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-metal/ggml-metal-impl.h
ggml/src/ggml-metal/ggml-metal.m
ggml/src/ggml-metal/ggml-metal.metal
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml.c
src/llama-context.cpp
tests/test-backend-ops.cpp