git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit



author	Aman Gupta <redacted>
	Sat, 27 Sep 2025 16:49:32 +0000 (00:49 +0800)
committer	GitHub <redacted>
	Sat, 27 Sep 2025 16:49:32 +0000 (18:49 +0200)
commit	c0bfc57af421f8fd63c946c13b7666aed82560e2
tree	ca98e12f8f641ec8cb28008968ec82ed69ff7d94
parent	75a3a6c2cd0002ba40e2dcc92007bc9fdbc69f1a

CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277)

* CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32

This commit adds mul_mat_id support in mmf for ncols_dst >= 16. It does
this by packing the ncols_dst tiles into blockDim.y.
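The message does not spell out the launch geometry. The following is a minimal C++ sketch of one way column tiles could be stacked along blockDim.y. It assumes a tile width of 16 columns, blockDim.x equal to the warp size (so each y index is one warp), and a fixed number of warps per tile. All names here (kTileCols, num_col_tiles, packed_block_dim_y, tile_of_thread_y, first_col_of_thread_y) are hypothetical and are not taken from mmf.cu or mmf.cuh.

```cpp
#include <cassert>

// Hypothetical tile width: number of dst columns covered by one tile.
constexpr int kTileCols = 16;

// Number of column tiles needed to cover ncols_dst columns (ceil division).
inline int num_col_tiles(int ncols_dst, int tile_cols = kTileCols) {
    assert(ncols_dst > 0 && tile_cols > 0);
    return (ncols_dst + tile_cols - 1) / tile_cols;
}

// blockDim.y if every tile gets warps_per_tile warps stacked along y
// (assumes blockDim.x == 32, i.e. one warp per y index).
inline int packed_block_dim_y(int ncols_dst, int warps_per_tile,
                              int tile_cols = kTileCols) {
    assert(warps_per_tile > 0);
    return num_col_tiles(ncols_dst, tile_cols) * warps_per_tile;
}

// Which column tile a given threadIdx.y belongs to.
inline int tile_of_thread_y(int thread_y, int warps_per_tile) {
    assert(thread_y >= 0 && warps_per_tile > 0);
    return thread_y / warps_per_tile;
}

// First dst column handled by the tile that owns threadIdx.y.
inline int first_col_of_thread_y(int thread_y, int warps_per_tile,
                                 int tile_cols = kTileCols) {
    return tile_of_thread_y(thread_y, warps_per_tile) * tile_cols;
}
```

For ncols_dst = 64 with two warps per tile, this gives four tiles and blockDim.y = 8. Because a CUDA block may hold at most 1024 threads, this scheme needs tiles * warps_per_tile to stay at or below 32 when blockDim.x is 32.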

My tests on an RTX 3090 show that this is faster than the cuBLAS fallback
for f16 up to bs=64, and for f32 up to bs=32.
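The batch-size limits above amount to a simple dispatch rule. Below is a hedged C++ sketch of that rule using a hypothetical SrcType enum instead of ggml's own type enum. The real selection logic in ggml-cuda.cu likely checks further conditions (for example the GPU architecture), which are omitted here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the source element type; not ggml's ggml_type.
enum class SrcType { F16, F32, Other };

// Largest batch size for which the mmf mul_mat_id path is preferred,
// following the RTX 3090 measurements quoted in the commit message.
constexpr int64_t max_mmf_batch(SrcType t) {
    switch (t) {
        case SrcType::F16: return 64;
        case SrcType::F32: return 32;
        default:           return 0;  // this sketch assumes no mmf path for other types
    }
}

// true: use the mmf kernel; false: fall back to cuBLAS.
constexpr bool use_mmf_mul_mat_id(SrcType t, int64_t batch_size) {
    return batch_size >= 1 && batch_size <= max_mmf_batch(t);
}
```

Keeping the thresholds in one table-like function makes it easy to retune them per GPU without touching the call sites.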

* Review: refactor if statement

ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mmf.cu
ggml/src/ggml-cuda/mmf.cuh
tests/test-backend-ops.cpp

Packaging of ggml-org/llama.cpp
