git.djapps.eu Git - pkg/ggml/sources/ggml/commit


author	Oliver Simons <redacted>
	Thu, 30 Oct 2025 03:34:15 +0000 (04:34 +0100)
committer	Georgi Gerganov <redacted>
	Sat, 1 Nov 2025 07:41:35 +0000 (09:41 +0200)
commit	a66c5912d3ac6c6b463522fddd8d3a48c17dd8e4
tree	320de1d1bf6febbe1df8e68392591fea841919c6
parent	0f0fd00536b9d5d953ad132b0c6c6a0e014e7cee

Hide latency of bias and gate-loading (llama/16847)

This is achieved by loading the bias and gate values into registers
before the dot product is computed, which effectively batches those
loads with the dot product itself. Because many threads are alive at
this point, the warp scheduler has enough of them available to hide
the cost of loading those two additional floats.
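
The reordering described above can be illustrated with a minimal sketch.
This is not the code from src/ggml-cuda/mmvq.cu: the kernel name
matvec_bias_gate, its parameters, the one-warp-per-row layout, and the
(sum + bias) * gate epilogue are all assumptions made for illustration.
The only point it tries to show is that the two scalar loads are issued
before the dot-product loop rather than after it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from mmvq.cu): one warp computes one output row.
__global__ void matvec_bias_gate(const float * W, const float * x,
                                 const float * bias, const float * gate,
                                 float * out, int ncols) {
    const int row  = blockIdx.x;
    const int lane = threadIdx.x;

    // Issue the two scalar loads up front. Nothing uses b and g until the
    // epilogue, so their latency can overlap with the work below.
    float b = 0.0f;
    float g = 1.0f;
    if (lane == 0) {
        b = bias[row];
        g = gate[row];
    }

    // Each lane accumulates a strided slice of the row's dot product.
    float sum = 0.0f;
    for (int col = lane; col < ncols; col += warpSize) {
        sum += W[(size_t) row * ncols + col] * x[col];
    }

    // Butterfly reduction: afterwards every lane holds the full sum.
    for (int offset = warpSize / 2; offset > 0; offset >>= 1) {
        sum += __shfl_xor_sync(0xffffffff, sum, offset);
    }

    // Epilogue (assumed form): b and g are already in registers here.
    if (lane == 0) {
        out[row] = (sum + b) * g;
    }
}

int main() {
    const int nrows = 4, ncols = 64;
    float *W, *x, *bias, *gate, *out;
    cudaMallocManaged(&W,    nrows * ncols * sizeof(float));
    cudaMallocManaged(&x,    ncols * sizeof(float));
    cudaMallocManaged(&bias, nrows * sizeof(float));
    cudaMallocManaged(&gate, nrows * sizeof(float));
    cudaMallocManaged(&out,  nrows * sizeof(float));

    for (int i = 0; i < nrows * ncols; ++i) W[i] = 1.0f;
    for (int i = 0; i < ncols; ++i)         x[i] = 0.5f;
    for (int r = 0; r < nrows; ++r) { bias[r] = (float) r; gate[r] = 2.0f; }

    // One block of 32 threads (a single warp) per row.
    matvec_bias_gate<<<nrows, 32>>>(W, x, bias, gate, out, ncols);
    cudaDeviceSynchronize();

    for (int r = 0; r < nrows; ++r) printf("out[%d] = %.1f\n", r, out[r]);

    cudaFree(W); cudaFree(x); cudaFree(bias); cudaFree(gate); cudaFree(out);
    return 0;
}
```

Note that the compiler is free to reschedule loads, so source order is a
hint rather than a guarantee. Whether hoisting the loads helps in a given
kernel is best confirmed with a profiler.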

src/ggml-cuda/mmvq.cu


Packaging of ggml-org/ggml
