Hide latency of bias and gate-loading (#16847)
author    Oliver Simons <redacted>
          Thu, 30 Oct 2025 03:34:15 +0000 (04:34 +0100)
committer GitHub <redacted>
          Thu, 30 Oct 2025 03:34:15 +0000 (11:34 +0800)
commit    8b11deea4663f29d3e042ce1056ba643264cd5f1
tree      d53ace1542e3f0ca58f57db0e62ae2bfa5ce2a85
parent    b9ce94017729465895402cbcfffb51fa926c15e3
Hide latency of bias and gate-loading (#16847)

This is realised by loading the bias and gate values into registers before
the dot product is computed, effectively batching their loads with the
dot product. Because many threads are resident at this point, the warp
scheduler has enough eligible warps to hide the cost of the two additional
float loads.
ggml/src/ggml-cuda/mmvq.cu
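
The sketch below illustrates the load-placement idea described in the commit
message. It is not the actual mmvq.cu kernel: the kernel name, the plain
float dot product, and the bias/gate pointers are hypothetical stand-ins
(the real kernel works on quantized blocks). The only point being shown is
that the two scalar loads are issued before the dot-product loop, so their
latency overlaps with the arithmetic instead of stalling after it.

    // Illustrative sketch only; names and signature are assumptions,
    // not the real ggml-cuda mmvq kernel.
    #include <cuda_runtime.h>

    __global__ void mmv_fused_bias_gate(const float * __restrict__ x,
                                        const float * __restrict__ y,
                                        const float * __restrict__ bias,
                                        const float * __restrict__ gate,
                                        float * __restrict__ dst,
                                        int ncols) {
        const int row  = blockIdx.x;
        const int lane = threadIdx.x;   // one warp per row in this sketch

        // Issue the two extra loads *before* the dot product. The values
        // land in registers while the loop below keeps this warp (and its
        // siblings) busy, so the load latency is hidden rather than paid
        // after the reduction.
        const float b = bias ? bias[row] : 0.0f;
        const float g = gate ? gate[row] : 1.0f;

        float sum = 0.0f;
        for (int col = lane; col < ncols; col += warpSize) {
            sum += x[row * ncols + col] * y[col];
        }

        // warp-level reduction of the partial sums
        for (int offset = warpSize / 2; offset > 0; offset >>= 1) {
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        }

        if (lane == 0) {
            dst[row] = g * (sum + b);   // apply the pre-loaded bias and gate
        }
    }

With one warp per row, each warp has independent work between the scalar
loads and their first use, and other resident warps can be scheduled in the
meantime, which is why moving the loads earlier hides their latency.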