git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit

author	Oliver Simons <redacted>
	Thu, 30 Oct 2025 03:34:15 +0000 (04:34 +0100)
committer	Georgi Gerganov <redacted>
	Sun, 9 Nov 2025 21:38:03 +0000 (23:38 +0200)
commit	41f4daca57a6c4e5a10479c3813f7d126625bbdb
tree	bb32e471865358b8f6e67332c34a3dc671e69059
parent	efe80992687e55340b03d3fb7381432b2bb1203d

Hide latency of bias and gate-loading (llama/16847)

This is realised by loading them into registers before computation of
the dot-product, effectively batching them together with said
dot-product. As a lot of threads are alive here, the warp scheduler has
enough threads available to effectively hide the cost of additionally
loading those two floats.
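
The reordering described above can be illustrated with a minimal sketch. It is not the code from mmvq.cu: the kernel name `matvec_bias_gate`, the plain `float` weights, the one-warp-per-row layout, and the `(dot + bias) * gate` epilogue are all assumptions chosen to keep the example short, whereas the real kernel operates on quantized blocks. The point is only where the two scalar loads are issued: before the dot-product loop, so that their memory latency can overlap with the loop's own loads. A compiler may already schedule loads this way in simple cases.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not from mmvq.cu): one warp computes one output row.
__global__ void matvec_bias_gate(const float * W, const float * x,
                                 const float * bias, const float * gate,
                                 float * out, int ncols) {
    const int row  = blockIdx.x;
    const int lane = threadIdx.x; // launched with blockDim.x == 32

    // Issue the two scalar loads up front. Their values are not needed until
    // after the loop, so the loads can be in flight while the loop runs.
    const float b = bias[row];
    const float g = gate[row];

    // Each lane accumulates a strided slice of the row's dot product.
    float sum = 0.0f;
    for (int c = lane; c < ncols; c += 32) {
        sum += W[(size_t) row * ncols + c] * x[c];
    }

    // Butterfly reduction across the warp; every lane ends with the full sum.
    for (int offset = 16; offset > 0; offset >>= 1) {
        sum += __shfl_xor_sync(0xffffffff, sum, offset);
    }

    if (lane == 0) {
        out[row] = (sum + b) * g; // assumed epilogue, for illustration only
    }
}

int main() {
    const int nrows = 64, ncols = 256;
    std::vector<float> W(nrows * ncols), x(ncols), bias(nrows), gate(nrows), out(nrows);
    for (int i = 0; i < nrows * ncols; ++i) W[i] = 0.25f * (i % 7);
    for (int c = 0; c < ncols; ++c)         x[c] = 0.5f * (c % 5);
    for (int r = 0; r < nrows; ++r) { bias[r] = 0.1f * r; gate[r] = 1.0f + 0.01f * r; }

    float *dW, *dx, *db, *dg, *dout;
    cudaMalloc(&dW,   W.size()    * sizeof(float));
    cudaMalloc(&dx,   x.size()    * sizeof(float));
    cudaMalloc(&db,   bias.size() * sizeof(float));
    cudaMalloc(&dg,   gate.size() * sizeof(float));
    cudaMalloc(&dout, out.size()  * sizeof(float));
    cudaMemcpy(dW, W.data(),    W.size()    * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x.data(),    x.size()    * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, bias.data(), bias.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dg, gate.data(), gate.size() * sizeof(float), cudaMemcpyHostToDevice);

    matvec_bias_gate<<<nrows, 32>>>(dW, dx, db, dg, dout, ncols);
    cudaMemcpy(out.data(), dout, out.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare against a CPU reference using a relative tolerance, since the
    // GPU sums the terms in a different order.
    float max_rel_err = 0.0f;
    for (int r = 0; r < nrows; ++r) {
        float ref = 0.0f;
        for (int c = 0; c < ncols; ++c) ref += W[r * ncols + c] * x[c];
        ref = (ref + bias[r]) * gate[r];
        max_rel_err = fmaxf(max_rel_err, fabsf(out[r] - ref) / fmaxf(1.0f, fabsf(ref)));
    }
    printf("max relative error: %g\n", max_rel_err);

    cudaFree(dW); cudaFree(dx); cudaFree(db); cudaFree(dg); cudaFree(dout);
    return max_rel_err < 1e-4f ? 0 : 1;
}
```

Only lane 0 consumes `b` and `g` in this sketch, but all lanes read the same address, so the hardware can serve the load as a broadcast. Whether hoisting the loads pays off in practice depends on occupancy, which is the condition the commit message points to.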

ggml/src/ggml-cuda/mmvq.cu


Packaging of ggerganov/whisper.cpp
