git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit

]> git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit

overview / pkg / ggml / sources / whisper.cpp / commit

author	Jeff Bolz <redacted>
	Wed, 19 Mar 2025 07:26:26 +0000 (02:26 -0500)
committer	Georgi Gerganov <redacted>
	Thu, 27 Mar 2025 09:06:03 +0000 (11:06 +0200)
commit	102af79f63cb1f7e51842085cc7b9dbd44d49a40
tree	cb353d933cce2abaa32bce0d8a6badbb5b62ac13	tree
parent	03c364557d0585d39f22886fbe16e0da231e4454	commit \| diff

vulkan: Submit once enough matmul work has been recorded (llama/12406)

I've been seeing significantly worse performance for tg with flash attention
enabled vs disabled, and it seems to be related to the submit heuristic.
Change the heuristic to check how many bytes worth of weight matrix are
used and flush every 100MB, and ramp up after the first few submits.
This seems to resolve the issue, and also increases perf for non-FA a bit.

ggml/src/ggml-vulkan/ggml-vulkan.cpp

diff | blob | history

Packaging of ggerganov/whisper.cpp

RSS Atom