git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit



author	Jeff Bolz <redacted>
	Wed, 19 Mar 2025 07:26:26 +0000 (02:26 -0500)
committer	GitHub <redacted>
	Wed, 19 Mar 2025 07:26:26 +0000 (08:26 +0100)
commit	c446b2edd2a9fe2772a1a18923c3e54a6749c364
tree	c88cbbf4169f94418429be0fe1745056c6dad1de
parent	d84635b1b085d54d6a21924e6171688d6e3dfb46

vulkan: Submit once enough matmul work has been recorded (#12406)

I've been seeing significantly worse token generation (tg) performance with
flash attention (FA) enabled than with it disabled, and it seems to be related
to the submit heuristic. Change the heuristic to track how many bytes of weight
matrix data the recorded matmuls use and flush every 100MB, ramping up after
the first few submits. This seems to resolve the issue, and also increases
perf for non-FA a bit.
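
The heuristic described above can be illustrated with a minimal, standalone
sketch. This is not the code from ggml-vulkan.cpp: the names `SubmitHeuristic`,
`record_matmul`, and `submit_points` are hypothetical. The sketch also assumes
that "100MB" means 100*1000*1000 bytes and that "ramp up" means doubling the
threshold after each of the first three submits. The commit message states
neither detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of a byte-based submit heuristic.
// All names and constants are assumptions, not taken from ggml-vulkan.cpp.
struct SubmitHeuristic {
    uint64_t bytes_per_submit = 100ull * 1000 * 1000; // assumed meaning of "100MB"
    int      ramp_submits     = 3;                    // assumed "first few submits"
    uint64_t pending_bytes    = 0;                    // weight bytes recorded since last submit
    int      submit_count     = 0;

    // Record one matmul node whose weight matrix occupies `weight_bytes`.
    // Returns true when the caller should submit the recorded work.
    bool record_matmul(uint64_t weight_bytes) {
        assert(bytes_per_submit > 0);
        pending_bytes += weight_bytes;
        if (pending_bytes < bytes_per_submit) {
            return false;
        }
        pending_bytes = 0;
        if (submit_count < ramp_submits) {
            bytes_per_submit *= 2; // assumed ramp: larger batches after early submits
        }
        ++submit_count;
        return true;
    }
};

// Returns the indices of the nodes after which a submit would be issued.
std::vector<size_t> submit_points(const std::vector<uint64_t> & weight_bytes,
                                  SubmitHeuristic h = SubmitHeuristic{}) {
    std::vector<size_t> points;
    for (size_t i = 0; i < weight_bytes.size(); ++i) {
        if (h.record_matmul(weight_bytes[i])) {
            points.push_back(i);
        }
    }
    return points;
}
```

With small early thresholds the GPU gets work quickly while the CPU keeps
recording command buffers, and larger later thresholds reduce per-submit
overhead. A real backend would still need to submit whatever remains after the
last node of the graph, which this sketch leaves out.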

ggml/src/ggml-vulkan/ggml-vulkan.cpp


Packaging of ggml-org/llama.cpp
