git.djapps.eu Git - pkg/ggml/sources/ggml/commit

author	Jeff Bolz <redacted>
	Wed, 19 Mar 2025 07:26:26 +0000 (02:26 -0500)
committer	Georgi Gerganov <redacted>
	Thu, 27 Mar 2025 07:35:24 +0000 (09:35 +0200)
commit	7c8c6895127f9fc4a2efecac069628400a41a28c
tree	746454dde2c2447a915569f59862416d4cbcffde
parent	ae8857fe95035b51f501ba0cc1ec2c981a3c5735

vulkan: Submit once enough matmul work has been recorded (llama/12406)

I've been seeing significantly worse token generation (tg) performance with
flash attention (FA) enabled than with it disabled, and it seems to be related
to the submit heuristic. Change the heuristic to track how many bytes of weight
matrix the recorded matmuls use and flush every 100 MB, ramping up after the
first few submits. This seems to resolve the issue, and also improves non-FA
performance a bit.
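
The heuristic described above can be sketched roughly as follows. This is a
minimal, standalone illustration and not the code from ggml-vulkan.cpp: the
Node struct, plan_submits, and its parameters are hypothetical names, and the
exact ramp (here, doubling the byte threshold after each of the first three
submits) is an assumption, since the message only says the heuristic ramps up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a graph node; this is not the ggml type.
struct Node {
    bool     is_matmul;     // true for matrix-multiply nodes
    uint64_t weight_bytes;  // size of the weight matrix the node reads
};

// Returns the indices of the nodes after which a submit would be issued.
// Assumptions (not taken from the commit): the threshold starts at
// bytes_per_submit, doubles after each of the first ramp_submits submits,
// and the last node always triggers a final flush.
std::vector<size_t> plan_submits(const std::vector<Node> & nodes,
                                 uint64_t bytes_per_submit = 100ull * 1000 * 1000,
                                 int ramp_submits = 3) {
    assert(bytes_per_submit > 0);

    std::vector<size_t> submit_points;
    uint64_t recorded_bytes = 0;  // matmul weight bytes recorded since last submit
    int submit_count = 0;

    for (size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].is_matmul) {
            recorded_bytes += nodes[i].weight_bytes;
        }

        const bool last = (i + 1 == nodes.size());
        if (recorded_bytes >= bytes_per_submit || last) {
            submit_points.push_back(i);
            recorded_bytes = 0;
            if (submit_count < ramp_submits) {
                bytes_per_submit *= 2;  // ramp up: later batches get larger
            }
            ++submit_count;
        }
    }
    return submit_points;
}
```

Under these assumptions, early submits are small so the GPU gets work quickly,
and later submits are larger so that per-submit overhead is paid less often.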

src/ggml-vulkan/ggml-vulkan.cpp

Packaging of ggml-org/ggml
