git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Jeff Bolz <redacted>
	Tue, 19 Nov 2024 07:25:17 +0000 (01:25 -0600)
committer	GitHub <redacted>
	Tue, 19 Nov 2024 07:25:17 +0000 (08:25 +0100)
commit	b3e585988fc65d3a8083c6d94dfc0629f9ce226d
tree	d7075e8f1f8037dd17a8ece5cd3651d83da1e464
parent	557924f22237c76387a39c4db5abae154d57e754

vulkan: Optimize soft_max (#10301)

* vulkan: Optimize soft_max

Large soft_max could already saturate memory bandwidth, but small and
medium sizes were slow. Most of the gain for them comes from using a
smaller workgroup size; making the workgroup size match the subgroup size
also makes the barriers much cheaper.

Cache some values in locals to avoid refetching or recomputing them, and
stamp out a few "template instantiations" so smaller cases fully unroll.
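
As a rough CPU-side analogy for the two ideas above (caching in locals and fixed-size instantiations that can unroll), here is a minimal C++ sketch. It is not the shader code from this commit: the function names, the chosen sizes (4, 8, 16), and the dispatch on row length are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Fixed-size variant (hypothetical): N is a compile-time constant, so the
// compiler is free to fully unroll all three loops.
template <std::size_t N>
inline void soft_max_fixed(const float* src, float* dst) {
    float vals[N];  // local cache: each input is fetched from src only once
    float max_val = -std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < N; ++i) {
        vals[i] = src[i];
        max_val = std::max(max_val, vals[i]);
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < N; ++i) {
        vals[i] = std::exp(vals[i] - max_val);  // exp computed once, then reused
        sum += vals[i];
    }
    const float inv_sum = 1.0f / sum;
    for (std::size_t i = 0; i < N; ++i) {
        dst[i] = vals[i] * inv_sum;
    }
}

// Generic fallback with a runtime trip count; dst doubles as the cache.
inline void soft_max_dynamic(const float* src, float* dst, std::size_t n) {
    float max_val = -std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < n; ++i) {
        max_val = std::max(max_val, src[i]);
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = std::exp(src[i] - max_val);
        sum += dst[i];
    }
    const float inv_sum = 1.0f / sum;
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] *= inv_sum;
    }
}

// Route a few small sizes to the unrollable instantiations.
inline void soft_max_row(const float* src, float* dst, std::size_t n) {
    assert(n > 0 && "row must be non-empty");
    switch (n) {
        case 4:  soft_max_fixed<4>(src, dst);  break;
        case 8:  soft_max_fixed<8>(src, dst);  break;
        case 16: soft_max_fixed<16>(src, dst); break;
        default: soft_max_dynamic(src, dst, n); break;
    }
}
```

Calling `soft_max_row` on a row of four equal values takes the `soft_max_fixed<4>` path; any size without an instantiation falls back to the runtime loop.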

Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.
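
To see why the guard is needed, the following C++ sketch models the 512 x H dispatch on the CPU. The grid-splitting helper and the row-index formula are assumptions for illustration and may not match the actual host code or shader.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 2D grid; one workgroup handles one row.
struct Dispatch {
    uint32_t x;
    uint32_t y;
};

// Assumed splitting rule: cap X at 512 and spill the remainder into Y.
inline Dispatch make_dispatch(uint32_t nrows) {
    assert(nrows > 0);
    const uint32_t cap = 512;
    if (nrows <= cap) {
        return {nrows, 1};
    }
    return {cap, (nrows + cap - 1) / cap};  // H = ceil(nrows / 512)
}

// Count the workgroups that map to a real row. Every other workgroup in the
// grid must return early instead of reading or writing out of bounds.
inline uint32_t count_active_workgroups(uint32_t nrows) {
    const Dispatch d = make_dispatch(nrows);
    uint32_t active = 0;
    for (uint32_t y = 0; y < d.y; ++y) {
        for (uint32_t x = 0; x < d.x; ++x) {
            const uint32_t row = y * 512 + x;  // assumed row-index formula
            if (row >= nrows) {
                continue;  // models the early return for OOB rows
            }
            ++active;
        }
    }
    return active;
}
```

With 600 rows the grid is 512 x 2, so 1024 workgroups launch but only 600 correspond to valid rows; the remaining 424 rely on the early return.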

* vulkan: Further soft_max optimizations

Restore the workgroup size 512 case and use it for sizes >1024.

Use unrollable loops for more iteration counts.
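
The selection rule described in this commit message can be sketched in C++ as below. The function names are hypothetical, and reading ">1024" as the row length is an assumption; the message itself does not say which dimension is compared.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical selector: small and medium rows use a workgroup as wide as one
// subgroup, larger rows use the restored 512-wide variant.
inline uint32_t pick_workgroup_size(uint32_t row_len, uint32_t subgroup_size) {
    assert(subgroup_size > 0);
    return row_len > 1024 ? 512u : subgroup_size;
}

// Number of strided loop iterations each invocation performs over one row.
// A small, known iteration count is what makes a loop a candidate for
// full unrolling.
inline uint32_t iterations_per_invocation(uint32_t row_len, uint32_t wg_size) {
    assert(wg_size > 0);
    return (row_len + wg_size - 1) / wg_size;
}
```

For example, with an assumed subgroup size of 32, a row of 1024 elements keeps the 32-wide workgroup and needs 32 iterations per invocation, while a row of 2048 elements switches to the 512-wide workgroup and needs 4.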

ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/soft_max.comp
tests/test-backend-ops.cpp