git.djapps.eu Git - pkg/ggml/sources/ggml/commit

author	Jeff Bolz <redacted>
	Wed, 19 Nov 2025 16:25:50 +0000 (10:25 -0600)
committer	Georgi Gerganov <redacted>
	Thu, 11 Dec 2025 13:32:40 +0000 (15:32 +0200)
commit	1e799a51c4d950d25a59f17213c3ef5011e290af
tree	60aa0a0052bd9ea0fced62619fd49f56dde29098
parent	6d22338a50952d5c0abc6d012ed68a74ee112c49

vulkan: support larger argsort (llama/17313)

* vulkan: support larger argsort

This extends the original bitonic sorting shader: the temporary values are
kept in global memory, and when more than 1024 threads are needed it runs
multiple workgroups and synchronizes them through a pipeline barrier.

To improve the memory access pattern, a copy of the float value is kept with
the index value. I've applied this same change to the original shared memory
version of the shader, which is still used when ncols <= 1024.

* Reduce the number of shader variants. Use smaller workgroups when doing a single pass, for a modest perf boost

* reduce loop overhead

* run multiple cols per invocation, to reduce barrier overhead

src/ggml-vulkan/ggml-vulkan.cpp
src/ggml-vulkan/vulkan-shaders/argsort.comp
src/ggml-vulkan/vulkan-shaders/argsort_large.comp	[new file with mode: 0644]
src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
tests/test-backend-ops.cpp