git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Jeff Bolz <redacted>
	Wed, 19 Nov 2025 16:25:50 +0000 (10:25 -0600)
committer	GitHub <redacted>
	Wed, 19 Nov 2025 16:25:50 +0000 (17:25 +0100)
commit	1fa4551af069358e29fe4c497c801b0dee85cb49
tree	c8a4082fb3f3ac0d26015a445d2aefe24fae9c76
parent	2eba631b8127a5a4853ea625a0eac4a7449bc7b8

vulkan: support larger argsort (#17313)

* vulkan: support larger argsort

This is an extension of the original bitonic sorting shader that puts the
temporary values in global memory. When more than 1024 threads are needed,
it runs multiple workgroups and synchronizes through a pipeline barrier.

To improve the memory access pattern, a copy of the float value is kept with
the index value. I've applied this same change to the original shared memory
version of the shader, which is still used when ncols <= 1024.

* Reduce the number of shader variants. Use smaller workgroups when doing a single pass, for a modest perf boost

* reduce loop overhead

* run multiple cols per invocation, to reduce barrier overhead

ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/argsort.comp
ggml/src/ggml-vulkan/vulkan-shaders/argsort_large.comp	[new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
tests/test-backend-ops.cpp