git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Jeff Bolz <redacted>
	Fri, 20 Mar 2026 11:17:15 +0000 (06:17 -0500)
committer	GitHub <redacted>
	Fri, 20 Mar 2026 11:17:15 +0000 (12:17 +0100)
commit	e06c3ab2bc7e45df7584468014681349fceccfbc
tree	6a39fda3b1e0aa3c526c39a878702547f535d227
parent	dc6592431b909208040c1a8e953e6c5440471eaa

vulkan: change gated_delta_net to shard a column across a subgroup (#20662)

* vulkan: change gated_delta_net to shard a column across a subgroup

This is based on https://github.com/ggml-org/llama.cpp/pull/20391. I used an
LLM to port the CUDA code to Vulkan, and guided it to make various fixes to
work with Vulkan (e.g. handling different subgroup sizes, unknown mapping of
subgroup to invocation id, using subgroupAdd optionally, etc.).

This fixes a perf regression from the transposing of the values in memory
(!20443).

* vulkan: Spread columns across fewer lanes to reduce the number of workgroups
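
For readers unfamiliar with the technique, here is a minimal CPU-side sketch of
what "sharding a column across a subgroup" can look like. It is not the shader
from this commit: every name below (lane_partial_dot, reduce_tree,
reduce_direct, sharded_column_dot, workgroups_needed) is hypothetical, the
strided row-to-lane mapping is an assumption, and the workgroup arithmetic is
only one plausible reading of the second bullet. reduce_direct stands in for
the result subgroupAdd would give each lane, and reduce_tree stands in for a
fallback reduction when subgroupAdd is not used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True when n is a non-zero power of two (assumed lane-count constraint).
bool is_pow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Each lane owns rows lane, lane + lanes, lane + 2 * lanes, ... of one column
// and accumulates its share of the dot product with k.
float lane_partial_dot(const std::vector<float>& column,
                       const std::vector<float>& k,
                       std::size_t lane, std::size_t lanes) {
    float partial = 0.0f;
    for (std::size_t row = lane; row < column.size(); row += lanes) {
        partial += column[row] * k[row];
    }
    return partial;
}

// Fallback model: pairwise halving, as one might do through shared memory.
float reduce_tree(std::vector<float> partials) {
    assert(is_pow2(partials.size()));
    for (std::size_t stride = partials.size() / 2; stride > 0; stride /= 2) {
        for (std::size_t lane = 0; lane < stride; ++lane) {
            partials[lane] += partials[lane + stride];
        }
    }
    return partials[0];
}

// Models the value every lane would receive from a subgroup-wide add.
float reduce_direct(const std::vector<float>& partials) {
    float sum = 0.0f;
    for (float p : partials) sum += p;
    return sum;
}

// Dot product of one column with k, computed by `lanes` cooperating lanes.
float sharded_column_dot(const std::vector<float>& column,
                         const std::vector<float>& k,
                         std::size_t lanes, bool use_direct_sum) {
    assert(column.size() == k.size());
    assert(is_pow2(lanes));
    std::vector<float> partials(lanes);
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        partials[lane] = lane_partial_dot(column, k, lane, lanes);
    }
    return use_direct_sum ? reduce_direct(partials) : reduce_tree(partials);
}

// Assumed dispatch arithmetic: fewer lanes per column means more columns per
// workgroup, hence fewer workgroups for the same number of columns.
std::size_t workgroups_needed(std::size_t num_columns,
                              std::size_t workgroup_size,
                              std::size_t lanes_per_column) {
    assert(lanes_per_column > 0 && workgroup_size % lanes_per_column == 0);
    const std::size_t columns_per_workgroup = workgroup_size / lanes_per_column;
    return (num_columns + columns_per_workgroup - 1) / columns_per_workgroup;
}
```

Under these assumptions, both reduction paths return the same sum for a given
column, and workgroups_needed(128, 64, 8) is smaller than
workgroups_needed(128, 64, 32), which illustrates why using fewer lanes per
column could cut the workgroup count.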

ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/gated_delta_net.comp
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp