git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
vulkan: Multi-pass softmax for large number of cols (#17892)
author Jeff Bolz <redacted>
Sat, 13 Dec 2025 09:04:29 +0000 (03:04 -0600)
committer GitHub <redacted>
Sat, 13 Dec 2025 09:04:29 +0000 (10:04 +0100)
commit 303f8615e94d74f140eb4c3947758c1eca933c3a
tree 02743107d213c6af9b097d599029936a35ec42fd
parent 3c6391e748d8c00c45fba033811508288580cdc7

When the number of cols is large, split each row across multiple workgroups.
There are three phases that communicate partial results through temp buffers:
(1) compute max partials
(2) take max of partials, compute sum(exp(x-max)) partials
(3) sum partials, compute scaled result
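
The three phases above can be sketched on the CPU as follows. This is not the shader code from this commit: it is a minimal reference under stated assumptions, where each fixed-size chunk of a row stands in for one workgroup, the two std::vector temporaries stand in for the temp buffers, and the function name softmax_three_phase and the chunk parameter are hypothetical. It computes a plain softmax and ignores any extra inputs the real operator may take.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference: softmax of one row, split into chunks of `chunk`
// elements. Each chunk plays the role of one workgroup.
std::vector<float> softmax_three_phase(const std::vector<float>& x, std::size_t chunk) {
    assert(chunk > 0);
    const std::size_t n = x.size();
    const std::size_t num_chunks = (n + chunk - 1) / chunk;
    const float neg_inf = -std::numeric_limits<float>::infinity();

    // Phase 1: each chunk writes the max of its own elements to a temp buffer.
    std::vector<float> max_partials(num_chunks, neg_inf);
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const std::size_t end = std::min(n, (c + 1) * chunk);
        for (std::size_t i = c * chunk; i < end; ++i) {
            max_partials[c] = std::max(max_partials[c], x[i]);
        }
    }

    // Phase 2: each chunk reduces the max partials to the row max, then writes
    // its partial sum of exp(x - max) to a second temp buffer.
    std::vector<float> sum_partials(num_chunks, 0.0f);
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const float row_max = *std::max_element(max_partials.begin(), max_partials.end());
        const std::size_t end = std::min(n, (c + 1) * chunk);
        for (std::size_t i = c * chunk; i < end; ++i) {
            sum_partials[c] += std::exp(x[i] - row_max);
        }
    }

    // Phase 3: each chunk sums the partial sums and writes the scaled result.
    std::vector<float> y(n);
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const float row_max = *std::max_element(max_partials.begin(), max_partials.end());
        float total = 0.0f;
        for (float s : sum_partials) {
            total += s;
        }
        const std::size_t end = std::min(n, (c + 1) * chunk);
        for (std::size_t i = c * chunk; i < end; ++i) {
            y[i] = std::exp(x[i] - row_max) / total;
        }
    }
    return y;
}
```

Calling softmax_three_phase(row, 2) and softmax_three_phase(row, row.size()) should agree up to floating-point rounding, since chunking only changes the order of the reductions. Recomputing the row max and the total inside every chunk mirrors the idea that each workgroup reads the shared partials independently rather than relying on a single global reduction step.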
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/soft_max_large1.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/soft_max_large2.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/soft_max_large3.comp [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/soft_max_large_common.glsl [new file with mode: 0644]
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
tests/test-backend-ops.cpp