
author	Jeff Bolz <redacted>
	Fri, 5 Dec 2025 21:03:19 +0000 (15:03 -0600)
committer	GitHub <redacted>
	Fri, 5 Dec 2025 21:03:19 +0000 (22:03 +0100)
commit	a0f3897d53e0e956982ca23abb0d381fe71722f8
tree	ec40e46281da49b783a3eba47472193fb166fbdd
parent	e15cd06a94fce1fafe68f44db01ca69963623df4

vulkan: fix top_k bug when there are ties in the input (#17659)

* vulkan: Reduce temporary memory usage for TOP_K

- Compute the row size for the temp buffer based on the output of the first pass.
- Update the shader addressing math to use the output row size.
- Pass the output row size as "ncols_output"; the value previously passed as "ncols_output" is now passed as "k".

For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer
from about 3.2MB to 500KB.
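As a rough sketch of the sizing idea, assume the first pass splits each row into chunks of a hypothetical `chunk_size` columns and emits k candidates per chunk (the real shader's chunking and entry layout may differ). The temp row then only needs ceil(ncols / chunk_size) * k entries instead of ncols. All names below are illustrative, not ggml-vulkan identifiers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers; chunk_size and bytes_per_entry are assumptions,
// not the actual ggml-vulkan parameters.
constexpr size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

// Old approach (assumed): the temp row is as wide as the input row.
size_t temp_row_elems_input_sized(size_t ncols) { return ncols; }

// New approach (assumed): the first pass reduces each chunk of chunk_size
// input columns to k candidates, so the temp row only holds those candidates.
size_t temp_row_elems_output_sized(size_t ncols, size_t k, size_t chunk_size) {
    assert(chunk_size > 0);
    return ceil_div(ncols, chunk_size) * k;
}

// Total temp buffer size for nrows rows of row_elems entries each.
size_t temp_buffer_bytes(size_t row_elems, size_t nrows, size_t bytes_per_entry) {
    return row_elems * nrows * bytes_per_entry;
}
```

With ncols=200000, k=40 and an illustrative chunk_size of 1024, this gives 196 * 40 = 7840 entries per temp row instead of 200000. These numbers only illustrate the shape of the saving; they are not meant to reproduce the 3.2MB/500KB figures above.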

* vulkan: fix top_k bug when there are ties in the input

I noticed by inspection a bug in the vulkan top_k shader: if the smallest
value in the top_k appears multiple times, the shader could write out those
extra copies instead of some larger values (when the larger values are on
higher-numbered threads).
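The failure mode can be reproduced with a small CPU model of threshold-based selection, where each element's position stands in for a thread id. This is an illustrative sketch, not the shader code: the "buggy" variant emits every value >= the k-th largest in thread order, while the tie-safe variant emits values strictly above the threshold first and only then fills the remaining slots with tied values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Illustrative CPU model; element position plays the role of the thread id.

// k-th largest value in v (requires 1 <= k <= v.size()).
float kth_largest(std::vector<float> v, size_t k) {
    assert(k >= 1 && k <= v.size());
    std::nth_element(v.begin(), v.begin() + (k - 1), v.end(), std::greater<float>());
    return v[k - 1];
}

// Buggy pattern: in thread order, emit every element >= threshold until k
// slots are used. Ties on low-numbered threads can crowd out larger values.
std::vector<size_t> select_buggy(const std::vector<float>& v, size_t k) {
    const float t = kth_largest(v, k);
    std::vector<size_t> out;
    for (size_t i = 0; i < v.size() && out.size() < k; ++i)
        if (v[i] >= t) out.push_back(i);
    return out;
}

// Tie-safe pattern: fewer than k elements are strictly above the threshold,
// so emit all of them first, then fill the remaining slots with ties.
std::vector<size_t> select_fixed(const std::vector<float>& v, size_t k) {
    const float t = kth_largest(v, k);
    std::vector<size_t> out;
    for (size_t i = 0; i < v.size(); ++i)
        if (v[i] > t) out.push_back(i);
    for (size_t i = 0; i < v.size() && out.size() < k; ++i)
        if (v[i] == t) out.push_back(i);
    return out;
}
```

For input {1, 1, 1, 5} with k=2, the buggy variant returns indices {0, 1} (values 1, 1) and drops the 5 held by the highest "thread", while the tie-safe variant returns {3, 0}.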

I rewrote the test verification to handle this case, where the final index set
is not necessarily the same as the reference's: with ties, more than one index
set is a valid result.
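One way to verify top_k output without depending on which tied index was chosen is to check that the returned indices are distinct and in range, and that their values match the true k largest values as a multiset. The sketch below is hypothetical and is not the code in tests/test-backend-ops.cpp; it also assumes the order of the returned indices does not matter.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical tie-tolerant check: accepts any index set whose values equal
// the k largest values of src, ignoring the order of the returned indices.
bool topk_indices_valid(const std::vector<float>& src, const std::vector<int>& idx, size_t k) {
    if (k > src.size() || idx.size() != k) return false;
    std::vector<bool> seen(src.size(), false);
    std::vector<float> got;
    got.reserve(k);
    for (int i : idx) {
        // Reject out-of-range or duplicate indices.
        if (i < 0 || static_cast<size_t>(i) >= src.size() || seen[i]) return false;
        seen[i] = true;
        got.push_back(src[i]);
    }
    // Reference: the k largest values of src, in descending order.
    std::vector<float> ref = src;
    std::partial_sort(ref.begin(), ref.begin() + k, ref.end(), std::greater<float>());
    ref.resize(k);
    std::sort(got.begin(), got.end(), std::greater<float>());
    return got == ref;
}
```

With src = {1, 1, 1, 5} and k=2, both {3, 0} and {3, 2} pass, while {0, 1} fails because it misses the 5.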

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>

ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/topk_nary_search.comp
tests/test-backend-ops.cpp