git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Jeff Bolz <redacted>
	Sun, 21 Dec 2025 09:27:34 +0000 (03:27 -0600)
committer	GitHub <redacted>
	Sun, 21 Dec 2025 09:27:34 +0000 (10:27 +0100)
commit	b365c3ff010256c76fa030a621a4a96fc06a8442
tree	4319e6c82b133a7d24220ed97a577cfbeeedf368
parent	cb64222b0cbad6aa8c5d05837efb315eeff847ac

vulkan/cuda: fix topk_moe with exp_probs_b (#18071)

I updated test_topk_moe to more closely match llm_graph_context::build_moe_ffn
and added coverage for exp_probs_b and some other missing combinations. This
exposed a bug in both CUDA and Vulkan backends where they were assuming the
input to argsort and the input to get_rows are the same. I'd like to optimize
this graph in a separate change; for now, this one just makes it functional.
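
To illustrate the distinction, here is a minimal CPU sketch for a single token.
It is not the backend code. It assumes that exp_probs_b is a per-expert bias
added to the probabilities only for expert selection (the argsort input), while
the routing weights are gathered from the unbiased probabilities (the get_rows
input). The names TopkResult and topk_moe_reference are hypothetical and exist
only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: selected expert ids and their routing weights.
struct TopkResult {
    std::vector<int>   ids;
    std::vector<float> weights;
};

// Hypothetical single-token reference. `bias` plays the role of exp_probs_b
// and may be empty, in which case selection uses `probs` directly.
inline TopkResult topk_moe_reference(const std::vector<float> & probs,
                                     const std::vector<float> & bias,
                                     std::size_t                k) {
    assert(bias.empty() || bias.size() == probs.size());
    assert(k <= probs.size());

    // Input to argsort: probabilities plus the optional selection bias.
    std::vector<float> selection = probs;
    for (std::size_t i = 0; i < bias.size(); ++i) {
        selection[i] += bias[i];
    }

    // Descending argsort over the biased values.
    std::vector<int> order(probs.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return selection[a] > selection[b]; });

    // Input to get_rows: the unbiased probabilities. Gathering from
    // `selection` here instead would reproduce the kind of bug described above.
    TopkResult result;
    for (std::size_t i = 0; i < k; ++i) {
        result.ids.push_back(order[i]);
        result.weights.push_back(probs[order[i]]);
    }
    return result;
}
```

With probs = {0.5, 0.3, 0.2}, bias = {0, 0, 0.5}, and k = 2, the biased values
rank expert 2 first and expert 0 second, but the gathered weights are the
unbiased 0.2 and 0.5. A kernel that treats both inputs as the same tensor would
return the biased value for expert 2 instead.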

CUDA also had a bug where it got n_experts from the wrong place, leading to
GGML_ASSERT failures in some of the new tests.

ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/topk-moe.cu
ggml/src/ggml-cuda/topk-moe.cuh
ggml/src/ggml-vulkan/ggml-vulkan.cpp
tests/test-backend-ops.cpp