git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit


author	Jeff Bolz <redacted>
	Sat, 6 Dec 2025 10:12:26 +0000 (04:12 -0600)
committer	GitHub <redacted>
	Sat, 6 Dec 2025 10:12:26 +0000 (11:12 +0100)
commit	2960eb2975da26f0d0b1e30cb6e09f25fe3dac0e
tree	3c87b8ddc2e339e1d6bca1314ebbd9a229e81ee3
parent	dbc15a79672e72e0b9c1832adddf3334f5c9229c

vulkan: Use one row per workgroup for f32 mmv (#17711)

The MoE models have a mul_mat_vec with very small m (32, 64, 128) right before
the topk_moe selection. With so few rows, running multiple rows per workgroup
launches too few workgroups to utilize the SMs well. I think even for larger m,
f32 is so bandwidth-limited that running multiple rows per workgroup doesn't
help.
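
To make the occupancy argument concrete, here is a minimal sketch of the
dispatch arithmetic, assuming one workgroup is launched per group of output
rows. The helper name num_workgroups and the parameter rows_per_wg are
hypothetical and are not taken from ggml-vulkan.cpp; the actual pipeline
selection in the backend may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of ggml-vulkan.cpp): number of workgroups
// dispatched for a mat-vec with m output rows when each workgroup computes
// rows_per_wg rows. Assumes a plain ceiling division over the rows.
inline uint32_t num_workgroups(uint32_t m, uint32_t rows_per_wg) {
    assert(rows_per_wg > 0);
    return (m + rows_per_wg - 1) / rows_per_wg;
}
```

Under this assumption, m = 32 with four rows per workgroup dispatches only 8
workgroups, while one row per workgroup dispatches 32, so the small-m case
spreads over up to four times as many SMs.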

ggml/src/ggml-vulkan/ggml-vulkan.cpp


