git.djapps.eu Git - pkg/ggml/sources/ggml/commit

author	Jeff Bolz <redacted>
	Sat, 6 Dec 2025 10:12:26 +0000 (04:12 -0600)
committer	Georgi Gerganov <redacted>
	Thu, 11 Dec 2025 13:32:57 +0000 (15:32 +0200)
commit	695e3534b561292fc951035d41feb774312764b9
tree	7e707608ef3a7a4d3abf6693c907cbbb0195ba48
parent	0124b66315ed3875e48c9071dac4f0289b27b85b

vulkan: Use one row per workgroup for f32 mmv (llama/17711)

The MoE models have a mul_mat_vec with very small m (32, 64, 128) right before
the topk_moe selection. Running multiple rows per workgroup (wg) dispatches too
few workgroups to keep the SMs busy. I think even for larger m, f32 is so
bandwidth-limited that running multiple rows per workgroup doesn't help.

src/ggml-vulkan/ggml-vulkan.cpp

Packaging of ggml-org/ggml

RSS Atom