git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit


author	Jeff Bolz <redacted>
	Sat, 6 Dec 2025 10:12:26 +0000 (04:12 -0600)
committer	Georgi Gerganov <redacted>
	Fri, 12 Dec 2025 15:53:20 +0000 (17:53 +0200)
commit	c66c71e9f49180830715f6fe4d1b46c982bcb7c7
tree	0b8504766d9874b3286d9fa8433f20828e89604f
parent	875d8614733338c24d729de9f58df5d374f0f4db

vulkan: Use one row per workgroup for f32 mmv (llama/17711)

The MoE models have a mul_mat_vec with very small m (32, 64, 128) right before
the topk_moe selection. Running multiple rows per workgroup yields too few
workgroups to utilize the SMs well. I think even for larger m, f32 is so
bandwidth-limited that running multiple rows per workgroup doesn't help.
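
To make the occupancy argument concrete, here is a minimal sketch of the workgroup-count arithmetic. It is not code from ggml-vulkan.cpp: the helper name `workgroups_for` and the value of 4 rows per workgroup are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from ggml-vulkan.cpp): number of workgroups needed
// for a matrix-vector product with m output rows when each workgroup computes
// rows_per_wg rows. Rounds up so a partial last group is still dispatched.
constexpr uint32_t workgroups_for(uint32_t m, uint32_t rows_per_wg) {
    return (m + rows_per_wg - 1) / rows_per_wg;
}

// Assumed value for illustration only: 4 rows per workgroup.
// For the small m values named in the commit message, this gives few workgroups.
static_assert(workgroups_for(32, 4) == 8, "m=32, 4 rows per workgroup");
static_assert(workgroups_for(64, 4) == 16, "m=64, 4 rows per workgroup");
static_assert(workgroups_for(128, 4) == 32, "m=128, 4 rows per workgroup");

// One row per workgroup: the workgroup count equals m.
static_assert(workgroups_for(32, 1) == 32, "m=32, 1 row per workgroup");
static_assert(workgroups_for(128, 1) == 128, "m=128, 1 row per workgroup");
```

With m = 32 and the assumed 4 rows per workgroup, only 8 workgroups are dispatched, so a GPU with more than 8 SMs would leave some idle for this op. One row per workgroup gives 32 workgroups for the same work.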

ggml/src/ggml-vulkan/ggml-vulkan.cpp


Packaging of ggerganov/whisper.cpp
