git.djapps.eu Git - pkg/ggml/sources/ggml/commit

author	Abhijit Ramesh <redacted>
	Tue, 3 Mar 2026 03:35:11 +0000 (19:35 -0800)
committer	Georgi Gerganov <redacted>
	Sun, 15 Mar 2026 19:50:13 +0000 (21:50 +0200)
commit	18efe2c5527dca44bfcaa93e4dbf0ed7df07563a
tree	e19054a0dffb4da9c42f5f53eabea1db88480820
parent	2e93b2897eeb46c27cd2075f869e8b61f275b421

ggml webgpu: fix workgroup dispatch limit for large batch sizes (llama/19965)

* ggml-webgpu: fix workgroup dispatch limit for large batch sizes

WebGPU limits the number of workgroups in a dispatch to 65535 per
dimension (the default maxComputeWorkgroupsPerDimension). Large MUL_MAT
operations with batch sizes exceeding this limit would fail.

* add compute_2d_workgroups() helper to split total workgroup ID across
X/Y dimensions

* update mul_mat_reg_tile.wgsl to reconstruct linear workgroup ID from 2D
   dispatch

* update mul_mat_subgroup_matrix.wgsl to reconstruct linear workgroup ID
  from 2D dispatch

* update mul_mat.wgsl to compute global index from 2D workgroup
  coordinates

* refactor all three mul_mat dispatch paths to use the shared helper
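
The splitting and reconstruction described above can be sketched as
follows. This is a minimal illustration and not the code from this
commit: the names split_workgroups_2d and linear_workgroup_id, the
Dispatch2D struct, and the exact rounding strategy are assumptions made
for the example. The actual compute_2d_workgroups() helper may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: workgroup counts for the X and Y dispatch dimensions.
struct Dispatch2D {
    uint32_t x;
    uint32_t y;
};

// Hypothetical helper: spread `total` workgroups over X/Y so that neither
// dimension exceeds `max_per_dim`. x * y may be slightly larger than `total`.
Dispatch2D split_workgroups_2d(uint32_t total, uint32_t max_per_dim = 65535) {
    assert(max_per_dim > 0);
    if (total <= max_per_dim) {
        return { total, 1 };  // fits in a plain 1D dispatch
    }
    // 64-bit intermediates avoid overflow in the ceiling divisions.
    uint64_t y = (uint64_t(total) + max_per_dim - 1) / max_per_dim;
    // Choosing x = ceil(total / y) keeps the over-dispatch below y workgroups.
    uint64_t x = (uint64_t(total) + y - 1) / y;
    return { uint32_t(x), uint32_t(y) };
}

// Shader-side counterpart, written in C++ for illustration: rebuild the
// linear workgroup ID from the 2D workgroup coordinates.
uint32_t linear_workgroup_id(uint32_t wg_x, uint32_t wg_y, uint32_t num_wg_x) {
    return wg_y * num_wg_x + wg_x;
}
```

With these assumptions, 131072 total workgroups would be dispatched as
43691 x 3, which is one workgroup more than requested.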

* ggml-webgpu: add bounds checking for over-dispatched workgroups

2D workgroup dispatch can over-dispatch when total workgroups don't
divide evenly into the 65535 per-dimension limit. Extra workgroups
would compute invalid batch indices, causing memory corruption.

* add batch_idx bound check to mul_mat_reg_tile.wgsl and
mul_mat_subgroup_matrix.wgsl to prevent over-dispatched workgroups
from accessing invalid memory

* fixes test failures with large batch sizes (e.g., bs=[128, 1024])

* ggml-webgpu: add back TODO for splitting large sizes into batches

* Optimize 2d workgroup provisioning

* Set some parameters that increase speed

---------

Co-authored-by: Reese Levine <redacted>

src/ggml-webgpu/ggml-webgpu.cpp
src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl