git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Salvatore Mesoraca <redacted>
	Mon, 30 Sep 2024 07:14:09 +0000 (09:14 +0200)
committer	Georgi Gerganov <redacted>
	Tue, 1 Oct 2024 13:07:39 +0000 (16:07 +0300)
commit	cb00020504416606601f8cb35f55ee07b710b4b7
tree	7e5220dffcded49b2e49dd683a25b06f3b757733
parent	6c5322481a75f1be54e58a000c3f78484d07f948

vulkan : mul_mat: fix UB with small warps (ggml/952)

When the device's warp size is less than 16,
it is possible for loadstride_a (mul_mm.comp:114)
and loadstride_b (mul_mm.comp:115) to be set to 0.
This happens because they are calculated as the workgroup
size multiplied by LOAD_VEC_* (which can be 1) and divided
by 16 using integer division, and the workgroup size is
set to be the same as the warp/subgroup size.
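
The arithmetic above can be sketched in a few lines of C++. This is
only an illustration of the integer division described in this
message, not the shader code itself: load_stride is a hypothetical
helper, and the only values assumed are the ones mentioned here
(a LOAD_VEC_* of 1 and a divisor of 16).

```cpp
#include <cstdint>

// Hypothetical helper (not from mul_mm.comp) mirroring the formula
// described above: workgroup size * LOAD_VEC_* / 16, with unsigned
// integer division truncating toward zero.
constexpr uint32_t load_stride(uint32_t workgroup_size, uint32_t load_vec) {
    return workgroup_size * load_vec / 16;
}

// A warp size of 8 with LOAD_VEC_* == 1 truncates to a stride of 0.
static_assert(load_stride(8, 1) == 0, "8 * 1 / 16 truncates to 0");

// From 16 upward the stride is at least 1.
static_assert(load_stride(16, 1) == 1, "16 * 1 / 16 == 1");
static_assert(load_stride(32, 1) == 2, "32 * 1 / 16 == 2");
```

Any workgroup size below 16 combined with a LOAD_VEC_* of 1 gives
the same truncated result.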

The loadstride_* variables are used as increments in the
loops that populate the buffers used for the multiplication.

When they are 0, the loop counters never advance and the
loops become infinite.
But infinite loops without side-effects are UB and the
values of loadstride_* are known at compile time,
so the compiler quietly optimizes all the loops away.
As a consequence, the buffers are not populated and
the multiplication result is just a matrix with all elements
set to 0.

We prevent the UB by making sure that the workgroup size
will never be less than 16, even if our device has a
smaller warp size (e.g. 8).
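
A minimal sketch of the idea, assuming the clamp is a simple maximum
of the subgroup size and 16. The names pick_workgroup_size,
MIN_WORKGROUP_SIZE and load_stride are hypothetical and chosen for
illustration; the actual change in the Vulkan backend may be
structured differently.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical constant: the divisor used by the stride formula,
// which is also the smallest workgroup size that keeps it nonzero.
constexpr uint32_t MIN_WORKGROUP_SIZE = 16;

// Hypothetical helper: never go below 16, even on devices whose
// warp/subgroup size is smaller (e.g. 8).
constexpr uint32_t pick_workgroup_size(uint32_t subgroup_size) {
    return std::max(subgroup_size, MIN_WORKGROUP_SIZE);
}

// Same stride formula as described in this message (illustrative).
constexpr uint32_t load_stride(uint32_t workgroup_size, uint32_t load_vec) {
    return workgroup_size * load_vec / 16;
}

// With the clamp, a warp size of 8 yields a workgroup size of 16,
// so the stride stays at 1 and the loops make progress.
static_assert(pick_workgroup_size(8) == 16, "small warps are clamped");
static_assert(load_stride(pick_workgroup_size(8), 1) == 1, "stride is nonzero");

// Larger subgroup sizes pass through unchanged.
static_assert(pick_workgroup_size(32) == 32, "large warps are unchanged");
```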

Signed-off-by: Salvatore Mesoraca <redacted>