git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (llama/15872)
author Oliver Simons <redacted>
Wed, 10 Sep 2025 20:04:03 +0000 (22:04 +0200)
committer Georgi Gerganov <redacted>
Sat, 20 Sep 2025 10:33:50 +0000 (13:33 +0300)
commit 7a9132fa61a82af1ffc67c9b165c97cff9526c66
tree f13d741e3aa33df60132e009264a41421b3e52cd
parent f783517566a10a0c15748aaf5119167e9543288b

* Add fastdiv and fastmodulo to k_bin_bcast kernel

* Address review comments

* `prod_` instead of `prod` suffix

* Add test case for `k_bin_bcast_unravel` in CUDA backend
src/ggml-cuda/binbcast.cu
tests/test-backend-ops.cpp
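
For readers unfamiliar with the technique named in the commit message, the following is a minimal host-side C++ sketch of division by a precomputed "magic" multiplier (the Granlund-Montgomery method) and of how it could be used to unravel a flat element index into four tensor coordinates. It is not the code from this commit: `FastDiv`, `make_fastdiv`, `Index4`, and `unravel` are hypothetical names chosen for illustration, the `prod_01` / `prod_012` parameters only mirror the `prod_` naming mentioned above, and the high-word multiply is written with 64-bit arithmetic where a CUDA kernel would typically use the `__umulhi` intrinsic.

```cpp
#include <cstdint>

// Hypothetical sketch, not the commit's implementation.
// Precomputed constants for dividing 32-bit values by a fixed divisor d.
struct FastDiv {
    uint32_t mp;  // magic multiplier
    uint32_t L;   // shift amount, ceil(log2(d))
    uint32_t d;   // original divisor, needed for the modulo
};

// Compute the constants once on the host. Requires d >= 1.
FastDiv make_fastdiv(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    const uint64_t mp = (uint64_t{1} << 32) * ((uint64_t{1} << L) - d) / d + 1;
    return {static_cast<uint32_t>(mp), L, d};
}

// n / d using one multiply, one add, and one shift.
uint32_t fastdiv(uint32_t n, const FastDiv & f) {
    // High 32 bits of n * mp (on the device: __umulhi(n, f.mp)).
    const uint32_t hi = static_cast<uint32_t>((uint64_t{n} * f.mp) >> 32);
    // The sum is done in 64 bits so it cannot wrap for large n.
    return static_cast<uint32_t>((uint64_t{hi} + n) >> f.L);
}

// n % d derived from the fast quotient.
uint32_t fastmodulo(uint32_t n, const FastDiv & f) {
    return n - fastdiv(n, f) * f.d;
}

struct Index4 {
    uint32_t i0, i1, i2, i3;
};

// Unravel a flat index i into coordinates of a tensor with extents
// ne0, ne1, ne2 (innermost first); prod_01 = ne0*ne1, prod_012 = ne0*ne1*ne2.
Index4 unravel(uint32_t i,
               const FastDiv & ne0, const FastDiv & ne1, const FastDiv & ne2,
               const FastDiv & prod_01, const FastDiv & prod_012) {
    Index4 r;
    r.i3 = fastdiv(i, prod_012);
    r.i2 = fastmodulo(fastdiv(i, prod_01), ne2);
    r.i1 = fastmodulo(fastdiv(i, ne0), ne1);
    r.i0 = fastmodulo(i, ne0);
    return r;
}
```

The design point is that the divisors (tensor extents and their products) are constant for a whole kernel launch, so the relatively expensive constant setup runs once on the host, while each thread replaces integer division and modulo with a multiply, an add, and a shift.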