git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (llama/15872)
author Oliver Simons <redacted>
Wed, 10 Sep 2025 20:04:03 +0000 (22:04 +0200)
committer Georgi Gerganov <redacted>
Sat, 20 Sep 2025 10:45:27 +0000 (13:45 +0300)
commit f5ef0e25e2a5f4ed2f5fa46807efb1671d09a276
tree 37fc303ef82af69a8fc1db6f146f40fd321c727a
parent 3617008c37ff5763c43e63e555480f4397ecac18

* Add fastdiv and fastmodulo to k_bin_bcast kernel

* Address review comments

* `prod_` instead of `prod` suffix

* Add test case for `k_bin_bcast_unravel` in CUDA backend
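
For readers unfamiliar with the technique named in the bullets above, the following is a minimal host-side C++ sketch of division by a precomputed "magic" multiplier, plus a flat-index unravel built on top of it. It is not the code from this patch: the struct and function names (`FastDiv`, `make_fastdiv`, `mulhi_u32`, `unravel`, `Index3`) are hypothetical, the unravel uses a plain divide-then-modulo chain as an assumption about what such a kernel needs, and `mulhi_u32` stands in for the CUDA intrinsic `__umulhi` that a device version would presumably call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the precomputed values of one divisor.
struct FastDiv {
    uint32_t mp;  // magic multiplier
    uint32_t L;   // shift amount, ceil(log2(d))
    uint32_t d;   // original divisor, kept so the modulo can be derived
};

// Precompute once per divisor on the host (Granlund-Montgomery style).
FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint32_t{1} << L) < d) {
        ++L;
    }
    const uint64_t mp = (uint64_t{1} << 32) * ((uint64_t{1} << L) - d) / d + 1;
    return FastDiv{static_cast<uint32_t>(mp), L, d};
}

// High 32 bits of a 32x32-bit product; a CUDA kernel would use __umulhi(a, b).
inline uint32_t mulhi_u32(uint32_t a, uint32_t b) {
    return static_cast<uint32_t>((uint64_t{a} * b) >> 32);
}

// n / d using one multiply, one add and one shift.
// The sum is widened to 64 bits here so it cannot overflow for large n.
inline uint32_t fastdiv(uint32_t n, const FastDiv &fd) {
    return static_cast<uint32_t>((uint64_t{mulhi_u32(n, fd.mp)} + n) >> fd.L);
}

// n % d derived from the fast quotient.
inline uint32_t fastmodulo(uint32_t n, const FastDiv &fd) {
    return n - fastdiv(n, fd) * fd.d;
}

// Hypothetical 3-D index; i0 is the fastest-varying dimension.
struct Index3 {
    uint32_t i0, i1, i2;
};

// Unravel a flat index i into (i0, i1, i2) given the extents of dims 0 and 1.
inline Index3 unravel(uint32_t i, const FastDiv &ne0, const FastDiv &ne1) {
    const uint32_t i0   = fastmodulo(i, ne0);
    const uint32_t rest = fastdiv(i, ne0);
    return Index3{i0, fastmodulo(rest, ne1), fastdiv(rest, ne1)};
}
```

The point of the design is that the divisors (tensor extents or their products) are fixed for a whole kernel launch, so the relatively expensive setup in `make_fastdiv` runs once on the host while every thread pays only for a multiply-high, an add and a shift instead of a hardware integer division.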
ggml/src/ggml-cuda/binbcast.cu