git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#15872)
author Oliver Simons <redacted>
Wed, 10 Sep 2025 20:04:03 +0000 (22:04 +0200)
committer GitHub <redacted>
Wed, 10 Sep 2025 20:04:03 +0000 (22:04 +0200)
commit 00681dfc16ba4cebb9c7fbd2cf2656e06a0692a4
tree 05fe98480a3d3d2ec07b9e3853c1befcc5722972
parent 4f658855fa8f2e42b7ed9a5b298fa39a2e39b096

* Add fastdiv and fastmodulo to k_bin_bcast kernel

* Address review comments

* `prod_` instead of `prod` suffix

* Add test case for `k_bin_bcast_unravel` in CUDA backend
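The commit message names `fastdiv` and `fastmodulo` without showing what they do. The idea is to replace integer division and modulo by a divisor that is fixed for the whole launch (for example a tensor dimension) with one multiply-high, one add and one shift, because a general 32-bit integer divide is compiled to a multi-instruction sequence on NVIDIA GPUs. The following is a minimal host-side C++ sketch of that technique (the Granlund and Montgomery multiply-and-shift method for unsigned 32-bit operands), plus a 4-D index unravel built on it. It is not the code in `binbcast.cu`. The names `FastDiv`, `make_fastdiv`, `fastdiv`, `fastmodulo`, `Idx4` and `unravel` are hypothetical, and on the device the multiply-high step would be the `__umulhi` intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical precomputed constants for dividing by a fixed divisor d.
struct FastDiv {
    uint32_t mp;  // magic multiplier: floor(2^32 * (2^L - d) / d) + 1
    uint32_t L;   // shift: smallest L with 2^L >= d
    uint32_t d;   // the divisor itself, kept for the modulo
};

// Computed once per divisor (on the host, before a kernel launch).
FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // Because 2^(L-1) < d <= 2^L, this product cannot overflow 64 bits
    // and the result always fits in 32 bits.
    const uint64_t mp = (uint64_t{1} << 32) * ((uint64_t{1} << L) - d) / d + 1;
    return FastDiv{static_cast<uint32_t>(mp), L, d};
}

// n / d without a divide instruction.
uint32_t fastdiv(uint32_t n, const FastDiv& f) {
    // High 32 bits of n * mp (what __umulhi computes on the GPU).
    const uint32_t hi = static_cast<uint32_t>((uint64_t{n} * f.mp) >> 32);
    // 64-bit sum: hi + n cannot overflow and a shift by 32 stays well defined.
    return static_cast<uint32_t>((uint64_t{hi} + n) >> f.L);
}

// n % d, derived from the quotient.
uint32_t fastmodulo(uint32_t n, const FastDiv& f) {
    return n - fastdiv(n, f) * f.d;
}

// Hypothetical unravel of a flat element index into 4-D coordinates for a
// contiguous tensor with dims ne0, ne1, ne2 (ne3 is implied by the index range).
struct Idx4 {
    uint32_t i0, i1, i2, i3;
};

Idx4 unravel(uint32_t i, const FastDiv& ne0, const FastDiv& ne1, const FastDiv& ne2) {
    Idx4 r{};
    uint32_t q = fastdiv(i, ne0);
    r.i0 = i - q * ne0.d;   // i % ne0, reusing the quotient
    i = q;
    q = fastdiv(i, ne1);
    r.i1 = i - q * ne1.d;   // (i / ne0) % ne1
    i = q;
    q = fastdiv(i, ne2);
    r.i2 = i - q * ne2.d;   // (i / (ne0 * ne1)) % ne2
    r.i3 = q;               // i / (ne0 * ne1 * ne2)
    return r;
}
```

For example, with dims 4 x 3 x 2 x 5 the flat index 93 unravels to (1, 2, 1, 3). In a broadcasting kernel, a smaller second operand is commonly indexed by taking each coordinate modulo its own size (assumed here to match ggml's repeat-style broadcasting), which is where `fastmodulo` is useful. The reason to precompute `mp` and `L` once per divisor is that every thread then reuses the same constants, so the per-element cost is just the multiply, add and shift.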
ggml/src/ggml-cuda/binbcast.cu
tests/test-backend-ops.cpp