git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: Optimize PAD_REFLECT_1D (llama/15957)
author Bowen Han <redacted>
Thu, 18 Sep 2025 18:26:03 +0000 (11:26 -0700)
committer Georgi Gerganov <redacted>
Sat, 20 Sep 2025 10:33:50 +0000 (13:33 +0300)
commit 7cadebf54dc74bf8c81661d36a2f4066aacfb64e
tree 21be7dc6988e4afd6aca0074b339d22b1ce41fcb
parent 7eaada8a7e94df1b7da0b52a7938ace3d02f43dd
CUDA: Optimize PAD_REFLECT_1D (llama/15957)

* CUDA: Optimize PAD_REFLECT_1D
feat: add more test cases for PAD_REFLECT_1D

* use fast_div to improve performance

* Apply suggestion from JohannesGaessler

Co-authored-by: Johannes Gäßler <redacted>

* Apply suggestion from JohannesGaessler

Co-authored-by: Johannes Gäßler <redacted>

* optimize

* use a concise expression to further speed up the CUDA kernel

---------

Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/common.cuh
src/ggml-cuda/pad_reflect_1d.cu
tests/test-backend-ops.cpp
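
The diff itself is not reproduced on this page, so the following is only a host-side C++ sketch of the two ideas the commit message names, not the code from src/ggml-cuda/pad_reflect_1d.cu. It assumes that PAD_REFLECT_1D mirrors around the edge element without repeating it and that both pads are smaller than the row length, and it assumes "fast_div" means replacing division by a loop-invariant divisor with a precomputed multiply and shift. The names reflect_src_index, pad_reflect_1d_row, fastdiv_vals, make_fastdiv and fast_div_u32 are hypothetical and chosen for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: map output column i to the source column it copies.
// Assumption: "reflect" padding that does not repeat the edge element,
// with p0 < ne0 and p1 < ne0.
static int64_t reflect_src_index(int64_t i, int64_t ne0, int64_t p0) {
    const int64_t rel = i - p0;                  // offset from the first source element
    if (rel < 0)    return -rel;                 // left pad: mirror around index 0
    if (rel >= ne0) return 2 * (ne0 - 1) - rel;  // right pad: mirror around index ne0 - 1
    return rel;                                  // interior: plain copy
}

// Pad one row on the host; a GPU kernel would compute one output element per thread.
static std::vector<float> pad_reflect_1d_row(const std::vector<float> & src,
                                             int64_t p0, int64_t p1) {
    const int64_t ne0 = (int64_t) src.size();
    std::vector<float> dst(ne0 + p0 + p1);
    for (int64_t i = 0; i < (int64_t) dst.size(); ++i) {
        dst[i] = src[reflect_src_index(i, ne0, p0)];
    }
    return dst;
}

// Precomputed constants for dividing by a fixed divisor d >= 1.
struct fastdiv_vals {
    uint64_t mp;  // "magic" multiplier, always below 2^32
    uint32_t L;   // shift amount, ceil(log2(d))
};

static fastdiv_vals make_fastdiv(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint64_t(1) << L) < d) {
        ++L;
    }
    const uint64_t mp = ((uint64_t(1) << 32) * ((uint64_t(1) << L) - d)) / d + 1;
    return { mp, L };
}

// n / d using one multiply, one add and one shift instead of a hardware divide.
static uint32_t fast_div_u32(uint32_t n, fastdiv_vals v) {
    const uint64_t hi = (v.mp * n) >> 32;  // high 32 bits of the 64-bit product
    return (uint32_t) ((hi + n) >> v.L);
}
```

In a kernel that turns a flat thread index into per-dimension coordinates, the divisors are tensor dimensions that stay fixed for the whole launch, so constants like these can be computed once on the host and passed in as arguments. Whether the actual kernel is organized this way is not visible from this page.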