git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: Optimize PAD_REFLECT_1D (#15957)
author Bowen Han <redacted>
Thu, 18 Sep 2025 18:26:03 +0000 (11:26 -0700)
committer GitHub <redacted>
Thu, 18 Sep 2025 18:26:03 +0000 (20:26 +0200)
commit 38dbdf4c057515ccea9bec0ca2518f86d5e4d28e
tree 79fd2e85a4b45ae848ede5d3a1d68edec787b111
parent 368560a1e3b9a3bc83af741b0b2bc9e46fb420d2

* CUDA: Optimize PAD_REFLECT_1D
feat: add more test cases for PAD_REFLECT_1D

* use fast_div to improve performance

* Apply suggestion from JohannesGaessler

Co-authored-by: Johannes Gäßler <redacted>

* Apply suggestion from JohannesGaessler

Co-authored-by: Johannes Gäßler <redacted>

* optimize

* use a concise expression to further speedup the cuda kernel

---------

Co-authored-by: Johannes Gäßler <redacted>
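
For readers unfamiliar with the operation, here is a minimal host-side C++ sketch of what 1D reflect padding computes, including one possible branch-free ("concise") form of the index mapping. It is an illustration only, not the kernel from this commit. The names `reflect_src_index` and `pad_reflect_1d_ref` are hypothetical, and the sketch assumes a single contiguous, non-empty row with both paddings smaller than the row length.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not taken from the commit): map output index i of a
// reflect-padded row to the source index it copies from.
//   n  = source row length, p0 = left padding.
// With rel = i - p0, the mapping is:
//   rel < 0        ->  -rel               (mirror around element 0)
//   0 <= rel < n   ->  rel                (plain copy)
//   rel >= n       ->  2 * (n - 1) - rel  (mirror around element n - 1)
// All three cases collapse into a single expression without branches.
static int64_t reflect_src_index(int64_t i, int64_t n, int64_t p0) {
    const int64_t rel = i - p0;
    return (n - 1) - std::llabs((n - 1) - std::llabs(rel));
}

// Reference implementation for one row. Assumes p0 < n and p1 < n, so that
// every mirrored index stays inside the source row.
static std::vector<float> pad_reflect_1d_ref(const std::vector<float> & src,
                                             int64_t p0, int64_t p1) {
    const int64_t n = (int64_t) src.size();
    assert(p0 >= 0 && p1 >= 0 && p0 < n && p1 < n);

    std::vector<float> dst(p0 + n + p1);
    for (int64_t i = 0; i < (int64_t) dst.size(); ++i) {
        dst[i] = src[reflect_src_index(i, n, p0)];
    }
    return dst;
}
```

With the row `{1, 2, 3, 4}` and `p0 = p1 = 2`, this reference produces `{3, 2, 1, 2, 3, 4, 3, 2}`: the edge elements themselves are not repeated. In a GPU kernel each thread would typically evaluate the same mapping for one output element.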
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/pad_reflect_1d.cu
tests/test-backend-ops.cpp
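
The commit message mentions using fast_div but does not show the helper. As background, the following host-side C++ sketch shows the general technique such helpers usually rely on: replacing division by a divisor that stays constant across a kernel launch with a precomputed multiply and shift. The names `fastdiv_vals`, `make_fastdiv`, `fastdiv`, and `fastmod` are hypothetical and are not claimed to match the code in `ggml/src/ggml-cuda/common.cuh`. The sketch widens the intermediate sum to 64 bits so that it is exact for every 32-bit input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the precomputed constants of one divisor.
struct fastdiv_vals {
    uint32_t mp;     // magic multiplier
    uint32_t shift;  // L = ceil(log2(d))
    uint32_t d;      // original divisor, kept for the modulo
};

// Precompute once per divisor (on the host, before a kernel launch).
static fastdiv_vals make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint32_t{1} << L) < d) {
        ++L;
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1, which always fits in 32 bits.
    const uint32_t mp =
        (uint32_t) (((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1);
    return {mp, L, d};
}

// n / d using one widening multiply, one add and one shift.
static uint32_t fastdiv(uint32_t n, const fastdiv_vals & v) {
    const uint64_t hi = ((uint64_t) n * v.mp) >> 32;  // high 32 bits of n * mp
    return (uint32_t) ((hi + n) >> v.shift);          // 64-bit sum avoids overflow
}

// n % d derived from the quotient.
static uint32_t fastmod(uint32_t n, const fastdiv_vals & v) {
    return n - fastdiv(n, v) * v.d;
}
```

A padding kernel that flattens several tensor dimensions into one thread index has to split that index back into coordinates with divisions and modulos by the dimension sizes. Those sizes are fixed for a launch, so their constants can be computed once on the host and passed in as kernel arguments.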