git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: remove unnecessary warp reduce in FA (#1032)
author mahorozte <redacted>
Tue, 3 Dec 2024 13:11:43 +0000 (21:11 +0800)
committer GitHub <redacted>
Tue, 3 Dec 2024 13:11:43 +0000 (14:11 +0100)
commit b903ffe79daf18c0aaacbebe44a7b93a6b8d0982
tree 673cc9be84890467a7100a832796f51bf54dff24
parent 589fed13a77d7e54435c2182384955706b60b841

* kqmax_new_j is already identical in every thread of the warp after the operation at line 199, so this reduce can be omitted

* the same redundant reduce exists in vec32
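
The reasoning in the points above can be checked with a small host-side model. This is only a sketch, not the kernel code: it assumes the operation at line 199 is a warp-wide max reduction using a butterfly (XOR-shuffle) pattern, which is what `__shfl_xor_sync` provides on the GPU. The names `Warp`, `warp_reduce_max_sim`, `all_lanes_equal` and `second_reduce_is_noop` are hypothetical and do not appear in the ggml sources.

```cpp
#include <algorithm>
#include <array>

constexpr int WARP_SIZE = 32;
using Warp = std::array<float, WARP_SIZE>;  // one value per lane (hypothetical model type)

// Host-side model of a butterfly max reduction. At each step, lane i combines
// its value with the value held by lane (i ^ offset), mimicking what an
// XOR shuffle does on the GPU. Assumption: the real reduce uses this pattern.
Warp warp_reduce_max_sim(Warp v) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        Warp next = v;  // all lanes read the old values, then write together
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            next[lane] = std::max(v[lane], v[lane ^ offset]);
        }
        v = next;
    }
    return v;
}

// True if every lane holds the same value.
bool all_lanes_equal(const Warp & v) {
    return std::all_of(v.begin(), v.end(), [&](float x) { return x == v[0]; });
}

// True if reducing an already reduced warp leaves every lane unchanged,
// i.e. the second reduce is redundant.
bool second_reduce_is_noop(const Warp & input) {
    const Warp once  = warp_reduce_max_sim(input);
    const Warp twice = warp_reduce_max_sim(once);
    return all_lanes_equal(once) && once == twice;
}
```

Calling `second_reduce_is_noop` on any per-lane input should return true under these assumptions: after the first butterfly reduce every lane already holds the warp-wide maximum, so a second reduce has nothing left to change.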

---------

Co-authored-by: ZhaoXiaoYu <redacted>
src/ggml-cuda/fattn-vec-f16.cuh
src/ggml-cuda/fattn-vec-f32.cuh