git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: remove unnecessary warp reduce in FA (ggml/1032)
author mahorozte <redacted>
Tue, 3 Dec 2024 13:11:43 +0000 (21:11 +0800)
committer Georgi Gerganov <redacted>
Tue, 3 Dec 2024 18:04:49 +0000 (20:04 +0200)
commit e9e661bd59364e5d4fce035834b6cadcadf8c2ef
tree 42936b734629e064b2e915cf92b7e1e95a9c4f54
parent efb6ae963031709fc331e6e48cc4606ac8f9c3a7

* kqmax_new_j is already the same in every thread of the warp after the operation at line 199, so this reduce can be omitted

* Same problem in vec32 (fattn-vec-f32.cuh)
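
Why a second reduce is redundant can be illustrated with a small host-side sketch. It is not the kernel code from this commit: it assumes the operation at line 199 is an XOR-butterfly warp max-reduction of the kind usually built on `__shfl_xor_sync`, and it simulates a 32-lane warp with a plain array. The names `Warp`, `shfl_xor`, `warp_reduce_max` and `all_lanes_equal` are hypothetical helpers for this illustration only.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

constexpr int WARP_SIZE = 32;
using Warp = std::array<float, WARP_SIZE>;  // one float per simulated lane

// Host-side stand-in for a single XOR shuffle step: lane i reads lane (i ^ mask).
Warp shfl_xor(const Warp & x, int mask) {
    Warp out{};
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        out[lane] = x[lane ^ mask];
    }
    return out;
}

// XOR-butterfly max reduction (assumed shape of the reduce in question).
// After log2(32) = 5 steps every lane holds the maximum over the whole warp.
Warp warp_reduce_max(Warp x) {
    for (int mask = WARP_SIZE / 2; mask > 0; mask >>= 1) {
        const Warp other = shfl_xor(x, mask);
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            x[lane] = std::fmax(x[lane], other[lane]);
        }
    }
    return x;
}

// True when all simulated lanes hold the same value.
bool all_lanes_equal(const Warp & x) {
    return std::all_of(x.begin(), x.end(), [&](float v) { return v == x[0]; });
}
```

Under that assumption, applying `warp_reduce_max` to an already reduced warp returns the input unchanged, so the extra call only costs five more shuffle steps without changing any lane's value.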

---------

Co-authored-by: ZhaoXiaoYu <redacted>
ggml/src/ggml-cuda/fattn-vec-f16.cuh
ggml/src/ggml-cuda/fattn-vec-f32.cuh