git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit


author	mahorozte <redacted>
	Tue, 3 Dec 2024 13:11:43 +0000 (21:11 +0800)
committer	Georgi Gerganov <redacted>
	Tue, 3 Dec 2024 18:04:49 +0000 (20:04 +0200)
commit	e9e661bd59364e5d4fce035834b6cadcadf8c2ef
tree	42936b734629e064b2e915cf92b7e1e95a9c4f54
parent	efb6ae963031709fc331e6e48cc4606ac8f9c3a7

CUDA: remove unnecessary warp reduce in FA (ggml/1032)

* kqmax_new_j is already the same in every thread of the warp after the operation at line 199, so this reduce can be omitted

* same problem in vec32

---------

Co-authored-by: ZhaoXiaoYu <redacted>
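
The reasoning in the commit message can be illustrated with a small host-side model. This is only a sketch and not the code from fattn-vec-f16.cuh or fattn-vec-f32.cuh: it assumes the earlier operation is a butterfly max-reduction built from XOR shuffles (the pattern usually written with `__shfl_xor_sync` on the device), and the names `Warp`, `warp_reduce_max_sim` and `all_lanes_equal` are hypothetical. Under that assumption every lane ends up holding the warp-wide maximum, so reducing the same value a second time cannot change it.

```cpp
#include <algorithm>
#include <array>

// Assumption: a warp has 32 lanes, as on current NVIDIA GPUs.
constexpr int WARP_SIZE = 32;

// One float per lane; a hypothetical stand-in for a per-thread register.
using Warp = std::array<float, WARP_SIZE>;

// Host-side model of a butterfly max-reduction: in each step, lane i
// combines its value with the value held by lane (i ^ offset).
// All lanes read the old values before any lane writes, as a shuffle does.
Warp warp_reduce_max_sim(Warp v) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        Warp next = v;
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            next[lane] = std::max(v[lane], v[lane ^ offset]);
        }
        v = next;
    }
    return v;
}

// True if every lane holds exactly the same value.
bool all_lanes_equal(const Warp & v) {
    return std::all_of(v.begin(), v.end(),
                       [&v](float x) { return x == v[0]; });
}
```

Calling `warp_reduce_max_sim` once on arbitrary per-lane values gives a result for which `all_lanes_equal` holds, and calling it again on that result returns the same array. In this model, that is why the second reduce is redundant.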

ggml/src/ggml-cuda/fattn-vec-f16.cuh
ggml/src/ggml-cuda/fattn-vec-f32.cuh

Packaging of ggml-org/llama.cpp
