git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Georgi Gerganov <redacted>
	Fri, 1 Dec 2023 08:51:24 +0000 (10:51 +0200)
committer	GitHub <redacted>
	Fri, 1 Dec 2023 08:51:24 +0000 (10:51 +0200)
commit	ef47ec18da469423c276b683dd9b5741cee7023e
tree	ec3b4780dbe8f629425de499b298e8eadfd1aa4d
parent	1d144112c0fbbb4ecc07dbcf4f05a380148bd6de

ggml : add ggml_soft_max_ext (#4256)

* metal : implement soft_max_ext

* cuda : implement soft_max_ext

* ggml : implement soft_max_ext (CPU)

* batched-bench : print threads

ggml-ci

* metal : simplify soft_max encoding

ggml-ci

* cuda : use 512 threads for soft_max instead of 32

* ggml : update soft max cpu

* cuda : do warp-based block reduce

* cuda : increase max block size to 1024

* cuda : fix warp reduction initialization of shared mem

* metal : warp-based reduction for soft max kernel

* metal : warp-based reduce for rms_norm

* metal : simplify soft max kernel

ggml-ci

* alloc : fix build with debug

examples/batched-bench/batched-bench.cpp
ggml-alloc.c
ggml-cuda.cu
ggml-metal.m
ggml-metal.metal
ggml.c
ggml.h
llama.cpp