git.djapps.eu Git - pkg/ggml/sources/ggml/commit
cuda : optimize argmax (llama/10441)
author Diego Devesa <redacted>
Thu, 21 Nov 2024 17:18:50 +0000 (18:18 +0100)
committer Georgi Gerganov <redacted>
Tue, 3 Dec 2024 19:05:37 +0000 (21:05 +0200)
commit 0719a26b027126f16778f934c1d5005fcfcaeb26
tree 54797b7bd6998afd14c9b0d21606ad59a4a97637
parent 72d3cd7ae75494183466e623002b6a782b163498

* cuda : optimize argmax

* remove unused parameter

ggml-ci

* fixup : use full warps

ggml-ci

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>

* fix ub

* ggml : check ne00 <= INT32_MAX in argmax and argsort

---------

Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/argmax.cu
src/ggml-cuda/common.cuh
src/ggml-cuda/quantize.cu
src/ggml.c