git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
cuda : optimize argmax (#10441)
author Diego Devesa <redacted>
Thu, 21 Nov 2024 17:18:50 +0000 (18:18 +0100)
committer GitHub <redacted>
Thu, 21 Nov 2024 17:18:50 +0000 (18:18 +0100)
commit a5e47592b6171ae21f3eaa1aba6fb2b707875063
tree e31c54c85493203c8b8a9ccd7bd41ac0aa8c3dda
parent 1bb30bf28cb5a7adf111bc41c935bdaf128397e7

* cuda : optimize argmax

* remove unused parameter

ggml-ci

* fixup : use full warps

ggml-ci

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>

* fix ub

* ggml : check ne00 <= INT32_MAX in argmax and argsort

---------

Co-authored-by: Johannes Gäßler <redacted>
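
The `ne00 <= INT32_MAX` check listed above can be illustrated with a small host-side sketch. It assumes the motivation is that argmax results are stored as 32-bit signed indices, so a row longer than `INT32_MAX` elements could not be indexed safely. The helper names `row_fits_int32` and `argmax_row` are hypothetical and do not appear in the commit; the real check lives in the ggml sources listed below and may be written differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the commit): true when every index of a row
// with ne00 elements can be stored in a 32-bit signed integer.
bool row_fits_int32(int64_t ne00) {
    return ne00 >= 0 && ne00 <= INT32_MAX;
}

// Hypothetical CPU reference argmax over one row (not the CUDA kernel).
// Returns the index of the first maximum element as a 32-bit index.
int32_t argmax_row(const float * row, int64_t ne00) {
    // Mirrors the idea of the added check: refuse rows whose indices
    // would not fit in the int32 result.
    assert(ne00 > 0 && row_fits_int32(ne00));

    int32_t best = 0;
    for (int64_t i = 1; i < ne00; ++i) {
        // Strict comparison keeps the earliest index on ties.
        if (row[i] > row[best]) {
            best = (int32_t) i;
        }
    }
    return best;
}
```

Doing the length check once, before any index is narrowed to `int32_t`, keeps the cast inside the loop safe without per-element range checks.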
ggml/src/ggml-cuda/argmax.cu
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/quantize.cu
ggml/src/ggml.c
tests/test-backend-ops.cpp