git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: add expert reduce kernel (llama/16857)
author Aman Gupta <redacted>
Fri, 31 Oct 2025 12:05:07 +0000 (20:05 +0800)
committer Georgi Gerganov <redacted>
Sat, 1 Nov 2025 07:41:35 +0000 (09:41 +0200)
commit 82b8e2697840fb3b483bcd47da22efaae79c226c
tree bf7e0194367c9ccebe409f394fab835a595c5ca5
parent a4ec1c544de634cc6457cb77bcfabaff068a16d8

* CUDA: add expert reduce kernel

* contiguous checks, better formatting, use std::vector instead of array

* use vector empty instead of size

---------

Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/ggml-cuda.cu
src/ggml-cuda/moe-expert-reduce.cu [new file with mode: 0644]
src/ggml-cuda/moe-expert-reduce.cuh [new file with mode: 0644]
tests/test-backend-ops.cpp
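
The commit message does not describe what the kernel computes. As a rough illustration only, an "expert reduce" in a mixture-of-experts layer can be read as summing, for each token, the output rows of the experts selected for it. The sketch below is a plain CPU reference under that assumption. The function name `expert_reduce_ref`, the parameter names, and the memory layout are hypothetical and are not taken from `moe-expert-reduce.cu`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for an expert reduction (NOT the CUDA kernel
// added by this commit). Assumed layout, chosen for illustration:
//   experts[(t * n_expert_used + e) * n_cols + c]
// holds column c of the output of the e-th selected expert for token t.
std::vector<float> expert_reduce_ref(const std::vector<float> & experts,
                                     std::size_t n_tokens,
                                     std::size_t n_expert_used,
                                     std::size_t n_cols) {
    std::vector<float> dst(n_tokens * n_cols, 0.0f);
    if (experts.empty()) {
        return dst;  // nothing to sum
    }
    for (std::size_t t = 0; t < n_tokens; ++t) {
        for (std::size_t e = 0; e < n_expert_used; ++e) {
            const std::size_t src_row = (t * n_expert_used + e) * n_cols;
            for (std::size_t c = 0; c < n_cols; ++c) {
                // accumulate every selected expert into the token's output row
                dst[t * n_cols + c] += experts[src_row + c];
            }
        }
    }
    return dst;
}
```

A GPU version would typically give each thread one output element and loop over the experts inside it, but the actual launch configuration and any contiguity requirements are defined by the files listed above, not by this sketch.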