git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: add fp kernel for larger batch size MoE (llama/16512)
author Aman Gupta <redacted>
Tue, 14 Oct 2025 11:15:15 +0000 (19:15 +0800)
committer Georgi Gerganov <redacted>
Tue, 14 Oct 2025 19:07:44 +0000 (22:07 +0300)
commit de71a099b784f9a3761c088b3491faeb0a6321b2
tree 9d7d153463ee63b29d9a34b7f834d2ee14277e24
parent f6a4d5889ed4e515e37a37a8c2de8c4e804675e6

* CUDA: kernel for larger batch sizes for MoE

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* fixup

* tests

* Move mmq_ids_helper to mmid

* cleanup

* Remove redundant checks
src/ggml-cuda/mmf.cu
src/ggml-cuda/mmf.cuh
src/ggml-cuda/mmid.cu [new file with mode: 0644]
src/ggml-cuda/mmid.cuh [new file with mode: 0644]
src/ggml-cuda/mmq.cu
tests/test-backend-ops.cpp