CUDA: add fp kernel for larger batch size MoE (#16512)
author    Aman Gupta <redacted>
          Tue, 14 Oct 2025 11:15:15 +0000 (19:15 +0800)
committer GitHub <redacted>
          Tue, 14 Oct 2025 11:15:15 +0000 (13:15 +0200)
commit 48e2fa9fb7c2de1e53808fdb65ec33f916020fc4
tree   916fad74561f5316896a7d57405e65adab7a83df
parent 5b6913c47b6bc71a6f927805a45387d5657d8b89
CUDA: add fp kernel for larger batch size MoE (#16512)

* CUDA: kernel for larger batch sizes for MoE

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* fixup

* tests

* Move mmq_ids_helper to mmid

* cleanup

* Remove redundant checks
ggml/src/ggml-cuda/mmf.cu
ggml/src/ggml-cuda/mmf.cuh
ggml/src/ggml-cuda/mmid.cu [new file with mode: 0644]
ggml/src/ggml-cuda/mmid.cuh [new file with mode: 0644]
ggml/src/ggml-cuda/mmq.cu
tests/test-backend-ops.cpp