git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934)
author Aman Gupta <redacted>
Sat, 24 Jan 2026 06:25:20 +0000 (14:25 +0800)
committer GitHub <redacted>
Sat, 24 Jan 2026 06:25:20 +0000 (14:25 +0800)
commit 81ab64f3c858c0db8c7c3a6bccd4cbbe624f52a3
tree c29786be85199c33de26e0e7a20c42757ffcb78c
parent 8af1f5f430baaab1719db8f0a259bcc2a1cfdaa0

* ggml-cuda: add split-wise cuda graph

* add n-cpu-moe compare_llama_bench.py

* fix hip/musa builds
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mean.cu
scripts/compare-llama-bench.py