git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645)
author Georgi Gerganov <redacted>
Tue, 17 Feb 2026 10:31:49 +0000 (12:31 +0200)
committer GitHub <redacted>
Tue, 17 Feb 2026 10:31:49 +0000 (12:31 +0200)
commit ad8207af7730bd6675652319263b578e24a5c0e4
tree 6b70cdb93da90476b4d7579b78e2ad315339974a
parent 667b694278e98a26974a50a3d809274ddd28f092
cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (#19645)

* cuda : enable CUDA graphs for MMID BS <= 4

* cont : add stream capture check

Co-authored-by: Oliver Simons <redacted>

* cont : add MMVQ_MMID_MAX_BATCH_SIZE

---------

Co-authored-by: Oliver Simons <redacted>
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mmvq.cuh
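
The diff itself is not shown on this page, so the following is only a rough sketch of the kind of batch-size gate the commit title describes, not the actual change. The name MMVQ_MMID_MAX_BATCH_SIZE comes from the commit message. Its value of 4 is inferred from "1 <= BS <= 4" in the title. The `Op` enum, the `Node` struct, and `node_allows_cuda_graph` are hypothetical stand-ins and are not ggml types or functions.

```cpp
#include <cstdint>

// Name taken from the commit message; the value 4 is an assumption
// inferred from the "1 <= BS <= 4" range in the commit title.
constexpr int64_t MMVQ_MMID_MAX_BATCH_SIZE = 4;

// Hypothetical, simplified node description (not ggml's real tensor struct).
enum class Op { MulMat, MulMatId, Other };

struct Node {
    Op      op;
    int64_t batch_size; // number of tokens this node processes
};

// Returns true if this node alone would permit CUDA graph use under the
// assumed rule: MMID nodes qualify only for 1 <= batch size <= 4.
// Other node types are not modelled here and are always accepted.
bool node_allows_cuda_graph(const Node & node) {
    if (node.op != Op::MulMatId) {
        return true;
    }
    return node.batch_size >= 1 && node.batch_size <= MMVQ_MMID_MAX_BATCH_SIZE;
}
```

Under these assumptions, an MMID node with a batch size of 1 through 4 keeps CUDA graphs enabled, while a batch size of 5 or more falls back to regular kernel launches. The stream capture check mentioned in the commit message is not modelled in this sketch.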