git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: add stream-based concurrency (llama/16991)
author Aman Gupta <redacted>
Sun, 30 Nov 2025 00:17:55 +0000 (08:17 +0800)
committer Georgi Gerganov <redacted>
Thu, 11 Dec 2025 13:32:50 +0000 (15:32 +0200)
commit 5c38d52f460e1e6715d35f73371acbd50b48bb07
tree 4b50a966dcb64e61040ff1e710e934b5da334a00
parent fceb67fdb34e4616b8278fc7f9f6952241233fdb

* CUDA: add stream-based concurrency

* HIP: fix hipStreamWaitEvent define and nodiscard warnings

* ggml-cuda: fix fusion inside stream

* ggml-cuda: fix bug w.r.t first stream launch

* ggml-cuda: format

* ggml-cuda: improve assert message

* ggml-cuda: use lambda instead of duplicating code

* ggml-cuda: add some more comments

* ggml-cuda: add more detailed comments about concurrency

* ggml-cuda: rename + remove unused var

* ggml-cuda: fix condition for stream launch

* ggml-cuda: address review comments, add destructor

* common.cuh: add is_valid for concurrent events

* common.cuh: make comment better

* update comment

Co-authored-by: Johannes Gäßler <redacted>

* update comment

Co-authored-by: Johannes Gäßler <redacted>

* common.cuh: fix lower_bound condition + remove join_node data from write_ranges

* ggml-cuda: fix overlap condition + shadowing parameter

---------

Co-authored-by: Carl Philipp Klemm <redacted>
Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/common.cuh
src/ggml-cuda/ggml-cuda.cu
src/ggml-cuda/vendors/hip.h
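
The commit message mentions a `lower_bound` condition on `write_ranges` and an overlap condition, but this page does not show the code itself. As a rough illustration of what such a check can look like, and not the code from this commit, the sketch below tests whether a half-open byte range overlaps any entry in a sorted list of disjoint write ranges. The names `write_range` and `overlaps_any` are hypothetical, and the sketch uses `std::upper_bound` on the range ends rather than whatever search the commit actually performs.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type: a half-open byte range [begin, end) written by some node.
struct write_range {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Returns true if the query range [begin, end) overlaps any entry of `ranges`.
// Assumption: `ranges` is sorted by `begin` and its entries are pairwise
// disjoint, which implies it is also sorted by `end`.
static bool overlaps_any(const std::vector<write_range> & ranges,
                         std::uintptr_t begin, std::uintptr_t end) {
    if (begin >= end) {
        return false; // an empty query range overlaps nothing
    }
    // Find the first stored range whose end lies strictly after the query's begin.
    // Every earlier range ends at or before `begin` and therefore cannot overlap.
    auto it = std::upper_bound(
        ranges.begin(), ranges.end(), begin,
        [](std::uintptr_t value, const write_range & r) { return value < r.end; });
    // That candidate overlaps only if it also starts before the query ends.
    // Later ranges start even further right, so checking one candidate suffices.
    return it != ranges.end() && it->begin < end;
}
```

With ranges `{0,10}`, `{20,30}`, `{40,50}`, a query of `[10,20)` only touches its neighbours and is reported as non-overlapping, while `[9,21)` is reported as overlapping. Because the ranges are half-open, adjacent buffers are not treated as a conflict.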