git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: add stream-based concurrency (#16991)
author Aman Gupta <redacted>
Sun, 30 Nov 2025 00:17:55 +0000 (08:17 +0800)
committer GitHub <redacted>
Sun, 30 Nov 2025 00:17:55 +0000 (08:17 +0800)
commit c7af376c298b7d09c280233548668ba6fcc17deb
tree 9f92b12dbe463689a38ed40e69c2fc63fed09176
parent 00425e2ed1d1ec35976710f81a9337c7b2d34d96

* CUDA: add stream-based concurrency

* HIP: fix hipStreamWaitEvent define and nodiscard warnings

* ggml-cuda: fix fusion inside stream

* ggml-cuda: fix bug w.r.t first stream launch

* ggml-cuda: format

* ggml-cuda: improve assert message

* ggml-cuda: use lambda instead of duplicating code

* ggml-cuda: add some more comments

* ggml-cuda: add more detailed comments about concurrency

* ggml-cuda: rename + remove unused var

* ggml-cuda: fix condition for stream launch

* ggml-cuda: address review comments, add destructor

* common.cuh: add is_valid for concurrent events

* common.cuh: make comment better

* update comment

Co-authored-by: Johannes Gäßler <redacted>
* update comment

Co-authored-by: Johannes Gäßler <redacted>
* common.cuh: fix lower_bound condition + remove join_node data from write_ranges

* ggml-cuda: fix overlap condition + shadowing parameter

---------

Co-authored-by: Carl Philipp Klemm <redacted>
Co-authored-by: Johannes Gäßler <redacted>
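
Several items above mention write ranges, a lower_bound condition, and an overlap condition. As a rough illustration only, and not the code from this commit, the following C++ sketch shows one common way to test whether a memory range overlaps any range in a sorted list using std::lower_bound. The names write_range and overlaps_any are hypothetical, and the sketch assumes half-open, non-empty ranges that are sorted by start address and pairwise disjoint.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical half-open address range [begin, end); not a type from ggml.
struct write_range {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Returns true if q overlaps any entry of ranges.
// Assumption: ranges is sorted by begin and its entries are pairwise disjoint.
static bool overlaps_any(const std::vector<write_range> & ranges, const write_range & q) {
    // Find the first stored range whose begin is >= q.begin.
    auto it = std::lower_bound(ranges.begin(), ranges.end(), q.begin,
        [](const write_range & r, std::uintptr_t value) { return r.begin < value; });

    // A range starting at or after q.begin overlaps if it starts before q ends.
    if (it != ranges.end() && it->begin < q.end) {
        return true;
    }
    // The range just before it starts below q.begin; it overlaps if it
    // extends past q.begin. Disjointness means no earlier range can reach further.
    if (it != ranges.begin() && std::prev(it)->end > q.begin) {
        return true;
    }
    return false;
}
```

Both neighbours of the lower_bound position have to be checked: testing only the element it returns would miss a range that starts before the query and extends into it.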
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/vendors/hip.h