git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
Simplify and improve CUDA graphs through use of indirect copy pointers (llama/9017)
author Alan Gray <redacted>
Thu, 3 Apr 2025 01:31:15 +0000 (02:31 +0100)
committer Georgi Gerganov <redacted>
Thu, 24 Apr 2025 17:39:16 +0000 (20:39 +0300)
commit d1d847f18466be0d1459adaced99e503c19c3844
tree 100674fb84081834911575052b7ce80dea533cd5
parent 337f91d4a6d55ce64700389a2075e4cab254229e

* CUDA: Simplify and improve CUDA graphs through use of indirect copy pointers

Previously, the CUDA graphs implementation was complicated by the
frequently changing parameters of the copy kernels associated with the
K and V cache pointers. This patch simplifies it by using indirection
so that these parameters no longer change frequently, avoiding the need
for frequent graph updates.
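
The idea can be illustrated with a minimal host-only C++ sketch. This is an
analogy under stated assumptions, not the code from this patch: `CopyNode`,
`dst_table`, and `replay_with_changing_destination` are hypothetical names,
and a plain `memcpy` stands in for the copy kernel. The "captured" node stores
the address of a pointer table rather than the destination itself, so the
destination can change between replays while the node's parameters stay
fixed. In the real CUDA setting the table would presumably live in device
memory and be refreshed on the stream before the graph is launched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a copy node whose parameters are frozen at
// "capture" time. It holds a pointer to a table of destinations, not the
// destination pointer itself.
struct CopyNode {
    const float * src;        // source buffer (fixed at capture)
    float **      dst_table;  // address of the pointer table (fixed at capture)
    std::size_t   slot;       // which table entry this node uses
    std::size_t   n;          // number of floats to copy

    // The destination is resolved only when the node runs.
    void launch() const {
        float * dst = dst_table[slot];
        std::memcpy(dst, src, n * sizeof(float));
    }
};

// Replays the same node twice with different destinations and reports
// whether each replay wrote into the buffer selected through the table.
bool replay_with_changing_destination() {
    std::vector<float> src = {1.0f, 2.0f};
    std::vector<float> cache_a(2, 0.0f);
    std::vector<float> cache_b(2, 0.0f);

    float * table[1] = { cache_a.data() };

    // "Capture" once; the node is const and is never modified afterwards.
    const CopyNode node{src.data(), table, 0, src.size()};

    node.launch();              // first replay writes into cache_a

    table[0] = cache_b.data();  // only the table entry changes
    src[0]   = 3.0f;
    node.launch();              // second replay writes into cache_b

    return cache_a[0] == 1.0f && cache_a[1] == 2.0f &&
           cache_b[0] == 3.0f && cache_b[1] == 2.0f;
}
```

Calling `replay_with_changing_destination()` exercises both replays. The
design point is that the only state changing between them is the table entry,
which is why no per-replay update of the captured node is needed.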

Fixes #12152

* Addressed comments

* fix HIP builds

* properly sync to stream

* removed ggml_cuda_cpy_fn_ptrs

* move stream sync before free

* guard to only use indirection with graphs

* style fixes

* check for errors

---------

Co-authored-by: slaren <redacted>
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/cpy.cu
ggml/src/ggml-cuda/cpy.cuh
ggml/src/ggml-cuda/ggml-cuda.cu