git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Alan Gray <redacted>
	Thu, 3 Apr 2025 01:31:15 +0000 (02:31 +0100)
committer	GitHub <redacted>
	Thu, 3 Apr 2025 01:31:15 +0000 (03:31 +0200)
commit	3f9da22c2b21a2cef216de50006436ef1cab8764
tree	f63d5229d3e0aef8416c61e21624524daca99985
parent	2a0dc97e56eac6db0a4016f0b45da6d0a0055ef2

Simplify and improve CUDA graphs through use of indirect copy pointers (#9017)

* CUDA: Simplify and improve CUDA graphs through use of indirect copy pointers

Previously, the CUDA graphs implementation was complicated by the
frequently changing parameters of the copy kernels associated with the
K and V cache pointers. This patch simplifies it by using indirection
so that those parameters no longer change, avoiding the need for
frequent graph updates.

Fixes #12152

* Addressed comments

* fix HIP builds

* properly sync to stream

* removed ggml_cuda_cpy_fn_ptrs

* move stream sync before free

* guard to only use indirection with graphs

* style fixes

* check for errors

---------

Co-authored-by: slaren <redacted>

ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/cpy.cu
ggml/src/ggml-cuda/cpy.cuh
ggml/src/ggml-cuda/ggml-cuda.cu
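
The indirection idea described in the commit message can be illustrated with a small host-only C++ sketch. This is an analogy, not the code from this commit: RecordedGraph, make_direct_copy, make_indirect_copy, run_direct and run_indirect are hypothetical names, a std::function stands in for a captured copy kernel, and a one-slot pointer table stands in for device memory holding the destination pointers. It only shows why reading the destination through a table at launch time removes the need to update the recorded graph whenever the K/V destination moves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-in for a captured kernel launch: arguments are fixed at record time.
using Node = std::function<void()>;

// Hypothetical stand-in for an instantiated CUDA graph.
struct RecordedGraph {
    std::vector<Node> nodes;
    int updates = 0;  // number of times the graph had to be (re)recorded
    void launch() const { for (const Node & n : nodes) n(); }
};

// Direct variant: the destination pointer itself is baked into the node.
Node make_direct_copy(const char * src, char * dst, size_t n) {
    return [=] { std::memcpy(dst, src, n); };
}

// Indirect variant: the node bakes in the address of a table slot and
// dereferences it at launch time, so the slot can change between launches.
Node make_indirect_copy(const char * src, char * const * dst_slot, size_t n) {
    return [=] {
        assert(*dst_slot != nullptr);
        std::memcpy(*dst_slot, src, n);
    };
}

struct DemoResult {
    std::vector<char> cache;
    int graph_updates;
};

// Each token is written to the next cache position; the node must be rebuilt every time.
DemoResult run_direct(int n_tokens) {
    std::vector<char> cache(n_tokens, '.');
    char token = 'a';
    RecordedGraph g;
    for (int t = 0; t < n_tokens; ++t) {
        token = static_cast<char>('a' + t);
        g.nodes.assign(1, make_direct_copy(&token, cache.data() + t, 1));
        g.updates++;  // destination changed, so the recorded node was stale
        g.launch();
    }
    return {cache, g.updates};
}

// The graph is recorded once; only the pointer table is updated per token.
DemoResult run_indirect(int n_tokens) {
    std::vector<char> cache(n_tokens, '.');
    char token = 'a';
    char * dst_table[1] = { cache.data() };
    RecordedGraph g;
    g.nodes.push_back(make_indirect_copy(&token, &dst_table[0], 1));
    g.updates = 1;
    for (int t = 0; t < n_tokens; ++t) {
        token = static_cast<char>('a' + t);
        dst_table[0] = cache.data() + t;  // cheap table write, no graph update
        g.launch();
    }
    return {cache, g.updates};
}
```

In this sketch both variants fill the cache identically, but the direct variant re-records the node once per token while the indirect variant records it once. In a real CUDA setting the table would live in device memory and be refreshed with an asynchronous copy on the same stream before the graph launch; that part is an assumption about how such a scheme is typically built, not a description of this patch.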