Avoid unnecessarily disabling CUDA graphs (#7302)
author    agray3 <redacted>
          Wed, 15 May 2024 13:44:49 +0000 (14:44 +0100)
committer GitHub <redacted>
          Wed, 15 May 2024 13:44:49 +0000 (15:44 +0200)
commit    dc020985b8755dd6aa93a2f002f43c3ede808cce
tree      a4be81a8ce9f08fbafbc92c3e38ee892192bfe91
parent    344f9126cc0d15891fde9472fe40b8572628ad7d
Avoid unnecessarily disabling CUDA graphs (#7302)

As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts.
This fixes the issue by preventing the consecutive-update counter from incrementing unnecessarily
for tokens in which CUDA graphs are disabled due to batch size > 1.
ggml-cuda.cu