Avoid unnecessarily disabling CUDA graphs (llama/7302)
author    agray3 <redacted>
Wed, 15 May 2024 13:44:49 +0000 (14:44 +0100)
committer Georgi Gerganov <redacted>
Tue, 28 May 2024 11:41:08 +0000 (14:41 +0300)
commit ed040d5b1daf9b1a3fd06529e70e6ab810bd7fbf
tree   6f9f21a2dd371422e3204819be330cd2bbee8ac5
parent 207773272ede4676956143475a9b80f2fbe2eafb

As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts.
This change fixes the issue by preventing the consecutive-update counter from incrementing
unnecessarily for tokens in which CUDA graphs are disabled due to batch size > 1.
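
Below is a minimal sketch of the counter logic the commit message describes, not the actual code in src/ggml-cuda.cu; the field and constant names (cuda_graph_state, number_consecutive_updates, disable_due_to_too_many_updates, MAX_CONSECUTIVE_UPDATES) are illustrative assumptions. The point is that graphs are permanently disabled once too many consecutive graph updates occur, so batched tokens that skip graphs anyway must not advance that counter.

```cpp
// Illustrative sketch (assumed names, not the real ggml-cuda.cu identifiers):
// a CUDA-graph context tracks consecutive graph updates and gives up on graphs
// once a threshold is exceeded. The fix: skip the counter entirely for tokens
// where graphs are not used because batch size > 1.
#include <cstdio>

struct cuda_graph_state {
    int  number_consecutive_updates      = 0;      // hypothetical field
    bool disable_due_to_too_many_updates = false;  // hypothetical field
};

constexpr int MAX_CONSECUTIVE_UPDATES = 4;         // illustrative threshold

// Called once per token/graph evaluation.
void track_graph_updates(cuda_graph_state & g, bool use_cuda_graph, bool graph_update_required) {
    if (!use_cuda_graph) {
        // Batch size > 1 (e.g. prompt processing): graphs are skipped for this
        // token, so leave the consecutive-update counter untouched.
        return;
    }
    if (graph_update_required) {
        if (++g.number_consecutive_updates >= MAX_CONSECUTIVE_UPDATES) {
            g.disable_due_to_too_many_updates = true;  // stop using graphs
        }
    } else {
        g.number_consecutive_updates = 0;  // a reused graph resets the streak
    }
}

int main() {
    cuda_graph_state g;
    // A long prompt: many tokens processed with batch size > 1 (no graphs).
    for (int i = 0; i < 100; ++i) {
        track_graph_updates(g, /*use_cuda_graph=*/false, /*graph_update_required=*/true);
    }
    // Subsequent single-token generation can still benefit from CUDA graphs.
    std::printf("graphs disabled: %s\n", g.disable_due_to_too_many_updates ? "yes" : "no");
    return 0;
}
```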
src/ggml-cuda.cu