
author	Gaurav Garg <redacted>
	Tue, 27 Jan 2026 06:52:44 +0000 (06:52 +0000)
committer	Georgi Gerganov <redacted>
	Fri, 30 Jan 2026 11:49:29 +0000 (13:49 +0200)
commit	2a860ed8256d874bcb5ff1c54fad234c722706fe
tree	cbb3bd0aa6c8a54d8eb608afc2730fb4aa0c3778
parent	bc83b4b82c00cfedbf86a2288fd1deac2f09a09d

Reduce CPU-side stalls due to the CUDA command buffer being full (llama/19042)

* [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full

With pipeline parallelism, the CPU-side CUDA command buffer fills up during prompt processing, and the CPU stalls until space frees up. As a result, not enough work is submitted to the GPU, which leaves bubbles (idle gaps) in the GPU timeline.
Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x, which increases the command buffer size.

* Set the environment variable when allocating the CUDA backend registry

* Add link to PR in code comment

* Remove warning logs and update documentation
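The core of the change is setting an environment variable early enough for the CUDA driver to see it, ideally without clobbering a value the user chose. The following is a minimal C sketch of that pattern. It assumes a POSIX platform (setenv/getenv/unsetenv; Windows would need _putenv_s instead) and assumes the driver reads CUDA_SCALE_LAUNCH_QUEUES only once, when it initializes. The function name request_larger_launch_queues is hypothetical and is not the actual ggml code.

```c
#define _POSIX_C_SOURCE 200112L /* expose setenv() under strict C modes */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not the ggml implementation.
 * Must run before the first CUDA call in the process, on the assumption
 * that the driver reads CUDA_SCALE_LAUNCH_QUEUES only at initialization.
 * Returns 1 if the 4x default was applied, 0 if a pre-existing user value
 * was kept, and -1 if setenv() failed. */
static int request_larger_launch_queues(void) {
    const char *name = "CUDA_SCALE_LAUNCH_QUEUES";

    /* Respect an explicit user choice (for example "2x"). */
    if (getenv(name) != NULL) {
        return 0;
    }

    /* overwrite = 0 as an extra guard: never replace an existing value. */
    if (setenv(name, "4x", 0) != 0) {
        perror("setenv");
        return -1;
    }
    return 1;
}
```

In this sketch, calling the helper from the backend registry allocation, as the commit describes, places it ahead of any CUDA initialization, and the early return leaves a user-provided value in effect.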