
author	Markus Tavenrath <redacted>
	Mon, 17 Jun 2024 14:10:15 +0000 (16:10 +0200)
committer	GitHub <redacted>
	Mon, 17 Jun 2024 14:10:15 +0000 (16:10 +0200)
commit	6a2f0b3474d479bda4ac2ee7cfd5dcdcf0be1f79
tree	093504f65b9e2ff2b1f359e9c9980eb0a17159c2
parent	21be9cab94e0b5b53cb6edeeebf8c8c799baad03

Implement non-mapped async IO for CUDA on Windows. (#7896)

* Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by more than 3x; on other drives, load time should be the same or slightly faster.
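The general idea behind a non-mapped async load path can be sketched as below. This is a minimal, hypothetical C++ illustration, not the code from this commit: the file is read in fixed-size chunks into a small ring of staging buffers, and each chunk's upload overlaps with reading the next one. `load_chunked`, `UploadFn`, the buffer count and the buffer size are all assumptions, and the upload is a plain callback run through `std::async`, standing in for an asynchronous copy from pinned host memory to the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an async host-to-device copy: receives the chunk
// data, its size, and the destination offset. It may run concurrently for
// different chunks, so it must be thread-safe.
using UploadFn = std::function<void(const char* data, std::size_t size, std::size_t dst_offset)>;

// Stream `total` bytes from `in` through a ring of `n_buffers` staging buffers.
// Reading chunk i+1 overlaps with uploading chunk i; a staging buffer is reused
// only after its previous upload has finished (the role an event would play).
void load_chunked(std::istream& in, std::size_t total, const UploadFn& upload,
                  std::size_t n_buffers = 4, std::size_t buffer_size = 1 << 20) {
    if (n_buffers == 0 || buffer_size == 0) {
        throw std::invalid_argument("need at least one non-empty staging buffer");
    }
    std::vector<std::vector<char>> staging(n_buffers, std::vector<char>(buffer_size));
    // Declared after `staging`, so pending uploads are joined before the
    // buffers they read from are destroyed (also on the exception path).
    std::vector<std::future<void>> pending(n_buffers);

    std::size_t offset = 0;
    std::size_t idx = 0;
    while (offset < total) {
        // Wait for the previous upload out of this staging buffer to complete.
        if (pending[idx].valid()) {
            pending[idx].get();
        }
        const std::size_t n = std::min(buffer_size, total - offset);
        assert(n > 0);
        std::vector<char>& buf = staging[idx];
        if (!in.read(buf.data(), static_cast<std::streamsize>(n))) {
            throw std::runtime_error("unexpected end of file");
        }
        // Start the upload; the loop goes on to read into the next buffer.
        pending[idx] = std::async(std::launch::async, [&upload, &buf, n, offset] {
            upload(buf.data(), n, offset);
        });
        offset += n;
        idx = (idx + 1) % n_buffers;
    }
    // Drain all outstanding uploads before returning.
    for (auto& f : pending) {
        if (f.valid()) {
            f.get();
        }
    }
}
```

In a real CUDA path the staging buffers would be pinned host memory, and waiting on a `std::future` would correspond to synchronizing on an event recorded after the copy. The number of staging buffers bounds how far reads can run ahead of uploads while keeping host memory use fixed.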

* Free resources except for backend.

* Change assertions to exceptions in llama_file, find the correct CUDA backend to create the CUDA resources, and respect the use_mmap flag again for CUDA.
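The switch from assertions to exceptions can be illustrated with a hypothetical file wrapper. `checked_file` is an illustrative name, not the actual `llama_file` interface. Open and read failures throw `std::runtime_error`, so a loader can catch them and report the error instead of aborting the whole process.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical minimal file wrapper: failures surface as catchable exceptions
// rather than assertions that terminate the program.
class checked_file {
public:
    checked_file(const char* path, const char* mode) : fp_(std::fopen(path, mode)) {
        if (!fp_) {
            throw std::runtime_error(std::string("failed to open ") + path + ": " + std::strerror(errno));
        }
    }
    ~checked_file() { std::fclose(fp_); }  // fp_ is never null once constructed

    checked_file(const checked_file&) = delete;
    checked_file& operator=(const checked_file&) = delete;

    // Read exactly `len` bytes or throw; a short read is an error, not UB.
    void read_raw(void* dst, std::size_t len) {
        if (len == 0) {
            return;
        }
        errno = 0;
        const std::size_t got = std::fread(dst, 1, len, fp_);
        if (std::ferror(fp_)) {
            throw std::runtime_error(std::string("read error: ") + std::strerror(errno));
        }
        if (got != len) {
            throw std::runtime_error("unexpectedly reached end of file");
        }
    }

private:
    std::FILE* fp_;
};
```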

* Apply suggestions from code review

Co-authored-by: slaren <redacted>

* Fix EditorConfig issues and an unused variable

* Fix issues with Windows build

---------

Co-authored-by: slaren <redacted>