Custom RoPE + better memory management for CUDA (#2295)
author Kawrakow <redacted>
Fri, 21 Jul 2023 14:27:51 +0000 (17:27 +0300)
committer GitHub <redacted>
Fri, 21 Jul 2023 14:27:51 +0000 (17:27 +0300)
commit d924522a46c5ef097af4a88087d91673e8e87e4d
tree a78782f11a57de0633bed5e505666bef50a80901
parent 4d76a5f49b9b5382dba5d13d92edb9159536c225
Custom RoPE + better memory management for CUDA (#2295)

* Custom RoPE + better memory management for CUDA
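The "custom RoPE" part of this commit makes the rotary position embedding parameters adjustable rather than hard-coded. A minimal C++ sketch of the idea, assuming the two tunable parameters are a frequency base and a frequency scale (the function name `rope_custom` and the parameter names are illustrative, not llama.cpp's actual API):

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch: apply RoPE to one token's embedding with an
// adjustable frequency base and position scale. Consecutive pairs
// (x[2i], x[2i+1]) are rotated by an angle that shrinks with i.
void rope_custom(std::vector<float> &x, int pos, int n_dims,
                 float freq_base, float freq_scale) {
    const float p = freq_scale * (float) pos;  // scaled position
    for (int i = 0; i < n_dims; i += 2) {
        const float theta = p * std::pow(freq_base, -(float) i / n_dims);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i];
        const float x1 = x[i + 1];
        // 2D rotation of the pair; this preserves the vector's norm.
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```

With `freq_base = 10000` and `freq_scale = 1` this reduces to the standard RoPE; exposing both lets callers experiment with extended-context settings.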

* Adjusted look ahead in ggml_cuda_pool_malloc to 5%

This seems to be sufficient.
We end up using about 200 MB less VRAM this way when running
the 13B model with context 8192.
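The 5% "look ahead" refers to how much slack the CUDA buffer pool tolerates when matching a cached buffer to a request. A minimal host-side sketch of that reuse policy, with plain `malloc`/`free` standing in for `cudaMalloc`/`cudaFree` and with illustrative names (`g_pool`, `pool_malloc`) that are assumptions, not the commit's actual code:

```cpp
#include <cstdlib>
#include <cassert>

// Illustrative fixed-size cache of freed buffers.
struct pool_buf { void *ptr; size_t size; };
static pool_buf g_pool[16] = {};

void *pool_malloc(size_t size, size_t *actual_size) {
    for (auto &b : g_pool) {
        // Reuse a cached buffer only if it is at most ~5% larger than the
        // request: a tight bound avoids handing out oversized blocks and
        // thereby wasting VRAM.
        if (b.ptr != nullptr && b.size >= size && b.size <= size + size / 20) {
            void *p = b.ptr;
            *actual_size = b.size;
            b.ptr = nullptr;
            return p;
        }
    }
    // No suitable cached buffer: allocate ~5% extra so a slightly larger
    // future request can still reuse this block.
    const size_t alloc = size + size / 20;
    *actual_size = alloc;
    return std::malloc(alloc);  // cudaMalloc in the real code
}

void pool_free(void *ptr, size_t size) {
    for (auto &b : g_pool) {
        if (b.ptr == nullptr) { b.ptr = ptr; b.size = size; return; }
    }
    std::free(ptr);  // pool full: release immediately (cudaFree in real code)
}
```

Shrinking the look-ahead bound trades fewer cache hits for tighter allocations, which is where the roughly 200 MB VRAM saving comes from.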

---------

Co-authored-by: Iwan Kawrakow <redacted>
ggml-cuda.cu