Custom RoPE + better memory management for CUDA (#2295)
author Kawrakow <redacted>
Fri, 21 Jul 2023 14:27:51 +0000 (17:27 +0300)
committer GitHub <redacted>
Fri, 21 Jul 2023 14:27:51 +0000 (17:27 +0300)
commit d924522a46c5ef097af4a88087d91673e8e87e4d
tree a78782f11a57de0633bed5e505666bef50a80901
parent 4d76a5f49b9b5382dba5d13d92edb9159536c225
Custom RoPE + better memory management for CUDA (#2295)

* Custom RoPE + better memory management for CUDA
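The "custom RoPE" part of this commit makes the rotary position embedding parameters adjustable rather than hard-coded. A minimal C++ sketch of the idea, assuming the two tunable parameters are a frequency base and a frequency scale (the function name `rope_custom` and the parameter names are illustrative, not llama.cpp's actual API):

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch: apply RoPE to one token's embedding with an
// adjustable frequency base and position scale. Consecutive pairs
// (x[2i], x[2i+1]) are rotated by an angle that shrinks with i.
void rope_custom(std::vector<float> &x, int pos, int n_dims,
                 float freq_base, float freq_scale) {
    const float p = freq_scale * (float) pos;  // scaled position
    for (int i = 0; i < n_dims; i += 2) {
        const float theta = p * std::pow(freq_base, -(float) i / n_dims);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i];
        const float x1 = x[i + 1];
        // 2D rotation of the pair; this preserves the vector's norm.
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}
```

With `freq_base = 10000` and `freq_scale = 1` this reduces to the standard RoPE; exposing both lets callers experiment with extended-context settings.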

* Adjusted look ahead in ggml_cuda_pool_malloc to 5%

This seems to be sufficient.
We end up using about 200 MB less VRAM this way when running
the 13B model with context 8192.
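The 5% "look ahead" refers to how much slack the CUDA buffer pool tolerates when matching a cached buffer to a request. A minimal host-side sketch of that reuse policy, with plain `malloc`/`free` standing in for `cudaMalloc`/`cudaFree` and with illustrative names (`g_pool`, `pool_malloc`) that are assumptions, not the commit's actual code:

```cpp
#include <cstdlib>
#include <cassert>

// Illustrative fixed-size cache of freed buffers.
struct pool_buf { void *ptr; size_t size; };
static pool_buf g_pool[16] = {};

void *pool_malloc(size_t size, size_t *actual_size) {
    for (auto &b : g_pool) {
        // Reuse a cached buffer only if it is at most ~5% larger than the
        // request: a tight bound avoids handing out oversized blocks and
        // thereby wasting VRAM.
        if (b.ptr != nullptr && b.size >= size && b.size <= size + size / 20) {
            void *p = b.ptr;
            *actual_size = b.size;
            b.ptr = nullptr;
            return p;
        }
    }
    // No suitable cached buffer: allocate ~5% extra so a slightly larger
    // future request can still reuse this block.
    const size_t alloc = size + size / 20;
    *actual_size = alloc;
    return std::malloc(alloc);  // cudaMalloc in the real code
}

void pool_free(void *ptr, size_t size) {
    for (auto &b : g_pool) {
        if (b.ptr == nullptr) { b.ptr = ptr; b.size = size; return; }
    }
    std::free(ptr);  // pool full: release immediately (cudaFree in real code)
}
```

Shrinking the look-ahead bound trades fewer cache hits for tighter allocations, which is where the roughly 200 MB VRAM saving comes from.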

---------

Co-authored-by: Iwan Kawrakow <redacted>
ggml-cuda.cu