cuda : improve cuda pool efficiency using virtual memory (#4606)
author slaren <redacted>
Sun, 24 Dec 2023 13:34:22 +0000 (14:34 +0100)
committer GitHub <redacted>
Sun, 24 Dec 2023 13:34:22 +0000 (14:34 +0100)
commit 5bf3953d7e9831ea22b0bc017ce97409b801ccf1
tree   48c0136d9943fb9cca22209894464970549c24b5
parent 708e179e8562c2604240df95a2241dea17fd808b
cuda : improve cuda pool efficiency using virtual memory (#4606)

* cuda : improve cuda pool efficiency using virtual memory
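
In broad strokes, the new pool reserves a large virtual address range once and maps physical memory into it only as demand grows, so the pool can expand in place without reallocating and copying, and pointers handed out earlier stay valid. The sketch below illustrates the idea with the CUDA driver API; the names, the size cap, and the omitted error checking are illustrative, not the code from this commit.

    // Growable device pool built on CUDA virtual memory management (sketch).
    #include <cuda.h>

    static CUdeviceptr g_pool_addr = 0;  // start of the reserved VA range
    static size_t      g_pool_size = 0;  // bytes physically mapped so far

    // Illustrative cap on the reserved address space; reserving costs no RAM.
    static const size_t POOL_VMM_MAX_SIZE = 32ull << 30; // 32 GiB

    static void pool_grow(int device, size_t min_size, size_t granularity) {
        if (g_pool_addr == 0) {
            // Reserve the whole range up front; no physical memory is used yet.
            cuMemAddressReserve(&g_pool_addr, POOL_VMM_MAX_SIZE, 0, 0, 0);
        }

        // Physical allocations must be a multiple of the device granularity.
        size_t inc = ((min_size + granularity - 1) / granularity) * granularity;

        // Create a physical allocation and map it at the end of the range.
        CUmemAllocationProp prop = {};
        prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
        prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
        prop.location.id   = device;

        CUmemGenericAllocationHandle handle;
        cuMemCreate(&handle, inc, &prop, 0);
        cuMemMap(g_pool_addr + g_pool_size, inc, 0, handle, 0);

        // Make the newly mapped region accessible from the device.
        CUmemAccessDesc access = {};
        access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
        access.location.id   = device;
        access.flags         = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
        cuMemSetAccess(g_pool_addr + g_pool_size, inc, &access, 1);

        g_pool_size += inc;
    }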

* fix mixtral

* fix cmake build

* check for vmm support, disable for hip
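
Not every device and driver exposes the virtual memory management API, so support is detected at runtime and the pool falls back to the legacy cudaMalloc-based path otherwise; HIP builds take the fallback unconditionally. A minimal sketch of such a check, assuming the device ordinal is already known:

    #include <cuda.h>

    // Returns true if the device supports the CUDA virtual memory management API.
    static bool device_supports_vmm(int device_ordinal) {
        CUdevice dev;
        cuDeviceGet(&dev, device_ordinal);

        int supported = 0;
        cuDeviceGetAttribute(&supported,
            CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED, dev);
        return supported != 0;
    }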

ggml-ci

* fix hip build

* clarify granularity
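
The granularity here is the size multiple that cuMemCreate/cuMemMap allocations must obey; every mapped chunk is rounded up to it. A sketch of how it can be queried per device (the RECOMMENDED flag is one option; MINIMUM also exists):

    #include <cuda.h>

    static size_t device_alloc_granularity(int device_ordinal) {
        CUmemAllocationProp prop = {};
        prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
        prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
        prop.location.id   = device_ordinal;

        size_t granularity = 0;
        cuMemGetAllocationGranularity(&granularity, &prop,
            CU_MEM_ALLOC_GRANULARITY_RECOMMENDED);
        return granularity;
    }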

* move all caps to g_device_caps

* refactor error checking

* add cuda_pool_alloc, refactor most pool allocations
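
cuda_pool_alloc is an RAII-style helper: the buffer returns to the pool when the object goes out of scope, so every early-return and error path frees correctly. A minimal sketch of the pattern, assuming pool entry points named ggml_cuda_pool_malloc/ggml_cuda_pool_free:

    #include <cstddef>

    // Pool entry points assumed by this sketch (live in ggml-cuda.cu).
    void * ggml_cuda_pool_malloc(size_t size, size_t * actual_size);
    void   ggml_cuda_pool_free(void * ptr, size_t size);

    template<typename T>
    struct cuda_pool_alloc {
        T *    ptr         = nullptr;
        size_t actual_size = 0; // the pool may hand back more than requested

        T * alloc(size_t count) {
            ptr = (T *) ggml_cuda_pool_malloc(count * sizeof(T), &actual_size);
            return ptr;
        }

        ~cuda_pool_alloc() {
            if (ptr != nullptr) {
                ggml_cuda_pool_free(ptr, actual_size);
            }
        }
    };

    // Usage: cuda_pool_alloc<float> tmp; float * buf = tmp.alloc(n);
    // buf is released back to the pool automatically when tmp is destroyed.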

ggml-ci

* fix hip build

* CUBLAS_TF32_TENSOR_OP_MATH is not a macro

* more hip crap

* llama : fix msvc warnings

* ggml : fix msvc warnings

* minor

* minor

* cuda : fallback to CPU on host buffer alloc fail
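
Pinned host buffers can fail to allocate, since the OS limits page-locked memory; rather than aborting, the allocation falls back to ordinary pageable memory. A sketch of the pattern; the wrapper name is illustrative:

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Try pinned (page-locked) host memory first; fall back to plain malloc.
    static void * host_buffer_alloc(size_t size) {
        void * buf = nullptr;
        cudaError_t err = cudaMallocHost(&buf, size);
        if (err != cudaSuccess) {
            cudaGetLastError(); // clear the error so it does not poison later calls
            fprintf(stderr, "WARNING: failed to allocate %zu bytes of pinned memory, "
                            "falling back to CPU memory\n", size);
            buf = malloc(size); // unpinned: host<->device copies will be slower
        }
        return buf;
    }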

* Update ggml-cuda.cu

Co-authored-by: Johannes Gäßler <redacted>

* Update ggml-cuda.cu

Co-authored-by: Johannes Gäßler <redacted>
* ensure allocations are always aligned
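
Keeping every pool offset aligned avoids misaligned accesses when one buffer is carved out right after an oddly sized one. The usual rounding trick, with an illustrative 128-byte alignment:

    // Round a request up to a power-of-two alignment so pool offsets stay aligned.
    static size_t pool_aligned_size(size_t size) {
        const size_t align = 128; // illustrative; must be a power of two
        return (size + align - 1) & ~(align - 1);
    }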

* act_size -> actual_size

---------

Co-authored-by: Johannes Gäßler <redacted>
CMakeLists.txt
Makefile
ggml-backend.c
ggml-cuda.cu
ggml.c
ggml.h
llama.cpp
tests/test-grad0.cpp