backend : offload large batches to GPU (#6083)
author    slaren <redacted>
Mon, 18 Mar 2024 10:03:04 +0000 (11:03 +0100)
committer GitHub <redacted>
Mon, 18 Mar 2024 10:03:04 +0000 (11:03 +0100)
commit    2bf8d0f7c4cc1235755ad06961ca761e458c5e55
tree      d2a462deb3c0e34cfb26eab6881a65bfb9fc3b28
parent    496bc79bc2b79bfd6124b8687a8dbd6a646e9b06
backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU
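
The motivation: prompt processing with a large batch is compute-bound, so uploading the weights to the GPU once and evaluating the whole batch there can pay off even when the model is otherwise kept on the CPU. A minimal sketch of the per-op decision, assuming the CUDA backend keys off the batch dimension of the op; the helper name and threshold here are illustrative, not the committed constants:

    #include <stdbool.h>
    #include "ggml.h"

    // sketch: offloading pays off only when the one-time weight upload is
    // amortized over many tokens, so gate on the batch dimension of the op
    static bool cuda_should_offload_op(const struct ggml_tensor * op) {
        const int min_batch_size = 32; // assumed cutoff for "large batch"
        // for matmul-style ops, ne[1] is the number of tokens in the batch
        return op->ne[1] >= min_batch_size;
    }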

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <redacted>
* cuda : fix memset without set_device
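
Context for the fix: CUDA runtime calls such as cudaMemset act on the currently selected device, so clearing a buffer that was allocated on a different GPU must be preceded by cudaSetDevice in multi-GPU builds. A self-contained sketch of the pattern; the macro and function names are illustrative:

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    // minimal error-checking macro, analogous to the one used in ggml-cuda.cu
    #define CUDA_CHECK(call)                                            \
        do {                                                            \
            cudaError_t err_ = (call);                                  \
            if (err_ != cudaSuccess) {                                  \
                fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                        cudaGetErrorString(err_), __FILE__, __LINE__);  \
                exit(1);                                                \
            }                                                           \
        } while (0)

    // sketch: select the device that owns the allocation before memset,
    // since cudaMemset operates on the currently active device
    static void clear_device_buffer(int device, void * ptr, size_t size) {
        CUDA_CHECK(cudaSetDevice(device));
        CUDA_CHECK(cudaMemset(ptr, 0, size));
    }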

* imatrix : remove sched affix from weight names
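
Background: the scheduler names the tensor copies it creates after the backend and split they belong to (something like "CUDA0#blk.0.ffn_up.weight#0"), so imatrix has to strip that decoration to recover the plain weight name. A sketch, with the decoration format assumed from the commit description:

    #include <stdio.h>
    #include <string.h>

    // sketch: recover "blk.0.ffn_up.weight" from a scheduler-decorated name
    // such as "CUDA0#blk.0.ffn_up.weight#0"
    static void strip_sched_affix(const char * name, char * out, size_t out_size) {
        if (out_size == 0) {
            return;
        }
        const char * p = strchr(name, '#');
        if (p == NULL) {
            snprintf(out, out_size, "%s", name); // no affix present
            return;
        }
        p += 1;                          // skip the "BACKEND#" prefix
        const char * q = strchr(p, '#'); // optional trailing "#<split index>"
        size_t n = q ? (size_t)(q - p) : strlen(p);
        if (n >= out_size) {
            n = out_size - 1;
        }
        memcpy(out, p, n);
        out[n] = '\0';
    }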

* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup
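
For context: a split is a run of graph nodes executed on one backend, and every input coming from another backend costs a copy, so splits are capped at a small number of inputs; when the cap would be exceeded, the scheduler now closes the current split and opens a new one instead of overflowing a fixed-size input array. A sketch with hypothetical names for the bookkeeping:

    #include <stdbool.h>

    #define SCHED_MAX_SPLIT_INPUTS 4 // assumed cap, reduced by this commit

    // sketch of one split's bookkeeping inside the scheduler
    struct sched_split {
        int n_inputs; // tensors that must be copied in from other backends
        int i_start;  // first graph node in this split
        int i_end;    // one past the last graph node
    };

    // sketch: start a new split when adding a node would exceed the cap on
    // inputs, rather than failing once the input array is full
    static bool split_needs_new(const struct sched_split * split, int n_new_inputs) {
        return split->n_inputs + n_new_inputs > SCHED_MAX_SPLIT_INPUTS;
    }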

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <redacted>
14 files changed:
examples/imatrix/imatrix.cpp
examples/llama-bench/llama-bench.cpp
ggml-alloc.c
ggml-backend-impl.h
ggml-backend.c
ggml-backend.h
ggml-cuda.cu
ggml-cuda.h
ggml-kompute.cpp
ggml-metal.m
ggml-sycl.cpp
ggml-vulkan.cpp
ggml.c
llama.cpp