backend : offload large batches to GPU (llama/6083)
author     slaren <redacted>
           Mon, 18 Mar 2024 10:03:04 +0000 (11:03 +0100)
committer  Georgi Gerganov <redacted>
           Wed, 27 Mar 2024 11:20:00 +0000 (13:20 +0200)
commit     952fb4cc11830060625f7dc23e3026030bc42f1b
tree       aa95bebb8f3893393b70d583accaf4e9ba73c90b
parent     e1998f7365a1e7588b3c1ed93c9ce9d991f370b8

* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <redacted>
* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <redacted>
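The headline change lets a GPU backend take over operations whose weights remain in host memory once the batch is large enough. Below is a minimal sketch of that decision, written in Python for brevity rather than ggml's C. The names `Backend`, `Tensor`, `gpu_offload_op`, `backend_offload_op`, `pick_backend` and the `MIN_BATCH_SIZE = 32` threshold are hypothetical simplifications, not ggml's actual structures. The optional `offload_op` hook also illustrates the "update backends" item: a backend that leaves it as `None` simply never asks to offload.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional


class Op(Enum):
    MUL_MAT = auto()
    GET_ROWS = auto()
    ADD = auto()


@dataclass
class Tensor:
    op: Op
    ne: tuple  # (ne0, ne1, ne2, ne3); ne1 is treated as the batch size here


@dataclass
class Backend:
    name: str
    # Optional hook (hypothetical): None means the backend never requests offloading.
    offload_op: Optional[Callable[["Backend", Tensor], bool]] = None


MIN_BATCH_SIZE = 32  # assumed threshold, for illustration only


def gpu_offload_op(backend: Backend, op: Tensor) -> bool:
    # A large batch amortizes copying host-resident weights to the GPU.
    # GET_ROWS reads only a few rows, so copying the whole weight would not pay off.
    return op.ne[1] >= MIN_BATCH_SIZE and op.op is not Op.GET_ROWS


def backend_offload_op(backend: Backend, op: Tensor) -> bool:
    # A missing hook is treated as "do not offload".
    return backend.offload_op is not None and backend.offload_op(backend, op)


def pick_backend(op: Tensor, weight_backend: Backend, backends: List[Backend]) -> Backend:
    # backends is ordered by priority (e.g. GPU first, CPU last).
    # A higher-priority backend may claim the op if it asks to offload it;
    # otherwise the op runs where its weight lives.
    for b in backends:
        if b is weight_backend:
            break
        if backend_offload_op(b, op):
            return b
    return weight_backend
```

With `order = [gpu, cpu]` and the weight on `cpu`, a single-token `MUL_MAT` stays on the CPU while a 512-token one is claimed by the GPU. The trade-off being modeled is the weight copy: for one token it outweighs the faster compute, for hundreds of tokens it is spread across all of them.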
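The "imatrix : remove sched affix from weight names" item reflects the fact that when the scheduler copies a weight to another backend, the copy carries a decorated name, so a tool that matches activations to weights by name has to strip the decoration first. The sketch below assumes a `<backend>#<name>#<n>` naming scheme; that format and the helper `strip_sched_affix` are assumptions for illustration, not confirmed by this commit.

```python
def strip_sched_affix(name: str) -> str:
    """Recover the original weight name from a scheduler copy name.

    Assumes (hypothetically) that copies are named '<backend>#<name>#<n>'.
    Names without '#' are returned unchanged.
    """
    first = name.find("#")
    if first == -1:
        return name  # not a scheduler copy
    rest = name[first + 1:]
    second = rest.find("#")
    # Drop the trailing '#<n>' copy index if present.
    return rest if second == -1 else rest[:second]
```

For example, `strip_sched_affix("CUDA0#blk.0.ffn_down.weight#0")` yields `"blk.0.ffn_down.weight"`, while a plain `"blk.0.attn_q.weight"` passes through untouched.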
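The "sched" item starts a new split when the current one would have too many inputs. A hypothetical greedy splitter shows the idea: walk the nodes in order and open a new split whenever the backend changes or the node would push the split past `max_inputs` external inputs. `Node`, `Split`, `split_graph` and `MAX_SPLIT_INPUTS = 4` are illustrative assumptions, and counting every source not computed inside the split as an input is a simplification (a real scheduler would count only tensors that must be copied from another backend).

```python
from dataclasses import dataclass, field
from typing import List

MAX_SPLIT_INPUTS = 4  # hypothetical cap


@dataclass
class Node:
    name: str
    backend: str
    srcs: List[str]  # names of tensors this node reads


@dataclass
class Split:
    backend: str
    nodes: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)  # tensors produced outside the split


def split_graph(nodes: List[Node], max_inputs: int = MAX_SPLIT_INPUTS) -> List[Split]:
    splits: List[Split] = []
    for node in nodes:
        cur = splits[-1] if splits else None
        if cur is not None:
            # Sources this node would add as new inputs to the current split.
            external = {s for s in node.srcs if s not in cur.nodes and s not in cur.inputs}
            if cur.backend != node.backend or len(cur.inputs) + len(external) > max_inputs:
                cur = None  # backend changed or too many inputs: start a new split
        if cur is None:
            cur = Split(node.backend)
            splits.append(cur)
        for s in node.srcs:
            if s not in cur.nodes and s not in cur.inputs:
                cur.inputs.append(s)
        cur.nodes.append(node.name)
    return splits
```

Note that cutting a split turns the previous node's output into an input of the next one, which is why the cap is checked before adding the node. A single node with more sources than the cap still gets its own split, since it cannot be divided further.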

include/ggml/ggml-backend.h
src/ggml-alloc.c
src/ggml-backend-impl.h
src/ggml-backend.c
src/ggml-cuda.cu
src/ggml-cuda.h
src/ggml-kompute.cpp
src/ggml-metal.m
src/ggml-sycl.cpp
src/ggml-vulkan.cpp
src/ggml.c