
llama : ggml-backend integration (llama/4766)

* llama : ggml-backend integration

* ggml-backend : add names to buffers

* fix unmap after loading

* batched-bench : add tensor_split param

* llama : check for null tensor_split

* ggml-backend : increase GGML_MAX_BACKENDS

* improve graph splitting, partial fix for --no-kv-offload

* cuda : add ggml-backend split buffer support

* cuda : do not create buffer types for devices that don't exist (fixes usage without CUDA devices available)

* ggml : fix null backend dereference (llama/4807)

* ggml : fix null backend dereference

* ggml : also check ggml_backend_is_cpu

* test-backend-ops : check buffer allocation failures

* llama : add cparam (split_mode) and command line argument (--split-mode, -sm) to configure the split mode (none, layer or row)

* ggml : fix mul_mat_id work size

* llama : rewrite session kv load/set without graphs

* minor

* llama : only initialize used backends, free backends on context free

* llama : abort ctx if cuda backend init fails

* llama : rewrite lora with ggml-backend and compute on CPU

ggml-ci

* llama : only map to a backend buffer the region of the file mapping containing the tensors used in the buffer

* opencl : add ggml-backend buffer type

* cuda : only use batched_cublas with batched mat muls (fixes fp16 tg perf)

* llama : on Metal, by default offload the full model

ggml-ci

* metal : page align the data ptr (llama/4854)

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>

* cuda : fix split buffer free

* address review comments

* llama-bench : add split-mode parameter

* fix whitespace

* opencl : fix double initialization

* server : add --split-mode parameter

* use async copy and compute to improve multi-gpu performance

ggml-ci

* use async memcpys to copy the graph outputs to the CPU

* fix opencl

* use a host buffer for the cpu compute buffer for faster copies to the gpu

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Johannes Gäßler <redacted>
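
Two bullets in this commit deal with handing host memory to a GPU backend: mapping only the region of the file mapping that holds a buffer's tensors, and page-aligning the data pointer on Metal. Such a sub-region generally does not start on a page boundary, while Metal can only wrap existing host memory without copying when the pointer is page-aligned. The following is a minimal sketch of the alignment arithmetic only. It assumes a power-of-two page size and that the length should also be rounded up to whole pages. page_region and page_align_region are hypothetical names, not ggml API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the page-aligned region enclosing (data, size). */
typedef struct {
    void  *base; /* data rounded down to a page boundary */
    size_t size; /* length from base, rounded up to whole pages */
    size_t offs; /* offset of the original data ptr inside base */
} page_region;

/* Hypothetical helper: round the start down and the length up to page
 * multiples. Assumes page is a power of two (e.g. from sysconf(_SC_PAGESIZE)). */
static page_region page_align_region(void *data, size_t size, size_t page) {
    assert(page > 0 && (page & (page - 1)) == 0);

    page_region r;
    r.offs = (size_t) ((uintptr_t) data % page);
    r.base = (void *) ((uintptr_t) data - r.offs);
    /* the region must still cover the original [data, data + size) */
    r.size = size + r.offs;
    if (r.size % page != 0) {
        r.size += page - r.size % page;
    }
    return r;
}
```

The caller would keep r.offs so that tensor data can still be addressed relative to the original pointer, since base + offs == data.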

author	slaren <redacted>
	Fri, 12 Jan 2024 19:07:38 +0000 (20:07 +0100)
committer	Georgi Gerganov <redacted>
	Fri, 12 Jan 2024 19:53:48 +0000 (21:53 +0200)
commit	90509f7f6be35b69d88b928f22b2c31fccb04935
tree	23049d2b1187993002de0ce4e2c8ff549885c692
parent	61268136ca5777724d511397cf97467ac952166d

include/ggml/ggml-alloc.h
include/ggml/ggml-backend.h
include/ggml/ggml.h
src/ggml-alloc.c
src/ggml-backend-impl.h
src/ggml-backend.c
src/ggml-cuda.cu
src/ggml-cuda.h
src/ggml-impl.h
src/ggml-metal.m
src/ggml-opencl.cpp
src/ggml-opencl.h
src/ggml.c
tests/test-backend-ops.cpp