git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	0cc4m <redacted>
	Wed, 7 Feb 2024 06:54:50 +0000 (07:54 +0100)
committer	GitHub <redacted>
	Wed, 7 Feb 2024 06:54:50 +0000 (07:54 +0100)
commit	ee1628bdfea8b0079fed0140ac2f00ef1b465b57
tree	42ee597afa79a6c4e0bb772d78a7cfcd54777696
parent	ed0bf32290ee5b30ffad5becd99cbecef74aedd7

Basic Vulkan Multi-GPU implementation (#5321)

* Initial Vulkan multi-GPU implementation

Move most global variables into backend context

* Add names to backend device functions

* Add further missing cleanup code

* Reduce code duplication in tensor split layer assignment

* Generalize LLAMA_SPLIT_LAYER for all backends; do not expose device count and memory in llama.h

* Print device info only at startup and initialize one backend for CPU assist

Add missing cleanup code

* Rework backend memory management to make sure devices and buffers get properly allocated and freed

* Rename CPU assist free function

---------

Co-authored-by: slaren <redacted>
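
The layer-split assignment mentioned above (LLAMA_SPLIT_LAYER, "tensor split layer assignment") can be illustrated with a minimal sketch. This is not the code from this commit: the function name `assign_layers_to_devices` and its signature are hypothetical, and the real llama.cpp logic may differ (for example in how it handles layers kept on the CPU). The sketch only assumes that each device gets a contiguous run of layers in proportion to its tensor-split weight.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of llama.cpp): map each of n_layers layers to a
// device index, in proportion to the per-device weights in tensor_split.
std::vector<int> assign_layers_to_devices(int n_layers, const std::vector<float> & tensor_split) {
    assert(!tensor_split.empty());
    const float total = std::accumulate(tensor_split.begin(), tensor_split.end(), 0.0f);
    assert(total > 0.0f);

    // Normalized cumulative boundaries: bounds[d] is the fraction of all
    // layers held by devices 0..d together.
    std::vector<float> bounds(tensor_split.size());
    float running = 0.0f;
    for (std::size_t d = 0; d < tensor_split.size(); ++d) {
        running += tensor_split[d];
        bounds[d] = running / total;
    }
    bounds.back() = 1.0f; // guard against floating-point rounding

    std::vector<int> device_of_layer(n_layers);
    for (int il = 0; il < n_layers; ++il) {
        // Relative position of this layer in [0, 1).
        const float pos = float(il) / float(n_layers);
        // The first boundary strictly greater than pos identifies the device.
        const auto it = std::upper_bound(bounds.begin(), bounds.end(), pos);
        device_of_layer[il] = int(it - bounds.begin());
    }
    return device_of_layer;
}
```

With four layers and equal weights `{1.0f, 1.0f}`, the sketch places layers 0 and 1 on device 0 and layers 2 and 3 on device 1. Keeping the computation in one helper is one way to avoid repeating it per backend, which is the kind of duplication the commit message says it reduces.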

common/common.cpp
ggml-vulkan.cpp
ggml-vulkan.h
ggml.c
llama.cpp