cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449)
* AMD ROCm: handle Unified Memory Architecture (UMA) VRAM expansion
This resolves #2797 by allowing ROCm users on AMD GPUs with a UMA setup to
dynamically expand the VRAM allocated to the GPU.
Without this, AMD ROCm users with shared CPU/GPU memory are usually stuck
with the BIOS-set (or otherwise fixed) framebuffer VRAM, making it
impossible to load more than 1-2 layers onto the GPU.
Note that the model is duplicated in RAM because it's loaded once for
the CPU and then copied into a second set of allocations that are
managed by the HIP UMA system. We can fix this later.
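
As a minimal sketch of the idea (helper name and error handling are illustrative, not the actual patch): on a UMA system the weights can live in HIP managed memory, which the driver can back with ordinary system RAM instead of the fixed framebuffer carve-out.

```cpp
// Illustrative only: back a "device" buffer with HIP managed (unified)
// memory so a UMA APU can grow it out of system RAM instead of being
// capped at the BIOS framebuffer size.
#include <hip/hip_runtime.h>
#include <cstdio>

static void * alloc_uma_buffer(size_t size) {  // hypothetical helper name
    void * ptr = nullptr;
    hipError_t err = hipMallocManaged(&ptr, size, hipMemAttachGlobal);
    if (err != hipSuccess) {
        fprintf(stderr, "hipMallocManaged failed: %s\n", hipGetErrorString(err));
        return nullptr;
    }
    return ptr;  // visible to both CPU and GPU; release with hipFree(ptr)
}
```

The duplication mentioned above comes from copying the CPU-side tensors into buffers like this one rather than mapping them in place.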
* clarify the build process for ROCm on Linux with CMake
* avoid using deprecated ROCm hipMallocHost
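
For reference, hipHostMalloc is the non-deprecated replacement for hipMallocHost; a minimal pinned-allocation sketch (helper name is illustrative):

```cpp
#include <hip/hip_runtime.h>

// hipMallocHost is deprecated in ROCm; hipHostMalloc is the current API
// for pinned (page-locked) host allocations.
static void * pinned_alloc(size_t size) {  // hypothetical helper name
    void * host_ptr = nullptr;
    // old, deprecated:  hipMallocHost(&host_ptr, size);
    if (hipHostMalloc(&host_ptr, size, hipHostMallocDefault) != hipSuccess) {
        return nullptr;
    }
    return host_ptr;  // release with hipHostFree(host_ptr)
}
```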
* keep simplifying the change required for UMA
* cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON
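
Roughly how the option is intended to work (the compile definition name below is an assumption, not necessarily what the patch uses): passing -DLLAMA_HIP_UMA=ON to CMake adds a define, and the HIP compatibility layer selects the managed allocator when that define is present.

```cpp
// Sketch of the conditional allocation path; GGML_HIP_UMA is assumed to be
// the define produced by -DLLAMA_HIP_UMA=ON.
#ifdef GGML_HIP_UMA
    // UMA build: "device" allocations come from managed memory, so the
    // driver can back them with system RAM on an APU.
    #define cudaMalloc hipMallocManaged
#else
    // Default build: plain device allocations from the VRAM carve-out.
    #define cudaMalloc hipMalloc
#endif
```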