musa : update doc (#9856)

author R0CKSTAR <redacted>
Sat, 12 Oct 2024 05:09:53 +0000 (13:09 +0800)
committer GitHub <redacted>
Sat, 12 Oct 2024 05:09:53 +0000 (08:09 +0300)
diff --git a/README.md b/README.md
index 41e5e5448dbf5ad959551db165d88d6d2287c960..dd4927b04088797ebf034249b75697467ef8485a 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ variety of hardware - locally and in the cloud.
  - Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
  - AVX, AVX2 and AVX512 support for x86 architectures
  - 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
-- Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP)
+- Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP and Moore Threads MTT GPUs via MUSA)
  - Vulkan and SYCL backend support
  - CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity
  
@@ -413,7 +413,7 @@ Please refer to [Build llama.cpp locally](./docs/build.md)
  | [BLAS](./docs/build.md#blas-build) | All |
  | [BLIS](./docs/backend/BLIS.md) | All |
  | [SYCL](./docs/backend/SYCL.md) | Intel and Nvidia GPU |
-| [MUSA](./docs/build.md#musa) | Moore Threads GPU |
+| [MUSA](./docs/build.md#musa) | Moore Threads MTT GPU |
  | [CUDA](./docs/build.md#cuda) | Nvidia GPU |
  | [hipBLAS](./docs/build.md#hipblas) | AMD GPU |
  | [Vulkan](./docs/build.md#vulkan) | GPU |
diff --git a/docs/build.md b/docs/build.md
index faa0ecfa49768a3bf2116d2f87c8078cc8819c15..4e362ebc78fa33dd35acfa88eb8e4bb05c796eb3 100644
--- a/docs/build.md
+++ b/docs/build.md
@@ -198,6 +198,8 @@ The following compilation options are also available to tweak performance:
  
  ### MUSA
  
+This provides GPU acceleration using the MUSA cores of your Moore Threads MTT GPU. Make sure to have the MUSA SDK installed. You can download it from here: [MUSA SDK](https://developer.mthreads.com/sdk/download/musa).
+
  - Using `make`:
    ```bash
    make GGML_MUSA=1
@@ -209,6 +211,12 @@ The following compilation options are also available to tweak performance:
    cmake --build build --config Release
    ```
  
+The environment variable [`MUSA_VISIBLE_DEVICES`](https://docs.mthreads.com/musa-sdk/musa-sdk-doc-online/programming_guide/Z%E9%99%84%E5%BD%95/) can be used to specify which GPU(s) will be used.
+
+The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory on Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted.
+
+Most of the compilation options available for CUDA should also be available for MUSA, though they haven't been thoroughly tested yet.
+
  ### hipBLAS
  
  This provides BLAS acceleration on HIP-supported AMD GPUs.
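The two environment variables documented in this change (`MUSA_VISIBLE_DEVICES` and `GGML_CUDA_ENABLE_UNIFIED_MEMORY`) can be set for a single invocation rather than exported for the whole session. The following is a minimal sketch, assuming a Bash-compatible shell on Linux; `run_on_musa` is a hypothetical helper name, and the `llama-cli` path and model file in the usage line are placeholders, not taken from this commit.

```bash
# Hypothetical helper (not part of llama.cpp): run a command with the
# MUSA-related environment variables described in docs/build.md.
#   $1    comma-separated MUSA device indices, e.g. "0" or "0,1"
#   $2..  the command to run (e.g. a llama.cpp binary and its arguments)
run_on_musa() {
  local devices="$1"
  shift
  # Prefix assignments apply only to the invoked command, not the caller's shell.
  # GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 lets allocations spill into system RAM
  # (Linux) instead of failing when VRAM is exhausted.
  MUSA_VISIBLE_DEVICES="$devices" GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 "$@"
}

# Example (binary and model paths are placeholders):
# run_on_musa 0 ./build/bin/llama-cli -m /path/to/model.gguf -p "Hello"
```

Scoping the variables per command keeps different runs on the same machine from silently inheriting each other's device selection.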