ggml-cuda: Adding support for unified memory (#8035)
author matteo <redacted>
Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)
committer GitHub <redacted>
Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)
* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code to the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
docs/build.md
ggml/src/ggml-cuda.cu

index cfe42ebbf3197e8004249e66cb9a02a9ea239e72..8b16d1a35851835d5f582fb68f63c9c491bb382e 100644 (file)
@@ -178,7 +178,11 @@ For Jetson user, if you have Jetson Orin, you can try this: [Offical Support](ht
   cmake --build build --config Release
   ```
 
-The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used. The following compilation options are also available to tweak performance:
+The environment variable [`CUDA_VISIBLE_DEVICES`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars) can be used to specify which GPU(s) will be used.
+
+The environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` can be used to enable unified memory on Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted. On Windows, this setting is available in the NVIDIA Control Panel as `System Memory Fallback`.
+
+The following compilation options are also available to tweak performance:
 
 | Option                        | Legal values           | Default | Description                                                                                                                                                                                                                                                                             |
 |-------------------------------|------------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
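For context, the `GGML_CUDA_ENABLE_UNIFIED_MEMORY` behavior documented above comes down to choosing `cudaMallocManaged` over `cudaMalloc`, so the CUDA driver can page allocations out to system RAM instead of failing once VRAM is exhausted. Below is a minimal standalone sketch, not part of this commit, that mirrors that selection; the oversubscription factor is purely illustrative and assumes a Linux host with a GPU that supports managed memory oversubscription:

```cpp
// Sketch: oversubscribe VRAM and compare cudaMalloc vs cudaMallocManaged.
// With GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 set, the managed path is taken and
// the driver may back the allocation with system RAM; the plain cudaMalloc
// path would instead fail with cudaErrorMemoryAllocation.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    size_t free_mem = 0, total_mem = 0;
    cudaMemGetInfo(&free_mem, &total_mem);

    // Ask for 1.5x the total VRAM to force oversubscription (illustrative).
    size_t size = total_mem + total_mem / 2;

    void * ptr = nullptr;
    cudaError_t err = getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr
        ? cudaMallocManaged(&ptr, size)
        : cudaMalloc(&ptr, size);

    printf("allocating %zu bytes: %s\n", size, cudaGetErrorString(err));
    if (err == cudaSuccess) {
        cudaFree(ptr);
    }
    return 0;
}
```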
index b510777fb78f6fe7d582327fcbeeb047fe8cc04d..68605fff6dbb82663743f390dff22c97df35ba6b 100644 (file)
@@ -130,7 +130,22 @@ static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size, int device)
     }
     return res;
 #else
+
+#if !defined(GGML_USE_HIPBLAS) && !defined(GGML_USE_MUSA)
+    cudaError_t err;
+    if (getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr)
+    {
+        err = cudaMallocManaged(ptr, size);
+    }
+    else
+    {
+        err = cudaMalloc(ptr, size);
+    }
+    return err;
+#else
     return cudaMalloc(ptr, size);
+#endif // !defined(GGML_USE_HIPBLAS) && !defined(GGML_USE_MUSA)
+
 #endif
 }
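Note that the new branch is compiled out for HIP and MUSA builds, so the environment variable only takes effect on CUDA proper. As one hedged way to verify at runtime which allocator was actually used (again a standalone sketch, not part of the commit), `cudaPointerGetAttributes` reports `cudaMemoryTypeManaged` for pointers returned by `cudaMallocManaged`:

```cpp
// Sketch: confirm whether an allocation is managed (unified) memory.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    void * ptr = nullptr;
    if (cudaMallocManaged(&ptr, 1 << 20) != cudaSuccess) {
        return 1;
    }

    cudaPointerAttributes attr = {};
    cudaPointerGetAttributes(&attr, ptr);
    printf("pointer is %s\n",
           attr.type == cudaMemoryTypeManaged ? "managed (unified memory)"
                                              : "not managed");

    cudaFree(ptr);
    return 0;
}
```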