author MoonShadow <redacted>
Sun, 15 Mar 2026 16:23:58 +0000 (00:23 +0800)
committer Georgi Gerganov <redacted>
Sun, 15 Mar 2026 19:50:13 +0000 (21:50 +0200)
commit 9e6053bc9f8c243eff6fcafe1c707a4d869f79e3
tree 2a1bcf43bc592475927eb2a3fc478c765f9cf3fe
parent f3d6372f47119b560ea288fba0ce1b8db46ad8c6
ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain (llama/20536)

* ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain

On AMD APU/iGPU devices (unified memory architecture), hipMemAdviseSetCoarseGrain
returns hipErrorInvalidValue because the hint is not applicable to UMA systems.
The previous CUDA_CHECK() call treated this as a fatal error, causing crashes on
APU systems such as AMD Strix Halo (gfx1151).

Fix: treat hipMemAdviseSetCoarseGrain as an optional performance hint - call it
without error checking and clear any resulting error with hipGetLastError().
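
A minimal sketch of the soft-error pattern described above, not the actual
patch. It stubs out the HIP runtime so it compiles without ROCm: every name
prefixed with fake_ or g_, and the helper advise_coarse_grain_best_effort, is
hypothetical. In real code the two stubs would correspond to hipMemAdvise()
with hipMemAdviseSetCoarseGrain and to hipGetLastError(), which returns the
stored error and resets it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the HIP runtime (assumption: not real HIP names).
enum fake_error { FAKE_SUCCESS = 0, FAKE_ERROR_INVALID_VALUE = 1 };

static fake_error g_last_error = FAKE_SUCCESS; // models the runtime's stored "last error"
static bool       g_is_uma     = true;         // pretend we run on an APU/iGPU

// Models the coarse-grain advise call: rejected on UMA devices, accepted otherwise.
static fake_error fake_mem_advise_coarse_grain(const void * ptr, size_t size) {
    assert(ptr != nullptr && size > 0);
    fake_error err = g_is_uma ? FAKE_ERROR_INVALID_VALUE : FAKE_SUCCESS;
    g_last_error = err;
    return err;
}

// Models hipGetLastError(): returns the stored error and resets it to success.
static fake_error fake_get_last_error() {
    fake_error err = g_last_error;
    g_last_error = FAKE_SUCCESS;
    return err;
}

// Soft handling: the advise call is a performance hint, so a failure is not fatal.
// Returns true if the hint was applied, false if the runtime rejected it.
static bool advise_coarse_grain_best_effort(const void * ptr, size_t size) {
    fake_error err = fake_mem_advise_coarse_grain(ptr, size);
    (void) fake_get_last_error(); // clear the stored error so later checks do not see it
    return err == FAKE_SUCCESS;
}
```

The point of clearing the stored error is that a later checked call would
otherwise pick up the stale failure from the hint and abort anyway.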

Also add pre-allocation debug logging (GGML_LOG_DEBUG) to help diagnose memory
issues on APU systems, and store totalGlobalMem in device info.

Context: AMD APUs on Windows are affected by a ROCm runtime bug that limits
hipMallocManaged to ~64GB regardless of available system RAM. A fix has been
submitted upstream: https://github.com/ROCm/rocm-systems/pull/4077

Co-Authored-By: Claude Sonnet 4.6 <redacted>

* ggml/hip: remove unrelated changes, keep only hipMemAdviseSetCoarseGrain fix

---------

Co-authored-by: moonshadow-25 <redacted>
Co-authored-by: Claude Sonnet 4.6 <redacted>
src/ggml-cuda/ggml-cuda.cu