Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (llama/19461)
author    Mario Limonciello <redacted>
          Thu, 12 Feb 2026 08:38:35 +0000 (02:38 -0600)
committer Georgi Gerganov <redacted>
          Sat, 14 Feb 2026 22:20:18 +0000 (00:20 +0200)
commit    64a9c1bb20b78beedf88264954f0a42fd3ce2b38
tree      f5c68a2b8f1abc1c09b8cd24bc85313abb8b4d53
parent    ab9a5484bf52fc8648501354120dc3bd34e7587c

There is an upstream problem [1] where AMD's LLVM 22 fork combined with
rocWMMA 2.2.0 causes compilation failures on devices without native
fp16 support (CDNA devices).

The specialized types aren't resolved properly:
```
/opt/rocm/include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
```

Add a workaround that explicitly declares the types and casts when
compiling with HIP and ROCWMMA_FATTN [2]. Once this is actually fixed
upstream, version guards can be added so that the workaround only
applies to rocWMMA versions that lack the fix.

Link: https://github.com/ROCm/rocm-libraries/issues/4398
Link: https://github.com/ggml-org/llama.cpp/issues/19269
Signed-off-by: Mario Limonciello <redacted>
src/ggml-cuda/fattn-wmma-f16.cu