git.djapps.eu Git - pkg/ggml/sources/ggml/commit
HIP: add fattn-mma-f16 for RDNA4 (llama/18481)
author yulo <redacted>
Tue, 13 Jan 2026 12:52:16 +0000 (20:52 +0800)
committer Georgi Gerganov <redacted>
Fri, 30 Jan 2026 11:49:29 +0000 (13:49 +0200)
commit 7e52a14dafebcaec994fe2e7a545fe16ad2e2363
tree 222827de32b8806ffd8da4aa318af299199e430e
parent b6d1f0f247adcfa25c0ca1ffe97e651fe1afd5e2
* finish VQ mma

* flash_attn_ext_f16_iter

* KQ_rowsum

* correct exp

* fix scale error

* fix softmax scale

* fix softmax scale

* enable fattn on cpu side

* fix random error

* disable fattn-mma-f16 on rdna3

* fix wrong col for rdna

* use identity mat to transpose

* resolve conflicts

* basic tuning for DeepSeek-R1-Distill-Qwen-1.5B

* fix volta compile error

* align rdna4 policy for fattn

* adjust fattn policy

* adjust kernel selection logic

* update as the review comments

* keep fattn-wmma logic

* adjust kernel selection logic

---------

Co-authored-by: zhang hui <redacted>
Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/common.cuh
src/ggml-cuda/fattn-common.cuh
src/ggml-cuda/fattn-mma-f16.cuh
src/ggml-cuda/fattn.cu
src/ggml-cuda/mma.cuh
src/ggml-cuda/vendors/hip.h