git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CANN: refactor mask handling and improve performance in FA (#15561)
author Chenguang Li <redacted>
Wed, 27 Aug 2025 09:21:41 +0000 (17:21 +0800)
committer GitHub <redacted>
Wed, 27 Aug 2025 09:21:41 +0000 (17:21 +0800)
commit 1e7489745a74996fc36e8fd05b73aa16bc184e0c
tree 4d0eb53eca27d324b4ac6d0a3f94e32eaeb15e50
parent 1cf123a343ab7ca5586aacb9e0a1d2de7fe33be4

* CANN(flash-attn): refactor mask handling and improve performance

1. Refactored the mask computation in Flash Attention and unified the logic so that prefill and decode no longer take separate paths.
2. Improved performance in non-ALiBi scenarios by eliminating one repeat operation.
3. Updated operator management to explicitly mark unsupported cases: 310P devices, and any dim that is not divisible by 16.

Signed-off-by: noemotiovon <redacted>
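
Item 3 above can be pictured as a small predicate. The following is only a minimal sketch of the two stated conditions, not the backend's actual code: the function name `fa_is_supported`, the `is_310p` flag, and the reading of "dim" as the attention head dimension are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not taken from ggml-cann): reports whether the
// flash-attention path would be treated as supported under the two
// conditions listed in the commit message.
//   is_310p  - assumed flag: the target device is a 310P
//   head_dim - assumed meaning of "dim" in the commit message
constexpr bool fa_is_supported(bool is_310p, std::int64_t head_dim) {
    // Unsupported on 310P devices, and unsupported when dim % 16 != 0.
    return !is_310p && (head_dim % 16 == 0);
}
```

Under these assumptions, `fa_is_supported(false, 128)` is true, while `fa_is_supported(true, 128)` and `fa_is_supported(false, 100)` are both false.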
* [CANN]: fix review

Signed-off-by: noemotiovon <redacted>
* [CANN]: Optimize FA: BNSD to BSND

Signed-off-by: noemotiovon <redacted>
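
For readers unfamiliar with the layout names: BNSD and BSND conventionally denote the dimension order of an attention tensor (B = batch, N = heads, S = sequence length, D = head dimension). The sketch below shows how the flat offset of one element differs between the two orders for a contiguous buffer. The struct and function names are hypothetical, and the commit message does not say which tensors were switched or why the change helps.

```cpp
#include <cstddef>

// Illustrative shape descriptor (hypothetical, not a ggml or CANN type).
struct AttnShape {
    std::size_t B;  // batch size
    std::size_t N;  // number of heads
    std::size_t S;  // sequence length
    std::size_t D;  // head dimension
};

// BNSD: for a fixed head, all sequence positions are contiguous.
constexpr std::size_t offset_bnsd(AttnShape sh, std::size_t b, std::size_t n,
                                  std::size_t s, std::size_t d) {
    return ((b * sh.N + n) * sh.S + s) * sh.D + d;
}

// BSND: for a fixed sequence position, all heads are contiguous.
constexpr std::size_t offset_bsnd(AttnShape sh, std::size_t b, std::size_t n,
                                  std::size_t s, std::size_t d) {
    return ((b * sh.S + s) * sh.N + n) * sh.D + d;
}
```

With a shape of B=1, N=2, S=3, D=4, moving from head 0 to head 1 at position 0 advances the offset by S*D = 12 in BNSD but only by D = 4 in BSND.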
---------

Signed-off-by: noemotiovon <redacted>
ggml/src/ggml-cann/aclnn_ops.cpp
ggml/src/ggml-cann/ggml-cann.cpp