CANN: refactor mask handling and improve performance in FA (llama/15561)
author     Chenguang Li <redacted>
           Wed, 27 Aug 2025 09:21:41 +0000 (17:21 +0800)
committer  Georgi Gerganov <redacted>
           Fri, 5 Sep 2025 09:54:06 +0000 (12:54 +0300)
commit     ce3d1793fcfde6e8cda0211c05607e4632938069
tree       be08bc745093ffb1b6fc6e3eab097464e92c0d94
parent     d3cff36a5caae6503a7aca3d97599782d17508b9

* CANN(flash-attn): refactor mask handling and improve performance

1. Refactored the mask computation in Flash Attention into a single unified code path, instead of separate paths for prefill and decode.
2. Improved performance in the non-ALiBi case by eliminating one repeat operation.
3. Updated the operator support checks to explicitly mark FA as unsupported on 310P devices and when dim is not divisible by 16.
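The three items above can be illustrated with a small CPU-side sketch. It is not the CANN code: `alibi_slope`, `build_head_masks`, and `fa_supported_sketch` are hypothetical names, and the mask semantics (0 for visible positions, -INFINITY for masked ones, scaled by a per-head ALiBi slope when `max_bias > 0`) are assumed from ggml's generic attention behavior. One plausible reading of item 2 is that without ALiBi every head shares the same mask, so it can be broadcast instead of repeated per head.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: ALiBi slope for head h, using the same formula as
// ggml's reference soft_max code. Returns 1.0f when ALiBi is disabled.
float alibi_slope(uint32_t h, uint32_t n_head, float max_bias) {
    if (max_bias <= 0.0f) {
        return 1.0f;
    }
    assert(n_head > 0);
    const uint32_t n_head_log2 = 1u << (uint32_t) std::floor(std::log2((float) n_head));
    const float m0 = std::pow(2.0f, -max_bias / n_head_log2);
    const float m1 = std::pow(2.0f, -(max_bias / 2.0f) / n_head_log2);
    return h < n_head_log2 ? std::pow(m0, (float) (h + 1))
                           : std::pow(m1, (float) (2 * (h - n_head_log2) + 1));
}

// Hypothetical helper: produce the mask(s) an attention kernel would consume.
// `mask` is row-major [n_q][n_kv]; decode (n_q == 1) and prefill (n_q > 1)
// go through the same path, so there is no prefill/decode branch.
// Without ALiBi a single mask is returned, to be broadcast over all heads
// (no per-head repeat); with ALiBi each head gets its own slope-scaled copy.
std::vector<std::vector<float>> build_head_masks(const std::vector<float>& mask,
                                                 size_t n_q, size_t n_kv,
                                                 uint32_t n_head, float max_bias) {
    assert(mask.size() == n_q * n_kv);
    if (max_bias <= 0.0f) {
        return {mask};
    }
    std::vector<std::vector<float>> per_head;
    per_head.reserve(n_head);
    for (uint32_t h = 0; h < n_head; ++h) {
        const float slope = alibi_slope(h, n_head, max_bias);
        std::vector<float> m(mask.size());
        for (size_t i = 0; i < mask.size(); ++i) {
            m[i] = slope * mask[i];  // slope > 0, so -INFINITY stays -INFINITY
        }
        per_head.push_back(std::move(m));
    }
    return per_head;
}

// Hypothetical support predicate mirroring item 3: reject 310P devices and
// any `dim` that is not a multiple of 16.
bool fa_supported_sketch(bool is_310p, int64_t dim) {
    if (is_310p) {
        return false;
    }
    return dim % 16 == 0;
}
```

In this sketch the non-ALiBi branch returns one mask and leaves broadcasting to the consumer, which is where the saved repeat would come from under the stated assumption.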

Signed-off-by: noemotiovon <redacted>

* [CANN]: address review feedback

Signed-off-by: noemotiovon <redacted>

* [CANN]: optimize FA by switching the layout from BNSD to BSND

Signed-off-by: noemotiovon <redacted>
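In the layout names, B is batch, N the number of heads, S the sequence length, and D the head dimension, so BNSD and BSND differ only in the order of the N and S axes. In ggml terms, a contiguous tensor with `ne = {D, S, N, B}` is BNSD, because `ne[0]` is the innermost dimension. The hypothetical `bnsd_to_bsnd` below is only a sketch of what that reorder means in index terms for a contiguous row-major buffer; it does not reproduce how the CANN backend performs or avoids the transpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a contiguous BNSD buffer into BSND order.
// B = batch, N = number of heads, S = sequence length, D = head dimension.
std::vector<float> bnsd_to_bsnd(const std::vector<float>& in,
                                size_t B, size_t N, size_t S, size_t D) {
    assert(in.size() == B * N * S * D);
    std::vector<float> out(in.size());
    for (size_t b = 0; b < B; ++b) {
        for (size_t n = 0; n < N; ++n) {
            for (size_t s = 0; s < S; ++s) {
                for (size_t d = 0; d < D; ++d) {
                    // Only the N and S axes swap places; D stays innermost.
                    out[((b * S + s) * N + n) * D + d] = in[((b * N + n) * S + s) * D + d];
                }
            }
        }
    }
    return out;
}
```

For example, with B = 1, N = 2, S = 3, D = 1, the input `{0, 1, 2, 3, 4, 5}` (head-major) becomes `{0, 3, 1, 4, 2, 5}` (sequence-major).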
---------

Signed-off-by: noemotiovon <redacted>
src/ggml-cann/aclnn_ops.cpp
src/ggml-cann/ggml-cann.cpp