git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CANN: refactor mask handling and improve performance in FA (llama/15561)
author Chenguang Li <redacted>
Wed, 27 Aug 2025 09:21:41 +0000 (17:21 +0800)
committer Georgi Gerganov <redacted>
Sat, 20 Sep 2025 10:42:43 +0000 (13:42 +0300)
commit 02e8b23137248a560f66592c6bbc00c61519fc10
tree 3074a5aed36b270dd08a8c2cc4eec605d666de5f
parent ece1bdfe7e3c780ead33a431f7009b53e8f0c5a1

* CANN(flash-attn): refactor mask handling and improve performance

1. Refactored the mask computation in Flash Attention into a single code path, so prefill and decode no longer need separate logic.
2. Improved performance in non-ALiBi scenarios by removing one repeat operation.
3. Updated operator management to explicitly mark Flash Attention as unsupported on 310P devices and when dim is not divisible by 16.
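
The support rule in item 3 can be sketched as below. This is a minimal illustration and not the code from this commit: `SocType`, its enumerators, and `flash_attn_supported` are hypothetical names, and `dim` stands for whichever dimension the commit message refers to, which it does not specify.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical device tags for illustration only; the real backend
// identifies the SoC through its own types.
enum class SocType { ASCEND_310P, OTHER };

// Hypothetical helper: reports whether Flash Attention may be offloaded.
// Assumption: the two conditions named in the commit message are the only
// ones checked here; the actual backend may check more.
bool flash_attn_supported(SocType soc, int64_t dim) {
    if (soc == SocType::ASCEND_310P) {
        return false;  // explicitly unsupported on 310P devices
    }
    if (dim % 16 != 0) {
        return false;  // dim must be divisible by 16
    }
    return true;
}
```

Under these assumptions, `flash_attn_supported(SocType::OTHER, 128)` is true, while a 310P device or a dim of 72 is rejected, so the caller can fall back to another backend.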

Signed-off-by: noemotiovon <redacted>
* [CANN]: fix review

Signed-off-by: noemotiovon <redacted>
* [CANN]: Optimize FA by switching from BNSD to BSND layout

Signed-off-by: noemotiovon <redacted>
---------

Signed-off-by: noemotiovon <redacted>
ggml/src/ggml-cann/aclnn_ops.cpp
ggml/src/ggml-cann/ggml-cann.cpp