git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
hexagon: further optimizations and refactoring for flash attention (llama/19583)
author Max Krasnyansky <redacted>
Sat, 14 Feb 2026 00:27:30 +0000 (16:27 -0800)
committer Georgi Gerganov <redacted>
Sun, 15 Feb 2026 19:44:37 +0000 (21:44 +0200)
commit e6476d4c12f8e921bea9be6e0f65f4e07cbe08e3
tree 74c85376aa3132390e94781d0a4f6700c037e01c
parent ec57bf407cb1b02998bde2b395f27eb96b0e9bc8
hexagon: further optimizations and refactoring for flash attention (llama/19583)

* ggml-hexagon: fa improvements

ggml-hexagon: optimize flash attention calculations with improved variable handling

ggml-hexagon: streamline flash attention operations by removing redundant checks for FP32

ggml-hexagon: optimize hvx_dot_f16_f16_aa_rx2 by simplifying variable handling for unused elements

ggml-hexagon: optimize flash attention by changing slope vector type to F16

* hexfa: fixed test-backend-ops failures due to leftover element handling
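For context, the "leftover element" bug class above arises when a fixed-width SIMD dot product (such as hvx_dot_f16_f16_aa_rx2) processes a vector whose length is not a multiple of the lane count: lanes past the end of the data must not contribute to the sum. A minimal scalar sketch of the required semantics, with an illustrative `VEC_W` standing in for the 64 fp16 lanes of a 128-byte HVX vector (names and structure here are assumptions, not the actual kernel):

```c
#include <stddef.h>

/* VEC_W stands in for the HVX lane count; the inner loop stands in
 * for the vector multiply-accumulate intrinsics. */
#define VEC_W 64

static float dot_with_leftovers(const float *a, const float *b, int n) {
    float acc = 0.0f;
    int i = 0;

    /* full vectors: n / VEC_W complete iterations */
    for (; i + VEC_W <= n; i += VEC_W)
        for (int k = 0; k < VEC_W; k++)
            acc += a[i + k] * b[i + k];

    /* leftover tail: a vector kernel would load one more vector and
     * zero (or mask off) the lanes >= n so stale memory past the end
     * never enters the accumulator; a scalar loop shows the intended
     * result. */
    for (; i < n; i++)
        acc += a[i] * b[i];

    return acc;
}
```

The scalar tail loop is only a specification of the correct behavior; the real HVX code handles the tail with masked/zeroed lanes rather than a scalar fallback.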

* hexagon: refactor and optimize fa to use local context struct

* ggml-hexagon: optimize flash-attention using hvx_vec_expf

Use HVX for online softmax.

---------

Co-authored-by: chraac <redacted>
ggml/src/ggml-hexagon/htp/flash-attn-ops.c