hexagon: further optimizations and refactoring for flash attention (llama/19583)
author Max Krasnyansky <redacted>
Sat, 14 Feb 2026 00:27:30 +0000 (16:27 -0800)
committer Georgi Gerganov <redacted>
Sat, 14 Feb 2026 22:20:18 +0000 (00:20 +0200)
commit 548ec702d19f292d4141330f14f8f3cba4a39d31
tree fdd59a3d7e1c43242bcb9719d5511ac4e2d204dc
parent 66113986664c914ca436172b8a51a0f50a14a1fb

* ggml-hexagon: fa improvements

ggml-hexagon: optimize flash attention calculations with improved variable handling

ggml-hexagon: streamline flash attention operations by removing redundant checks for FP32

ggml-hexagon: optimize hvx_dot_f16_f16_aa_rx2 by simplifying variable handling for unused elements

ggml-hexagon: optimize flash attention by changing slope vector type to F16

* hexfa: fixed test-backend-ops failures due to leftover element handling

* hexagon: refactor and optimize fa to use local context struct

* ggml-hexagon: optimize flash-attention using hvx_vec_expf

Use HVX for online softmax.

---------

Co-authored-by: chraac <redacted>
src/ggml-hexagon/htp/flash-attn-ops.c