git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
author     nullname <redacted>
           Sat, 24 Jan 2026 06:02:07 +0000 (14:02 +0800)
committer  GitHub <redacted>
           Sat, 24 Jan 2026 06:02:07 +0000 (22:02 -0800)
commit     8af1f5f430baaab1719db8f0a259bcc2a1cfdaa0
tree       7e2703bf7c44f23eb6bd3a7844a1252a149dd74a
parent     557515be1e93ed8939dd8a7c7d08765fdbe8be31
ggml-hexagon: flash-attn opt (#19025)

* Optimize the flash-attention kernel by improving score computation and the online softmax update

* wip

* Refactor the online softmax update in the flash-attention kernel for improved performance

* Optimize the flash-attention kernel by replacing a float array with an HVX_Vector for score computation

* wip
ggml/src/ggml-hexagon/htp/flash-attn-ops.c