git.djapps.eu Git - pkg/ggml/sources/ggml/commit
ggml-hexagon: flash-attention and reduce-sum optimizations (llama/19141)
author nullname <redacted>
Sat, 31 Jan 2026 05:14:20 +0000 (13:14 +0800)
committer Georgi Gerganov <redacted>
Sat, 7 Feb 2026 08:37:38 +0000 (10:37 +0200)
commit 7b5288ac073d01b6d0b81c47246147dc80265e8f
tree 47bf86d735fe061628014876fce4c5d743c0af7c
parent c5b01b89eda83894f8cfa110750f66aed9806493
* wip

* ggml-hexagon: add vectorized dot product function for FP32 and FP16 accumulation

* ggml-hexagon: optimize dot product functions for FP16 and FP32 with new vectorized implementations

* wip

* ggml-hexagon: optimize hvx_vec_dump_f32_n and hvx_vec_reduce_sum_qf32x2 functions for improved performance

* ggml-hexagon: refactor dot product functions to use a common loading function for improved readability

* optimize vector dot product functions to use unified reduction for improved performance

* hexagon: optimize reduce-sum for v75+

* hexagon: always keep row_sums in sf/fp32

* ggml-hexagon: enhance directory checks for HEXAGON_SDK_ROOT and HEXAGON_TOOLS_ROOT

* fix compiling error after rebase

---------

Co-authored-by: Max Krasnyansky <redacted>
src/ggml-hexagon/CMakeLists.txt
src/ggml-hexagon/htp/flash-attn-ops.c
src/ggml-hexagon/htp/hvx-dump.h
src/ggml-hexagon/htp/hvx-reduce.h
src/ggml-hexagon/htp/matmul-ops.c
src/ggml-hexagon/htp/softmax-ops.c
src/ggml-hexagon/htp/unary-ops.c
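
The reduce-sum and dot-product items in the commit message can be illustrated with a minimal, portable sketch. This is not the code from this commit: it is a scalar C model of the general pattern (per-lane FP32 partial sums followed by a log-step reduction), and `LANES`, `reduce_sum_lanes`, and `dot_f32` are hypothetical names chosen for illustration only. The actual HVX implementation uses vector intrinsics that are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lane count: 32 floats, i.e. one 128-byte vector of FP32. */
#define LANES 32

/* Scalar model of a log-step reduce-sum: each pass folds the upper half
 * of the active lanes onto the lower half, so LANES values are summed in
 * log2(LANES) passes instead of LANES - 1 dependent additions. */
static float reduce_sum_lanes(const float *v) {
    assert(v != NULL);
    float tmp[LANES];
    for (size_t i = 0; i < LANES; i++) {
        tmp[i] = v[i];
    }
    for (size_t stride = LANES / 2; stride > 0; stride /= 2) {
        for (size_t i = 0; i < stride; i++) {
            tmp[i] += tmp[i + stride];
        }
    }
    return tmp[0];
}

/* Dot product that keeps LANES independent FP32 partial sums and reduces
 * them once at the end; leftover elements are handled by a scalar tail. */
static float dot_f32(const float *x, const float *y, size_t n) {
    assert(x != NULL && y != NULL);
    float acc[LANES] = {0.0f};
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            acc[l] += x[i + l] * y[i + l];
        }
    }
    float sum = reduce_sum_lanes(acc);
    for (; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Calling `dot_f32(x, y, n)` on two arrays of length `n` returns their dot product. Deferring the cross-lane reduction to a single call after the main loop is the design point this sketch is meant to show: the reduction cost is paid once per row rather than once per vector.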