git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit

hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and MatMul updates (llama/20118)

* ggml-hexagon: enhance hvx_dot_f16_f16_aa_rx4 for improved performance by expanding vector handling and optimizing accumulation

* ggml-hexagon: optimize hvx_dot_f16_f16_aa_rx4 and enhance hvx_vec_reduce_sum_f32x4 for improved performance and reduced complexity

* ggml-hexagon: add hvx_dot_f16_f16_aa_rx32 for enhanced vector processing in flash attention

* optimize hvx_dot_f16_f16_aa_rx4 and hvx_dot_f16_f16_aa_rx32 by removing unused scale parameter and improving vector accumulation

* ggml-hexagon: refactor hvx_dot_f16_f16_aa_rx4 for improved readability and return HVX_Vector for better integration

* ggml-hexagon: initialize sums variable in hvx_dot_f16_f16_aa_rx32 for clarity

* ggml-hexagon: fix compiling error

* fix hvx_dot_f16_f16_aa_rx4 to handle leftover elements correctly using masking

* refactor hvx_dot_f16_f16_aa_rx4 to accept vector and leftover element counts as parameters for improved clarity and flexibility

* wip

* fa: instrumentation and dma reordering

* hex-fa: use block-size 64 to improve DMA pipelining

* hex-fa: optimize vec-dot for v79 and above

* hex-fa: use block size 64

* hex-fa: avoid scalar fp32->fp16 conversions

* hex-fa: simplify dot_f16 functions using optimized vec_mpyacc

* hex-fa: rewrite mad_f32_f16 using hvx_vec_mpyacc

* hex-mm: use mpyacc in matmul dot functions

---------

Co-authored-by: chraac <redacted>
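
The multi-row dot-product items above (a four-row kernel that takes separate vector and leftover element counts and masks the tail) can be illustrated with a portable scalar sketch. This is not the code from the commit: the function name `dot_rx4_sketch`, the 4-lane "vector" width, and the use of `float` instead of fp16 are assumptions chosen so the example compiles without the Hexagon SDK. The real kernels operate on HVX vectors.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware vector width (assumption). */
enum { SKETCH_LANES = 4 };

/*
 * Scalar model of a 4-row dot product: computes dot(x, rows[r]) for r = 0..3.
 * nvec = number of full "vectors", nloe = number of leftover elements,
 * mirroring the idea of passing both counts as parameters.
 */
static void dot_rx4_sketch(const float *x, const float *const rows[4],
                           size_t nvec, size_t nloe, float out[4])
{
    float acc[4][SKETCH_LANES] = {{0.0f}};

    /* Full vectors: x is read once per step and reused for all four rows. */
    for (size_t i = 0; i < nvec; i++) {
        for (int l = 0; l < SKETCH_LANES; l++) {
            const size_t k = i * SKETCH_LANES + l;
            for (int r = 0; r < 4; r++) {
                acc[r][l] += x[k] * rows[r][k];
            }
        }
    }

    /* Tail: only lanes below nloe contribute; the rest are masked off. */
    for (int l = 0; l < SKETCH_LANES; l++) {
        if ((size_t) l < nloe) {
            const size_t k = nvec * SKETCH_LANES + l;
            for (int r = 0; r < 4; r++) {
                acc[r][l] += x[k] * rows[r][k];
            }
        }
    }

    /* Horizontal reduction: one sum per row. */
    for (int r = 0; r < 4; r++) {
        float s = 0.0f;
        for (int l = 0; l < SKETCH_LANES; l++) {
            s += acc[r][l];
        }
        out[r] = s;
    }
}
```

For a row length `n`, a caller would pass `nvec = n / SKETCH_LANES` and `nloe = n % SKETCH_LANES`. Reusing each loaded `x` element across the four rows is the point of the multi-row form: it cuts the loads per multiply-accumulate compared with four separate single-row calls.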

author	Max Krasnyansky <redacted>
	Thu, 5 Mar 2026 05:55:29 +0000 (21:55 -0800)
committer	Georgi Gerganov <redacted>
	Mon, 16 Mar 2026 11:10:15 +0000 (13:10 +0200)
commit	2e79b85f66b942432fb5d0a2648be5a38b711ab1
tree	627089d56ce4474fae2877ab82aaa05fd7bd085d
parent	2c50962528a7424931a74941cd215147c6357ebf

ggml/src/ggml-hexagon/htp/flash-attn-ops.c
ggml/src/ggml-hexagon/htp/hvx-base.h
ggml/src/ggml-hexagon/htp/hvx-copy.h
ggml/src/ggml-hexagon/htp/hvx-reduce.h
ggml/src/ggml-hexagon/htp/matmul-ops.c