git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
author     nullname <redacted>
           Sat, 24 Jan 2026 06:02:07 +0000 (14:02 +0800)
committer  GitHub <redacted>
           Sat, 24 Jan 2026 06:02:07 +0000 (22:02 -0800)
commit     8af1f5f430baaab1719db8f0a259bcc2a1cfdaa0
tree       7e2703bf7c44f23eb6bd3a7844a1252a149dd74a
parent     557515be1e93ed8939dd8a7c7d08765fdbe8be31
ggml-hexagon: flash-attn opt (#19025)

* Optimize the flash-attention kernel by improving score computation and the online softmax update

* wip

* Refactor the online softmax update in the flash-attention kernel for improved performance

* Optimize the flash-attention kernel by replacing a float array with an HVX_Vector for score computation

* wip
ggml/src/ggml-hexagon/htp/flash-attn-ops.c