Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)

* model: add support for extra bufs for all devices

* hexagon: add experimental ggml-hexagon backend for the Hexagon NPU

This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU.

Highlights:
- Supports Hexagon versions: v73, v75, v79, and v81
- Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5
- Supports Q4_0, Q8_0, MXFP4, and FP32 data types
- Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX

**Note:** This backend is experimental and may exhibit instability or limited performance across supported devices.
It is intended for early testing and feedback from the llama.cpp/ggml developer and user communities.

Co-Authored-By: Rajdeep Ganguly <redacted>
Co-Authored-By: Todor Boinovski <redacted>

* hexagon: fix format checker errors

* hexagon: update readme and cmake presets

* ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions

* hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input

* hexagon: move ADB helper scripts into scripts/snapdragon/adb

* hexagon: replace all f/printfs with GGML_LOG_...

* readme: add hexagon to the list of supported backends

* hexagon: stack matmuls with quantized inputs only

* hexagon: add TODO for fixing issues in hexagon_graph_optimize

* hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC

* scripts: fix lint errors

* scripts: update qdc pytest script to make linter happy

* hexagon: add reduce sum in fp32

* hexagon: reduce number of vector stores in matmul output

* hexagon: remove the need for vdelta in reduce-multiply-x8

* hexagon: consistent use of reduce_sum_fp32 for row_sums

* hexagon: some more matmul optimizations and comments

Optimize cases where tensor dims are not a multiple of 1024 (e.g. in Qwen models).
Those cases were already handled, but at a higher overhead.

* hexagon: update cmake presets

* hexagon: add OPMASK support for run-bench.sh wrapper

* hexagon: update to use GGML_BACKEND_API

* hexagon: remove unused logic for setting tensor flags for the views

* hexagon: add asserts to set/get_tensor to make sure we handle complete tensors

Same asserts as the CPU backend.

* hexagon: use cpy_tensor slow path for non-host buffers

* hexagon: error checks in the buffer allocator

* cmake: move include(extProj) under ggml-hexagon

* hexagon: don't forget to delete the backend on free

* hexagon: set/get_tensor size assert apply only to quantized tensors

* hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now

GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way.
Ideally we need finer-grained log levels.

* docs: typos in hexagon developer docs (libggm-...)

* hexagon: overhaul error handling in the session/device allocation

This should handle all failure paths in the session allocation.

* hexagon: update cmake presets to enable fp16 vectors

* hexagon: remove unused time_usec function

* hexagon: don't forget to release buffer contexts

* hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure)

* hexagon: remove custom can_repeat function and use ggml_can_repeat
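
For readers unfamiliar with the repeat check mentioned in the last item, here is a minimal sketch of the rule it relies on: a source tensor can be repeated (broadcast) into a destination when every destination dimension is a whole multiple of the matching source dimension. The names `can_repeat_sketch`, `is_empty_sketch`, and `SKETCH_MAX_DIMS` are hypothetical and work on plain dimension arrays; this is not the code from this commit or from ggml, only an illustration under the assumption of four dimensions per tensor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DIMS 4 /* assumption: up to four dimensions per tensor */

/* Hypothetical helper: a tensor with any zero-sized dimension has no elements. */
static bool is_empty_sketch(const int64_t ne[SKETCH_MAX_DIMS]) {
    for (int i = 0; i < SKETCH_MAX_DIMS; i++) {
        if (ne[i] == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical stand-in for a repeat check: src can be tiled to fill dst
 * when each dst dimension is a whole multiple of the matching src dimension.
 * An empty src can only be repeated into an empty dst. */
static bool can_repeat_sketch(const int64_t src_ne[SKETCH_MAX_DIMS],
                              const int64_t dst_ne[SKETCH_MAX_DIMS]) {
    assert(src_ne != NULL && dst_ne != NULL);

    if (is_empty_sketch(src_ne)) {
        return is_empty_sketch(dst_ne);
    }
    for (int i = 0; i < SKETCH_MAX_DIMS; i++) {
        if (dst_ne[i] % src_ne[i] != 0) {
            return false;
        }
    }
    return true;
}
```

Under this rule a single row of shape {4096, 1, 1, 1} can be repeated into a {4096, 32, 1, 1} tensor, but not the other way around. Relying on one shared check keeps a backend's broadcast decisions consistent with the rest of the library.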

---------

Co-authored-by: Rajdeep Ganguly <redacted>
Co-authored-by: Todor Boinovski <redacted>

author	Max Krasnyansky <redacted>
	Wed, 22 Oct 2025 20:47:09 +0000 (13:47 -0700)
committer	GitHub <redacted>
	Wed, 22 Oct 2025 20:47:09 +0000 (13:47 -0700)
commit	63d2fc46e17a06be5b4b5823a5ada088317f1f0a
tree	5d49db0a298e27750ec295ff18d0ff1f54992f95
parent	a2e0088d9242bd9e57f8b852b05a6e47843b5a45

.github/workflows/build.yml
CODEOWNERS
README.md
docs/backend/hexagon/CMakeUserPresets.json	[new file with mode: 0644]
docs/backend/hexagon/README.md	[new file with mode: 0644]
docs/backend/hexagon/developer.md	[new file with mode: 0644]
ggml/CMakeLists.txt
ggml/include/ggml-hexagon.h	[new file with mode: 0644]
ggml/src/CMakeLists.txt
ggml/src/ggml-backend-reg.cpp
ggml/src/ggml-hexagon/CMakeLists.txt	[new file with mode: 0644]
ggml/src/ggml-hexagon/ggml-hexagon.cpp	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp-utils.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp-utils.h	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/CMakeLists.txt	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/act-ops.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/binary-ops.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/cmake-toolchain.cmake	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/htp-ctx.h	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/htp-dma.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/htp-dma.h	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/htp-msg.h	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/htp-ops.h	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/htp_iface.idl	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/hvx-exp.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/hvx-inverse.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/hvx-sigmoid.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/hvx-utils.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/hvx-utils.h	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/main.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/matmul-ops.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/ops-utils.h	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/rope-ops.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/softmax-ops.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/unary-ops.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/worker-pool.c	[new file with mode: 0644]
ggml/src/ggml-hexagon/htp/worker-pool.h	[new file with mode: 0644]
scripts/snapdragon/adb/llama-cli.farf	[new file with mode: 0644]
scripts/snapdragon/adb/run-bench.sh	[new file with mode: 0755]
scripts/snapdragon/adb/run-cli.sh	[new file with mode: 0755]
scripts/snapdragon/adb/run-tool.sh	[new file with mode: 0755]
scripts/snapdragon/qdc/readme.md	[new file with mode: 0644]
scripts/snapdragon/qdc/requirements.txt	[new file with mode: 0644]
scripts/snapdragon/qdc/tests/test_bench.py	[new file with mode: 0644]
src/llama-model.cpp