git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
author Alfred <redacted>
Fri, 19 Dec 2025 17:42:28 +0000 (12:42 -0500)
committer GitHub <redacted>
Fri, 19 Dec 2025 17:42:28 +0000 (09:42 -0800)
commit ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787
tree ae0ec7dee029943ef6cfb3257e1bf40f08d7b1d7
parent 14931a826e4f5b4536f03f65c8d568d99bf64f0e
ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)

* feat: implement real Q8_0

* feat: adding cmake option for configuring FP32 quantize group size

* typo: set() shall be used

---------

Co-authored-by: ngdxzy <redacted>
docs/backend/hexagon/CMakeUserPresets.json
ggml/CMakeLists.txt
ggml/src/ggml-hexagon/CMakeLists.txt
ggml/src/ggml-hexagon/htp/CMakeLists.txt
ggml/src/ggml-hexagon/htp/matmul-ops.c