git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
author Alfred <redacted>
Fri, 19 Dec 2025 17:42:28 +0000 (12:42 -0500)
committer Georgi Gerganov <redacted>
Wed, 31 Dec 2025 15:52:09 +0000 (17:52 +0200)
commit 17a4cb15b881fdabfbcb67b1ab1570d1b16de57d
tree f046ffa1c641ac31e0289965985b63d0f70cd342
parent 195d8d0c65d40d96caba980706a3011675745892
ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (llama/17977)

* feat: implement real Q8_0

* feat: adding cmake option for configuring FP32 quantize group size

* typo: set() shall be used

---------

Co-authored-by: ngdxzy <redacted>
ggml/CMakeLists.txt
ggml/src/ggml-hexagon/CMakeLists.txt
ggml/src/ggml-hexagon/htp/CMakeLists.txt
ggml/src/ggml-hexagon/htp/matmul-ops.c