git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
opencl: add optimized q8_0 mm kernel for adreno (#18871)
author shaofeiqi <redacted>
Fri, 30 Jan 2026 18:19:27 +0000 (10:19 -0800)
committer GitHub <redacted>
Fri, 30 Jan 2026 18:19:27 +0000 (10:19 -0800)
commit 971facc38e2544fcf2cc09368de5d1a68e33c10f
tree 3046d6066cfb6bd32b989dd73690df11ff4c28bb
parent d9a2a4bcaa071d730bb1ab4fb411a9c93b50dd13

* Add Q8_0 OpenCL kernel

Co-authored-by: yunjie <redacted>
* opencl: fix build for non-adreno

* opencl: refactor q8_0

* opencl: enforce subgroup size of 64 for adreno for q8_0

* For A750 and older generations, subgroup size can be 64 or 128.
  This kernel assumes subgroup size 64.

* opencl: suppress warning when adreno kernels are disabled

---------

Co-authored-by: yunjie <redacted>
Co-authored-by: Li He <redacted>
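
For readers unfamiliar with the format, the following is a minimal scalar sketch of what a q8_0 x f32 matrix-vector product computes. It is not the Adreno kernel added by this commit. It assumes the usual ggml Q8_0 layout (blocks of 32 signed 8-bit values sharing one scale), stores the scale as float rather than fp16 for portability, and uses the hypothetical names BlockQ8_0, quantize_q8_0 and gemv_q8_0_f32_ref.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Q8_0 block size as used in ggml: 32 quantized values per scale.
constexpr int QK8_0 = 32;

// Illustrative block type (hypothetical name). ggml stores the scale as fp16;
// float is an assumption made here to keep the sketch self-contained.
struct BlockQ8_0 {
    float  d;          // per-block scale
    int8_t qs[QK8_0];  // quantized values, each dequantizes to d * qs[i]
};

// Quantize n floats into Q8_0 blocks; n must be a multiple of 32.
std::vector<BlockQ8_0> quantize_q8_0(const float * x, size_t n) {
    assert(n % QK8_0 == 0);
    std::vector<BlockQ8_0> out(n / QK8_0);
    for (size_t b = 0; b < out.size(); ++b) {
        // The scale maps the largest magnitude in the block to 127.
        float amax = 0.0f;
        for (int i = 0; i < QK8_0; ++i) {
            amax = std::fmax(amax, std::fabs(x[b * QK8_0 + i]));
        }
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        out[b].d = d;
        for (int i = 0; i < QK8_0; ++i) {
            out[b].qs[i] = (int8_t) std::lround(x[b * QK8_0 + i] * id);
        }
    }
    return out;
}

// Reference y = W * x with W stored row-major as Q8_0 blocks and x, y in f32.
std::vector<float> gemv_q8_0_f32_ref(const std::vector<BlockQ8_0> & w,
                                     const std::vector<float> & x,
                                     size_t rows) {
    const size_t cols = x.size();
    assert(cols % QK8_0 == 0);
    const size_t blocks_per_row = cols / QK8_0;
    assert(w.size() == rows * blocks_per_row);

    std::vector<float> y(rows, 0.0f);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t b = 0; b < blocks_per_row; ++b) {
            const BlockQ8_0 & blk = w[r * blocks_per_row + b];
            // Accumulate within the block, then apply the scale once.
            float acc = 0.0f;
            for (int i = 0; i < QK8_0; ++i) {
                acc += (float) blk.qs[i] * x[b * QK8_0 + i];
            }
            y[r] += blk.d * acc;
        }
    }
    return y;
}
```

A GPU kernel would typically spread the per-row block loop across the work-items of a subgroup and reduce the partial sums, which is one reason a kernel may need to assume a fixed subgroup size such as 64.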
ggml/src/ggml-opencl/CMakeLists.txt
ggml/src/ggml-opencl/ggml-opencl.cpp
ggml/src/ggml-opencl/kernels/cvt.cl
ggml/src/ggml-opencl/kernels/gemv_noshuffle_general_q8_0_f32.cl [new file with mode: 0644]
ggml/src/ggml-opencl/kernels/mul_mm_q8_0_f32_8x4.cl [new file with mode: 0644]