From: Justin Bradford
Date: Tue, 17 Mar 2026 12:03:54 +0000 (-0700)
Subject: kleidiai : fix MUL_MAT support for batched (3D) inputs (llama/20620)
X-Git-Tag: v0.9.9~54
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=e95f3be49af9e3a03f72f02178c5f0973bb3fb1c;p=pkg%2Fggml%2Fsources%2Fggml

kleidiai : fix MUL_MAT support for batched (3D) inputs (llama/20620)

* kleidiai : fix MUL_MAT support for batched (3D) inputs

The supports_op() check incorrectly rejected MUL_MAT operations with 3D
inputs (ne[2] > 1), but the actual compute_forward_qx() implementation
handles batched inputs correctly via a loop over ne12.

This caused models with Q4_0/Q8_0 weights to crash during graph scheduling
when n_seq_max > 1: weights were placed in KLEIDIAI buffers at load time,
when supports_op() was checked with 2D inputs, but at runtime the op
received 3D inputs.

Also relax the buffer check to allow supports_op() to be called during
weight loading, when src[0]->buffer is NULL.

Fixes #20608

* Kleidiai supports_op() should only return true for 3D inputs, not 4D

---
diff --git a/src/ggml-cpu/kleidiai/kleidiai.cpp b/src/ggml-cpu/kleidiai/kleidiai.cpp
index 7a592494..0ecf7ae0 100644
--- a/src/ggml-cpu/kleidiai/kleidiai.cpp
+++ b/src/ggml-cpu/kleidiai/kleidiai.cpp
@@ -1461,7 +1461,7 @@ class extra_buffer_type : ggml::cpu::extra_buffer_type {
                 return false;
             }
             if ((op->src[1]->type == GGML_TYPE_F32 || op->src[1]->type == GGML_TYPE_I32) &&
-                ggml_ne(op->src[1], 2) == 1 && ggml_ne(op->src[1], 3) == 1) {
+                ggml_ne(op->src[1], 3) == 1) {
                 return true;
             }
         }
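The shape rule this patch changes can be illustrated with a small standalone sketch. It is not the kleidiai.cpp code: `Shape`, `supports_src1_old`, `supports_src1_new` and `mul_mat_batched` are hypothetical names, and the matmul is a naive float loop rather than the Q4_0/Q8_0 KleidiAI kernels. It assumes ggml's convention that `ne[0]` is the innermost dimension and `ne[2]`/`ne[3]` are batch dimensions. The sketch shows why accepting `ne[2] > 1` is safe when the compute path loops over ne12 and reuses the same 2D weight matrix for every batch, while `ne[3] > 1` is still rejected.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a ggml tensor shape: ne[0] is the innermost dimension.
using Shape = std::array<int64_t, 4>;

// Sketch of the pre-fix rule: src1 had to be 2D (ne[2] == 1 and ne[3] == 1).
bool supports_src1_old(const Shape& ne) {
    return ne[2] == 1 && ne[3] == 1;
}

// Sketch of the post-fix rule: any ne[2] (3D batch) is accepted, 4D is still rejected.
bool supports_src1_new(const Shape& ne) {
    return ne[3] == 1;
}

// Naive batched matmul that loops over ne12 (the batch dimension of src1).
// w:   src0, N rows of length K (ne00 = K, ne01 = N), shared by all batches.
// x:   src1, B batches of M rows of length K (ne10 = K, ne11 = M, ne12 = B).
// dst: ne0 = N, ne1 = M, ne2 = B, laid out with ne0 innermost.
std::vector<float> mul_mat_batched(const std::vector<float>& w, int64_t K, int64_t N,
                                   const std::vector<float>& x, int64_t M, int64_t B) {
    assert(w.size() == static_cast<size_t>(K * N));
    assert(x.size() == static_cast<size_t>(K * M * B));
    std::vector<float> dst(static_cast<size_t>(N * M * B), 0.0f);
    for (int64_t i12 = 0; i12 < B; ++i12) {          // batch index (ne12)
        for (int64_t i11 = 0; i11 < M; ++i11) {      // row of src1 within the batch
            const float* xrow = x.data() + (i12 * M + i11) * K;
            for (int64_t i01 = 0; i01 < N; ++i01) {  // row of the shared weight matrix
                const float* wrow = w.data() + i01 * K;
                float acc = 0.0f;
                for (int64_t k = 0; k < K; ++k) {
                    acc += wrow[k] * xrow[k];        // dot product along ne00 / ne10
                }
                dst[static_cast<size_t>((i12 * M + i11) * N + i01)] = acc;
            }
        }
    }
    return dst;
}
```

Because each batch only re-reads the same 2D weights, a kernel written for 2D activations can serve 3D activations by looping over the batch index; a fourth dimension would need extra indexing that, per the second commit bullet, `supports_op()` does not claim to handle.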