From: ymcki <redacted>
Date: Thu, 5 Mar 2026 15:01:23 +0000 (+0800)
Subject: models : kda chunk size = 16 (#19827)
X-Git-Tag: upstream/0.0.8611~400
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=a0ed91a442ea6b013bd42ebc3887a81792eaefa1;p=pkg%2Fggml%2Fsources%2Fllama.cpp

models : kda chunk size = 16 (#19827)

* models : add llm_build_delta_net_base

* cont : keep qwen35 and qwen35moe graphs intact

* cont : add comments [no ci]

* add kimi linear to delta-net-base

* removed unnecessary ggml_cont from g_exp_t

* removed ggml_cont from g_diff_exp_t. moved ggml_cont for o to kimi-linear.cpp

* removed unnecessary diag mask

* cont : simplify

* cont : avoid graph splits

* scale q after mul instead of beginning

* identical ppl

* cont : fix scale and decay mask

* minor : remove TODO

* block implementation for kda

* remove space at the end of line 101

* concat+pad

* pad+binary row concat

* chunk size 16 for kda

* removed minor differences to master

---------

Co-authored-by: Georgi Gerganov <redacted>
---

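The chunk-size selection and padding arithmetic in the delta-net-base.cpp hunk below can be sketched in isolation. This is a minimal standalone sketch, not code from the commit: the chunk_layout struct and the make_chunk_layout helper are hypothetical names introduced for illustration, and only the three expressions for CS, pad and n_chunks are taken from the diff.

```cpp
#include <cassert>

// Hypothetical container for the three values computed in the hunk.
struct chunk_layout {
    int chunk_size; // CS in the diff
    int pad;        // tokens appended so the length is a multiple of CS
    int n_chunks;   // number of CS-sized chunks after padding
};

// Hypothetical helper mirroring the expressions from the diff:
// KDA uses a chunk size of 16, every other path keeps 64.
static chunk_layout make_chunk_layout(int n_tokens, bool kda) {
    assert(n_tokens >= 0);

    const int CS = kda ? 16 : 64; // chunk size

    // The outer "% CS" maps the already-aligned case (n_tokens % CS == 0)
    // to pad == 0 instead of pad == CS.
    const int pad      = (CS - n_tokens % CS) % CS;
    const int n_chunks = (n_tokens + pad) / CS;

    return { CS, pad, n_chunks };
}
```

For example, with n_tokens = 100 the KDA path gives CS = 16, pad = 12 and n_chunks = 7, while the non-KDA path gives CS = 64, pad = 28 and n_chunks = 2.
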
diff --git a/src/models/delta-net-base.cpp b/src/models/delta-net-base.cpp
index 99f1fdd95..c57abbb5b 100644
--- a/src/models/delta-net-base.cpp
+++ b/src/models/delta-net-base.cpp
@@ -1,7 +1,5 @@
 #include "models.h"
 
-#define CHUNK_SIZE 64
-
 // utility to get one slice from the third dimension
 // input dim:  [x, y, c, b]
 // output dim: [x, y, 1, b]
@@ -57,7 +55,7 @@ std::pair<ggml_tensor *, ggml_tensor *> llm_build_delta_net_base::build_delta_ne
     g = ggml_permute(ctx0, g, 0, 2, 1, 3); // [g_0, n_tokens, H_v, n_seqs]
     b = ggml_permute(ctx0, b, 0, 2, 1, 3); // [  1, n_tokens, H_v, n_seqs]
 
-    const int CS = CHUNK_SIZE;
+    const int CS = kda ? 16 : 64; // chunk size
 
     const int pad = (CS - n_tokens % CS) % CS;
     const int n_chunks = (n_tokens + pad) / CS;