From: ymcki
Date: Thu, 5 Mar 2026 15:01:23 +0000 (+0800)
Subject: models : kda chunk size = 16 (#19827)
X-Git-Tag: upstream/0.0.8611~400
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=a0ed91a442ea6b013bd42ebc3887a81792eaefa1;p=pkg%2Fggml%2Fsources%2Fllama.cpp

models : kda chunk size = 16 (#19827)

* models : add llm_build_delta_net_base

* cont : keep qwen35 and qwen35moe graphs intact

* cont : add comments [no ci]

* add kimi linear to delta-net-base

* removed unnecessary ggml_cont from g_exp_t

* removed ggml_cont from g_diff_exp_t. moved ggml_cont for o to kimi-linear.cpp

* removed unnecessary diag mask

* cont : simplify

* cont : avoid graph splits

* scale q after mul instead of beginning

* scale q after mul instead of beginning

* identical ppl

* cont : fix scale and decay mask

* minor : remove TODO

* block implementation for kda

* remove space at the end of line 101

* concat+pad

* pad+binary row concat

* chunk size 16 for kda

* removed minor differences to master

---------

Co-authored-by: Georgi Gerganov
---

diff --git a/src/models/delta-net-base.cpp b/src/models/delta-net-base.cpp
index 99f1fdd95..c57abbb5b 100644
--- a/src/models/delta-net-base.cpp
+++ b/src/models/delta-net-base.cpp
@@ -1,7 +1,5 @@
 #include "models.h"
 
-#define CHUNK_SIZE 64
-
 // utility to get one slice from the third dimension
 // input dim: [x, y, c, b]
 // output dim: [x, y, 1, b]
@@ -57,7 +55,7 @@ std::pair llm_build_delta_net_base::build_delta_ne
     g = ggml_permute(ctx0, g, 0, 2, 1, 3); // [g_0, n_tokens, H_v, n_seqs]
     b = ggml_permute(ctx0, b, 0, 2, 1, 3); // [ 1, n_tokens, H_v, n_seqs]
 
-    const int CS = CHUNK_SIZE;
+    const int CS = kda ? 16 : 64; // chunk size
     const int pad = (CS - n_tokens % CS) % CS;
     const int n_chunks = (n_tokens + pad) / CS;
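
Note on the second hunk: the chunk size now depends on `kda`, and `pad` and `n_chunks` follow from it. Below is a minimal standalone sketch of that arithmetic, not the code from the patch. The `chunk_layout` struct and the `make_chunk_layout` helper are hypothetical names introduced only for illustration, and `kda` is assumed to be a plain boolean, since its declaration is not visible in this diff.

```cpp
#include <cassert>

// Hypothetical container for the three values computed in the hunk.
struct chunk_layout {
    int cs;       // chunk size: 16 when kda is set, 64 otherwise
    int pad;      // padding tokens so that n_tokens + pad is a multiple of cs
    int n_chunks; // number of chunks after padding
};

// Hypothetical helper that repeats the arithmetic from the diff.
// Assumption: kda is a simple bool flag; n_tokens is non-negative.
static chunk_layout make_chunk_layout(int n_tokens, bool kda) {
    assert(n_tokens >= 0);

    const int cs = kda ? 16 : 64;

    // The outer "% cs" maps the already-aligned case (n_tokens % cs == 0)
    // to pad == 0 instead of pad == cs.
    const int pad      = (cs - n_tokens % cs) % cs;
    const int n_chunks = (n_tokens + pad) / cs;

    return { cs, pad, n_chunks };
}
```

For example, with `n_tokens = 100` the 64-wide path pads by 28 tokens into 2 chunks, while the 16-wide KDA path pads by 12 tokens into 7 chunks. The smaller chunk size wastes less padding on short batches at the cost of more chunks to process.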