From: Aadeshveer Singh <redacted>
Date: Sat, 20 Dec 2025 11:28:57 +0000 (+0530)
Subject: Added comments explaining thread block size selection logic based on row count and... 
X-Git-Tag: v0.9.5~53
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=ef780f78ee8b8f179583e176b4e69bc9ae5f1d87;p=pkg%2Fggml%2Fsources%2Fggml

Added comments explaining thread block size selection logic based on row count and column size, derived from historical commit context (llama/18212)
---
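Note: the branch visible in the hunk below can be sketched on the host side
roughly as follows. This is a minimal illustration, not the code in mean.cu:
the helper name pick_block_threads, the int64_t type for nrows, and the
fallback_threads parameter are assumptions made here. The hunk only shows the
512-thread case, so the other branch is left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the condition `(nrows / nsm) < 2` from the hunk.
// With fewer than two rows per SM it picks a 512-thread block; otherwise it
// returns `fallback_threads`, a stand-in for the branch this hunk does not show.
inline int pick_block_threads(int64_t nrows, int nsm, int fallback_threads) {
    assert(nsm > 0);  // the SM count is used as a divisor
    // Integer division, as in the original condition.
    const int64_t rows_per_sm = nrows / nsm;
    return rows_per_sm < 2 ? 512 : fallback_threads;
}
```

Because the division truncates, any nrows below 2 * nsm takes the 512-thread
path, including values just under that boundary.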

diff --git a/src/ggml-cuda/mean.cu b/src/ggml-cuda/mean.cu
index 347abc18..691d8dcb 100644
--- a/src/ggml-cuda/mean.cu
+++ b/src/ggml-cuda/mean.cu
@@ -63,6 +63,9 @@ void ggml_cuda_op_mean(ggml_backend_cuda_context & ctx, ggml_tensor * dst) {
 
     const int id  = ggml_cuda_get_device();
     const int nsm = ggml_cuda_info().devices[id].nsm;
+
+    // Block size heuristic for occupancy: fewer than 2 rows per SM -> 512-thread blocks.
+    // See discussion in: https://github.com/ggml-org/llama.cpp/pull/15132
     if ((nrows / nsm) < 2) {
         const dim3 block_dims(512, 1, 1);
         reduce_rows_f32</*norm=*/true><<<block_nums, block_dims, 0, stream>>>(src0_d, dst_d, ncols);