From: snadampal
Date: Fri, 26 Jan 2024 17:17:59 +0000 (-0600)
Subject: ggml : update softmax n_task calculation (llama/5126)
X-Git-Tag: upstream/1.7.4~1076
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=3c8d14e9c578d5531503939d48b424f6cc3a362e;p=pkg%2Fggml%2Fsources%2Fwhisper.cpp

ggml : update softmax n_task calculation (llama/5126)

Updated the n_task calculation to use the maximum number of threads possible.
This has improved the prompt eval performance by around 5% for DOT kernels and
by around 10% for MMLA kernels on AWS Graviton3.
---

diff --git a/ggml.c b/ggml.c
index 6a1e2187..cb7b7474 100644
--- a/ggml.c
+++ b/ggml.c
@@ -16602,7 +16602,7 @@ static int ggml_get_n_tasks(struct ggml_tensor * node, int n_threads) {
             } break;
         case GGML_OP_SOFT_MAX:
             {
-                n_tasks = MIN(MIN(4, n_threads), ggml_nrows(node->src[0]));
+                n_tasks = MIN(n_threads, ggml_nrows(node->src[0]));
             } break;
         case GGML_OP_CONV_TRANSPOSE_1D:
            {
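
For reference, a minimal standalone C sketch (not part of the patch) showing how the
task count chosen for GGML_OP_SOFT_MAX changes with this commit; the MIN macro
definition and the example thread/row counts are assumptions for illustration only.

#include <stdio.h>
#include <stdint.h>

/* Stand-in for ggml's MIN macro (assumed definition, for illustration only). */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old rule: soft-max was capped at 4 tasks, regardless of the thread pool size. */
static int n_tasks_soft_max_old(int n_threads, int64_t nrows) {
    return (int) MIN(MIN(4, n_threads), nrows);
}

/* New rule: soft-max fans out to all threads, still bounded by the row count. */
static int n_tasks_soft_max_new(int n_threads, int64_t nrows) {
    return (int) MIN(n_threads, nrows);
}

int main(void) {
    /* Example values only (hypothetical): 16 worker threads, 4096 soft-max rows. */
    const int     n_threads = 16;
    const int64_t nrows     = 4096;

    printf("old: %d tasks\n", n_tasks_soft_max_old(n_threads, nrows)); /* 4  */
    printf("new: %d tasks\n", n_tasks_soft_max_new(n_threads, nrows)); /* 16 */
    return 0;
}

With the previous cap, the soft-max rows were split across at most 4 threads even on
machines with many more cores; removing that cap is what accounts for the prompt eval
gains reported above.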