From: Georgi Gerganov
Date: Tue, 13 May 2025 15:04:00 +0000 (+0300)
Subject: metal : optimize multi-sequence FA vec kernel (#13493)
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=c252e0c4097b34666e5a81db9d0450d71fa3098f;p=pkg%2Fggml%2Fsources%2Fllama.cpp

metal : optimize multi-sequence FA vec kernel (#13493)

* batched-bench : fix pp batch contents

* metal : optimize multi-sequence FA vec kernel

ggml-ci
---

diff --git a/ggml/src/ggml-metal/ggml-metal.metal b/ggml/src/ggml-metal/ggml-metal.metal
index 9cfddf45..122ae597 100644
--- a/ggml/src/ggml-metal/ggml-metal.metal
+++ b/ggml/src/ggml-metal/ggml-metal.metal
@@ -3887,6 +3887,11 @@ kernel void kernel_flash_attn_ext_vec(
                 sm[tiisg] = pm[ic + tiisg];
             }
 
+            // skip -INF blocks
+            if (simd_max(sm[tiisg]) == -INFINITY) {
+                continue;
+            }
+
             // Q*K^T
             {
                 // each simdgroup processes 1 query and NE (NW/NL) head elements
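
Note on the added check: a block whose mask values are all -INF gets softmax weight exp(-INF) = 0 for every key in it, so skipping the block before the Q*K^T step should not change the output. With several sequences sharing one KV cache, presumably many blocks are fully masked for any given query. The following is a minimal CPU sketch of that idea in C++, not the Metal kernel itself. The function name `attn_vec_blocked`, its parameters, the memory layout, and the omission of the attention scale factor are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU analogue of the block skip (not code from ggml-metal.metal).
// q:    D query values
// k, v: n_kv rows of D values each, row-major
// mask: n_kv additive mask values (0 or -INFINITY)
// Returns softmax(q*K^T + mask) * V for a single query, computed block by block
// with an online softmax. *n_skipped receives the number of skipped blocks.
std::vector<float> attn_vec_blocked(const std::vector<float>& q,
                                    const std::vector<float>& k,
                                    const std::vector<float>& v,
                                    const std::vector<float>& mask,
                                    std::size_t block,
                                    bool skip_masked,
                                    std::size_t* n_skipped) {
    const std::size_t D = q.size();
    const std::size_t n_kv = mask.size();
    assert(block > 0 && k.size() == n_kv * D && v.size() == n_kv * D);

    const float NEG_INF = -std::numeric_limits<float>::infinity();
    // Assumption: start the running max at a large finite negative value
    // so that exp(M - M_new) never evaluates exp(-inf - (-inf)) = NaN.
    float M = -std::numeric_limits<float>::max() / 2.0f; // running max
    float S = 0.0f;                                      // running sum of weights
    std::vector<float> out(D, 0.0f);
    std::size_t skipped = 0;

    for (std::size_t ic = 0; ic < n_kv; ic += block) {
        const std::size_t end = std::min(ic + block, n_kv);

        // Scalar stand-in for simd_max(sm[tiisg]): max mask value in the block.
        float mmax = NEG_INF;
        for (std::size_t i = ic; i < end; ++i) mmax = std::max(mmax, mask[i]);

        // skip -INF blocks: every key here would get weight exp(-inf) = 0
        if (skip_masked && mmax == NEG_INF) { ++skipped; continue; }

        for (std::size_t i = ic; i < end; ++i) {
            float s = mask[i];
            for (std::size_t d = 0; d < D; ++d) s += q[d] * k[i * D + d];

            // Online softmax update of the running max, sum, and output.
            const float M_new = std::max(M, s);
            const float ms = std::exp(M - M_new); // rescale of earlier terms
            const float vs = std::exp(s - M_new); // weight of this key
            S = S * ms + vs;
            for (std::size_t d = 0; d < D; ++d) out[d] = out[d] * ms + vs * v[i * D + d];
            M = M_new;
        }
    }

    if (S > 0.0f) for (auto& o : out) o /= S; // leave zeros if everything is masked
    if (n_skipped) *n_skipped = skipped;
    return out;
}
```

Calling the function once with `skip_masked = false` and once with `skip_masked = true` on the same inputs should give matching outputs, with the second call doing no dot products for fully masked blocks.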