git.djapps.eu Git - pkg/ggml/sources/ggml/commit
metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (llama/20250)
author Paul Flynn <redacted>
Mon, 9 Mar 2026 14:48:12 +0000 (10:48 -0400)
committer Georgi Gerganov <redacted>
Sun, 15 Mar 2026 19:50:13 +0000 (21:50 +0200)
commit a18fa2a2dc887ff1520228e5c8e7b8b87f605c4a
tree 20a8aa00c950eba2ee0d08b92ca18afefa34efa0
parent 624517a71b8465227f0bd54ae99e49608cbb37c0

Enable the mul_mv_ext small-batch kernels (batch sizes 2-8) for the
BF16, Q2_K, and Q3_K tensor types. These types previously fell through
to the slower single-row mul_mv path.
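
The dispatch rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from ggml-metal-ops.cpp: the enum and function names are hypothetical, the type list contains only the types named in this commit message, and the real dispatch may check further conditions (for example tensor shapes) that are not modelled here.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not ggml identifiers.
enum class TensorType { F16, BF16, Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Other };
enum class MatVecPath { MulMv, MulMvExt };

// Assumption: types with a small-batch (ext) kernel after this change.
// Only the types mentioned in the commit message are listed.
constexpr bool has_ext_kernel(TensorType t) {
    switch (t) {
        case TensorType::F16:
        case TensorType::BF16:  // newly enabled
        case TensorType::Q2_K:  // newly enabled
        case TensorType::Q3_K:  // newly enabled
        case TensorType::Q4_K:
        case TensorType::Q5_K:
        case TensorType::Q6_K:
            return true;
        default:
            return false;
    }
}

// Pick the ext path only for batch sizes 2..8 and supported types;
// everything else falls through to the single-row path.
constexpr MatVecPath choose_path(TensorType t, int batch_size) {
    return (batch_size >= 2 && batch_size <= 8 && has_ext_kernel(t))
        ? MatVecPath::MulMvExt
        : MatVecPath::MulMv;
}
```

With this sketch, `choose_path(TensorType::BF16, 4)` selects the ext path, while a batch size of 1 or 9 falls back to the single-row path for every type.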

BF16 uses the float4 dequantize path (like F16). Q2_K and Q3_K
use the float4x4 K-quant path (like Q4_K/Q5_K/Q6_K).
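
To illustrate why BF16 fits a float4-style dequantize path, here is a CPU-side sketch of widening four BF16 values to four floats. It relies only on the fact that BF16 is the upper 16 bits of an IEEE-754 binary32 value; the function names are hypothetical, and the actual Metal kernel is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Widen one BF16 value (stored as raw 16 bits) to float by placing
// its bits in the upper half of a 32-bit word.
inline float bf16_to_f32(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);  // bit-exact reinterpretation
    return f;
}

// Hypothetical analogue of a float4 dequantize step: four BF16
// inputs become four floats in one call.
inline std::array<float, 4> bf16x4_to_f32(const std::uint16_t * src) {
    return {{ bf16_to_f32(src[0]), bf16_to_f32(src[1]),
              bf16_to_f32(src[2]), bf16_to_f32(src[3]) }};
}
```

No scale or zero-point is involved, which is what separates this path from the block-based K-quant types.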

Co-authored-by: Claude Opus 4.6 <redacted>
src/ggml-metal/ggml-metal-ops.cpp
src/ggml-metal/ggml-metal.metal