]> git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
metal: SSM kernel improvements (llama/17876)
authorGabe Goodhart <redacted>
Tue, 9 Dec 2025 19:30:02 +0000 (12:30 -0700)
committerGeorgi Gerganov <redacted>
Fri, 12 Dec 2025 15:53:23 +0000 (17:53 +0200)
commit307dc525bb44fd1570bb00e707e98ac423d8de4e
treebf70ef352fda40c816250df8ef5890fd9751b74c
parent2817582be254a2dd5c326063aeb9bb00f596aa3c
metal: SSM kernel improvements (llama/17876)

* feat: Add a batched version of ssm_conv

This was done using Claude Code. It found a number of optimizations around
how the threads were organized, resulting in a huge performance boost!

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart <redacted>
* feat: Optimized SSM_SCAN kernel for metal

This used Claude Code and resulted in a modest performance improvement
while maintaining correctness.

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart <redacted>
* test: Add test-backend-ops perf tests for SSM_CONV

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* test: Real representitive tests for SSM_CONV

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* refactor: Use function constant for ssm_conv batch size

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* test: backend op tests for ssm_scan from granite4 1b-h

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* style: remove commented out templates

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* feat: float4 version of ssm_conv_batched

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* fix: Add missing ggml_metal_cv_free

Signed-off-by: Gabe Goodhart <redacted>
Co-authored-by: Georgi Gerganov <redacted>
---------

Signed-off-by: Gabe Goodhart <redacted>
Co-authored-by: Georgi Gerganov <redacted>
ggml/src/ggml-metal/ggml-metal-device.cpp
ggml/src/ggml-metal/ggml-metal-device.h
ggml/src/ggml-metal/ggml-metal-impl.h
ggml/src/ggml-metal/ggml-metal-ops.cpp
ggml/src/ggml-metal/ggml-metal.metal