]> git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
metal: SSM kernel improvements (#17876)
authorGabe Goodhart <redacted>
Tue, 9 Dec 2025 19:30:02 +0000 (12:30 -0700)
committerGitHub <redacted>
Tue, 9 Dec 2025 19:30:02 +0000 (21:30 +0200)
commit086a63e3a5d2dbbb7183a74db453459e544eb55a
tree08dbba68d3195142c09e806b2ceb04fc7602fdfe
parentb63509262ac88db248fa70a1364c8a76e339e93f
metal: SSM kernel improvements (#17876)

* feat: Add a batched version of ssm_conv

This was done using Claude Code. It found a number of optimizations around
how the threads were organized, resulting in a huge performance boost!

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart <redacted>
* feat: Optimized SSM_SCAN kernel for metal

This used Claude Code and resulted in a modest performance improvement
while maintaining correctness.

Branch: Mamba2SSD

Signed-off-by: Gabe Goodhart <redacted>
* test: Add test-backend-ops perf tests for SSM_CONV

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* test: Real representitive tests for SSM_CONV

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* refactor: Use function constant for ssm_conv batch size

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* test: backend op tests for ssm_scan from granite4 1b-h

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* style: remove commented out templates

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* feat: float4 version of ssm_conv_batched

Branch: SSMKernelImprovements

Signed-off-by: Gabe Goodhart <redacted>
* fix: Add missing ggml_metal_cv_free

Signed-off-by: Gabe Goodhart <redacted>
Co-authored-by: Georgi Gerganov <redacted>
---------

Signed-off-by: Gabe Goodhart <redacted>
Co-authored-by: Georgi Gerganov <redacted>
ggml/src/ggml-metal/ggml-metal-device.cpp
ggml/src/ggml-metal/ggml-metal-device.h
ggml/src/ggml-metal/ggml-metal-impl.h
ggml/src/ggml-metal/ggml-metal-ops.cpp
ggml/src/ggml-metal/ggml-metal.metal
tests/test-backend-ops.cpp