CUDA: Add mul_mat_id support for the mmf kernel (#15767)
author Aman Gupta <redacted>
Tue, 9 Sep 2025 06:38:02 +0000 (14:38 +0800)
committer GitHub <redacted>
Tue, 9 Sep 2025 06:38:02 +0000 (14:38 +0800)
commit a972faebed5fdc4a3d2a844d92d476058c02e02d
tree f83f43de492a75867dbc80c2fe9e1acba2cb17d2
parent 550cf726e133fd0a069d991287fd3a2a3e3e1cbd

* CUDA: Add mul_mat_id support for the mmf kernel

Add support for mul_mat_id for batch sizes (bs) < 16

* Review: use warp_size, fix should_use_mmf condition

* Launch one block per expert, stride along n_expert_used

* Templatize mul_mat_id

* Pad shmem to 16 bytes, add helper function mul_mat_f_switch_ids

* Reduce compile times by dividing mmf into f16, bf16 and f32 variants

* Split mmf template instances into separate files by ncols_dst

* Add missing files

* Fix MUSA/HIP builds
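
The commit message describes mul_mat_id only briefly. mul_mat_id is ggml's mixture-of-experts matrix multiplication: each token picks n_expert_used experts via an ids tensor, and each selected slot is multiplied by that expert's weight matrix. The following is a minimal CPU reference sketch of those semantics, not the CUDA kernel from this commit. It assumes a simplified dense row-major layout (real ggml tensors are strided and the op supports more broadcasting cases), and the names MoeProblem and mul_mat_id_ref are hypothetical. The outer loop over experts loosely mirrors the "one block per expert, stride along n_expert_used" launch strategy mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified dense layout (row-major), not ggml's strided tensors:
//   W   : n_expert x M x K           expert weight matrices
//   x   : n_tokens x K               one activation row per token
//   ids : n_tokens x n_expert_used   expert chosen by token t in slot s
//   dst : n_tokens x n_expert_used x M
struct MoeProblem {
    int n_expert, n_expert_used, n_tokens, M, K;
    std::vector<float> W, x;
    std::vector<int> ids;
};

// CPU reference for mul_mat_id semantics. Each outer iteration plays the role
// of one "block" for expert e: it scans every (token, slot) pair, striding
// along n_expert_used, and computes only the rows routed to expert e.
std::vector<float> mul_mat_id_ref(const MoeProblem& p) {
    assert(p.ids.size() == static_cast<std::size_t>(p.n_tokens) * p.n_expert_used);
    for (int id : p.ids) {
        assert(id >= 0 && id < p.n_expert);  // every slot must name a valid expert
        (void)id;                            // avoid unused warning under NDEBUG
    }
    std::vector<float> dst(static_cast<std::size_t>(p.n_tokens) * p.n_expert_used * p.M, 0.0f);
    for (int e = 0; e < p.n_expert; ++e) {                  // one "block" per expert
        const float* We = p.W.data() + static_cast<std::size_t>(e) * p.M * p.K;
        for (int t = 0; t < p.n_tokens; ++t) {
            for (int s = 0; s < p.n_expert_used; ++s) {    // stride along n_expert_used
                if (p.ids[static_cast<std::size_t>(t) * p.n_expert_used + s] != e) {
                    continue;                               // slot belongs to another expert
                }
                const float* xt = p.x.data() + static_cast<std::size_t>(t) * p.K;
                float* out = dst.data() + (static_cast<std::size_t>(t) * p.n_expert_used + s) * p.M;
                for (int m = 0; m < p.M; ++m) {             // dot product of row m with x_t
                    float acc = 0.0f;
                    for (int k = 0; k < p.K; ++k) {
                        acc += We[static_cast<std::size_t>(m) * p.K + k] * xt[k];
                    }
                    out[m] = acc;
                }
            }
        }
    }
    return dst;
}
```

For example, with three experts whose weights are 1x, 2x and 3x the identity, token x0 = {1, 2} routed to experts {0, 2} and token x1 = {3, 4} routed to experts {1, 0}, the output is {1, 2, 3, 6, 6, 8, 3, 4}. Scanning all (token, slot) pairs per expert costs O(n_expert * n_tokens * n_expert_used) comparisons, which stays cheap for the small batch sizes this change targets.
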
23 files changed:
ggml/src/ggml-cuda/CMakeLists.txt
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmf.cu
ggml/src/ggml-cuda/mmf.cuh
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_1.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_10.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_11.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_12.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_13.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_14.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_15.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_2.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_3.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_4.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_5.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_6.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_7.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_8.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/mmf-instance-ncols_9.cu [new file with mode: 0644]
tests/test-backend-ops.cpp
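
The new mmf-instance-ncols_1.cu through mmf-instance-ncols_16.cu files (presumably emitted by template-instances/generate_cu_files.py, which this commit also modifies) suggest that the kernel is instantiated once per ncols_dst value, each in its own translation unit, to cut compile times. The sketch below shows the general pattern of mapping a runtime column count onto compile-time template instances. It is not the code from this commit: launch_mmf_sketch and dispatch_by_ncols are hypothetical names, and no CUDA kernel is actually launched.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a kernel templated on ncols_dst. In a split build,
// each instantiation would live in its own translation unit (compare the
// mmf-instance-ncols_N.cu files listed above), typically paired with
// `extern template` declarations in a shared header, so that the variants
// compile in parallel instead of in one large file.
template <int ncols_dst>
int launch_mmf_sketch() {
    static_assert(ncols_dst >= 1 && ncols_dst <= 16, "ncols_dst must be in [1, 16]");
    // Placeholder: a real version would configure and launch the CUDA kernel.
    return ncols_dst;
}

// Runtime-to-compile-time dispatch: the number of destination columns in the
// batch selects the matching template instantiation.
int dispatch_by_ncols(int ncols) {
    switch (ncols) {
        case 1:  return launch_mmf_sketch<1>();
        case 2:  return launch_mmf_sketch<2>();
        case 3:  return launch_mmf_sketch<3>();
        case 4:  return launch_mmf_sketch<4>();
        case 5:  return launch_mmf_sketch<5>();
        case 6:  return launch_mmf_sketch<6>();
        case 7:  return launch_mmf_sketch<7>();
        case 8:  return launch_mmf_sketch<8>();
        case 9:  return launch_mmf_sketch<9>();
        case 10: return launch_mmf_sketch<10>();
        case 11: return launch_mmf_sketch<11>();
        case 12: return launch_mmf_sketch<12>();
        case 13: return launch_mmf_sketch<13>();
        case 14: return launch_mmf_sketch<14>();
        case 15: return launch_mmf_sketch<15>();
        case 16: return launch_mmf_sketch<16>();
        default: throw std::invalid_argument("ncols_dst must be in [1, 16]");
    }
}
```

Fixing ncols_dst at compile time lets the compiler fully unroll per-column loops and size register arrays exactly, at the cost of one instantiation per value; spreading those instantiations across separate files keeps each compilation unit small.
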