git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: FA support for Deepseek (Ampere or newer) (llama/13306)
author Johannes Gäßler <redacted>
Fri, 9 May 2025 11:34:58 +0000 (13:34 +0200)
committer Georgi Gerganov <redacted>
Tue, 13 May 2025 10:02:19 +0000 (13:02 +0300)
commit bdfb7fc0c02b2be2abf05aaec8adc0c0f249b0a3
tree b06d292745c81ec8b39377cb644772951d739256
parent ac47d234ec637ba9ccebce8cb9243e28198f0992

* CUDA: FA support for Deepseek (Ampere or newer)

* do loop unrolling via C++ template
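The second bullet mentions loop unrolling via a C++ template, but this page does not show the code. As a rough illustration of the general technique only, the sketch below expands a fixed trip count at compile time with `std::index_sequence` and a C++17 fold expression. The names `unroll`, `unroll_impl`, and `sum4` are hypothetical and are not taken from the commit; the actual implementation in `src/ggml-cuda` may look quite different.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical helper, not from the commit: expands to f(0), f(1), ..., f(N-1),
// passing each index as a std::integral_constant so it is known at compile time.
template <typename F, std::size_t... Is>
void unroll_impl(F & f, std::index_sequence<Is...>) {
    // C++17 fold over the comma operator: one call per index, in order.
    (f(std::integral_constant<std::size_t, Is>{}), ...);
}

// Entry point: N is a template parameter, so the trip count is fixed at compile time.
template <std::size_t N, typename F>
void unroll(F f) {
    unroll_impl(f, std::make_index_sequence<N>{});
}

// Example use (also hypothetical): sum a fixed-size array without a runtime loop.
inline int sum4(const int (&x)[4]) {
    int acc = 0;
    unroll<4>([&](auto i) {
        // decltype(i)::value is a compile-time constant index.
        acc += x[decltype(i)::value];
    });
    return acc;
}
```

Compared with `#pragma unroll`, which is a hint to the compiler, a template expansion like this makes each iteration index a constant expression, so it can be used as a template argument or array bound inside the loop body.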

32 files changed:
src/ggml-cuda/CMakeLists.txt
src/ggml-cuda/common.cuh
src/ggml-cuda/cp-async.cuh
src/ggml-cuda/fattn-common.cuh
src/ggml-cuda/fattn-mma-f16.cuh
src/ggml-cuda/fattn-tile-f16.cu
src/ggml-cuda/fattn-tile-f32.cu
src/ggml-cuda/fattn-vec-f16.cuh
src/ggml-cuda/fattn-vec-f32.cuh
src/ggml-cuda/fattn-wmma-f16.cu
src/ggml-cuda/fattn.cu
src/ggml-cuda/ggml-cuda.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_1-ncols2_8.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_1.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_2.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_16-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_2-ncols2_8.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_32-ncols2_1.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_32-ncols2_2.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_2.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_4-ncols2_8.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_64-ncols2_1.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_1.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_2.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_4.cu
src/ggml-cuda/template-instances/fattn-mma-f16-instance-ncols1_8-ncols2_8.cu
src/ggml-cuda/template-instances/generate_cu_files.py