git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: use mma PTX instructions for FlashAttention (llama/11583)
author Johannes Gäßler <redacted>
Sun, 2 Feb 2025 18:31:09 +0000 (19:31 +0100)
committer Georgi Gerganov <redacted>
Mon, 3 Feb 2025 12:44:49 +0000 (14:44 +0200)
commit 0f36dae8a7cb3e1b97225cd71d32142a3fbe1ce5
tree d52465fa5c7a95fa3adf0cec139026e5ffc5c172
parent 0362e2ba39cd8ef180334e33e4c430599d399fb4

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <redacted>
28 files changed:
include/ggml.h
src/ggml-cuda/CMakeLists.txt
src/ggml-cuda/common.cuh
src/ggml-cuda/fattn-common.cuh
src/ggml-cuda/fattn-mma-f16.cuh [new file with mode: 0644]
src/ggml-cuda/fattn-tile-f16.cu
src/ggml-cuda/fattn-tile-f32.cu
src/ggml-cuda/fattn-vec-f16.cuh
src/ggml-cuda/fattn-vec-f32.cuh
src/ggml-cuda/fattn-wmma-f16.cu [new file with mode: 0644]
src/ggml-cuda/fattn-wmma-f16.cuh
src/ggml-cuda/fattn.cu
src/ggml-cuda/mma.cuh
src/ggml-cuda/mmq.cu
src/ggml-cuda/mmq.cuh
src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb16.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb32.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb64.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb8.cu [new file with mode: 0644]
src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cu [deleted file]
src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cu [deleted file]
src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cu [deleted file]
src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cu [deleted file]
src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cu [deleted file]
src/ggml-cuda/template-instances/generate_cu_files.py
src/ggml-cuda/vendors/hip.h
src/ggml-hip/CMakeLists.txt
src/ggml-musa/CMakeLists.txt