]> git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: use mma PTX instructions for FlashAttention (#11583)
authorJohannes Gäßler <redacted>
Sun, 2 Feb 2025 18:31:09 +0000 (19:31 +0100)
committerGitHub <redacted>
Sun, 2 Feb 2025 18:31:09 +0000 (19:31 +0100)
commit864a0b67a6c8f648c43ce8271f9cb2e12dd5df6e
tree859a4b555860fa62784c4c72d872fe3b8863d843
parent84ec8a58f7b6aad6887bbfbd1321f3ff417341a5
CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <redacted>
29 files changed:
Makefile
ggml/include/ggml.h
ggml/src/ggml-cuda/CMakeLists.txt
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-mma-f16.cuh [new file with mode: 0644]
ggml/src/ggml-cuda/fattn-tile-f16.cu
ggml/src/ggml-cuda/fattn-tile-f32.cu
ggml/src/ggml-cuda/fattn-vec-f16.cuh
ggml/src/ggml-cuda/fattn-vec-f32.cuh
ggml/src/ggml-cuda/fattn-wmma-f16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/fattn-wmma-f16.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmq.cu
ggml/src/ggml-cuda/mmq.cuh
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb16.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb32.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb64.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-mma-f16-instance-cpb8.cu [new file with mode: 0644]
ggml/src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb16.cu [deleted file]
ggml/src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqfloat-cpb32.cu [deleted file]
ggml/src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb16.cu [deleted file]
ggml/src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb32.cu [deleted file]
ggml/src/ggml-cuda/template-instances/fattn-wmma-f16-instance-kqhalf-cpb8.cu [deleted file]
ggml/src/ggml-cuda/template-instances/generate_cu_files.py
ggml/src/ggml-cuda/vendors/hip.h
ggml/src/ggml-hip/CMakeLists.txt
ggml/src/ggml-musa/CMakeLists.txt