git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: use async data loading for FlashAttention (#11894)
author	Johannes Gäßler <redacted>
	Mon, 17 Feb 2025 13:03:24 +0000 (14:03 +0100)
committer	GitHub <redacted>
	Mon, 17 Feb 2025 13:03:24 +0000 (14:03 +0100)
commit	73e2ed3ce3492d3ed70193dd09ae8aa44779651d
tree	59ddbdf5022dbe7a87216b88b9e9862433fc8ddb
parent	f7b1116af102bcac450c1a522e9c59db241c6767
CUDA: use async data loading for FlashAttention (#11894)

* CUDA: use async data loading for FlashAttention

---------

Co-authored-by: Diego Devesa <redacted>
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/cp-async.cuh [new file with mode: 0644]
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmq.cuh
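The new cp-async.cuh wraps the `cp.async` PTX instruction, which copies data from global to shared memory asynchronously on Ampere (sm_80) and newer GPUs, letting the FlashAttention kernels overlap data loading with computation. The helpers below are an illustrative sketch of how such wrappers are typically written, not the actual cp-async.cuh API; the function names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: asynchronously copy 16 bytes from global to shared
// memory using the cp.async.cg PTX instruction (available on sm_80+).
// The copy is issued immediately and completes in the background.
static __device__ __forceinline__ void cp_async_cg_16(void * dst_shared, const void * src_global) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    // cp.async requires a shared-memory address in the shared state space:
    const unsigned int dst = (unsigned int) __cvta_generic_to_shared(dst_shared);
    asm volatile("cp.async.cg.shared.global [%0], [%1], 16;"
        : : "r"(dst), "l"(src_global));
#else
    // Fallback for older architectures: a plain synchronous 16-byte copy.
    *(int4 *) dst_shared = *(const int4 *) src_global;
#endif // defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
}

// Hypothetical sketch: block until all previously issued cp.async
// operations of this thread have completed.
static __device__ __forceinline__ void cp_async_wait_all() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    asm volatile("cp.async.wait_all;");
#endif // defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
}
```

A kernel would issue a batch of such copies for the next tile of the K/V data, do compute work on the current tile, then call the wait helper (followed by `__syncthreads()`) before reading the freshly loaded shared memory.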