git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: use async data loading for FlashAttention (llama/11894)
author Johannes Gäßler <redacted>
Mon, 17 Feb 2025 13:03:24 +0000 (14:03 +0100)
committer Georgi Gerganov <redacted>
Thu, 27 Feb 2025 06:55:36 +0000 (08:55 +0200)
commit 51a3580c7931590182fa5ea01eacc7c8b0a8ddb9
tree 8185849a0857256776c912931155d29ba9797646
parent 37a21dd43d3963fb41aa114595a92b5a5d054381
CUDA: use async data loading for FlashAttention (llama/11894)

* CUDA: use async data loading for FlashAttention

---------

Co-authored-by: Diego Devesa <redacted>
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/cp-async.cuh [new file with mode: 0644]
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmq.cuh