CUDA: add FP32 FlashAttention vector kernel (llama/7188)
author Johannes Gäßler <redacted>
Sun, 12 May 2024 17:40:45 +0000 (19:40 +0200)
committer Georgi Gerganov <redacted>
Tue, 14 May 2024 16:16:29 +0000 (19:16 +0300)
commit e57e95eb0d3bdba42bbf057c888f6ff819a5f59b
tree 48367eca9a6bb0df8cd7da6f52343a1e92470226
parent 130f43e4b87d17ba9d1c68234e26d1180f4bb9a1
CUDA: add FP32 FlashAttention vector kernel (llama/7188)

* CUDA: add FP32 FlashAttention vector kernel

* fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
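The new fattn-vec-f32.cu file adds a FlashAttention "vector" kernel that works entirely in FP32. For context, the core idea behind a FlashAttention vector kernel is a single streaming pass over the K/V rows for one query vector, using an online softmax so the full score vector is never materialized. The following is a hedged CPU-side C++ sketch of that idea only; it is not the actual CUDA kernel, and all names (flash_attn_vec_f32, head_dim, n_kv) are illustrative:

```cpp
#include <cmath>
#include <vector>

// Illustrative FP32 flash-attention "vector" pass for a single query row.
// One streaming loop over the K/V rows keeps a running maximum and a running
// exponential sum, so softmax(q·K^T)·V is computed without ever storing the
// full score vector. CPU sketch of the technique, not the CUDA kernel.
static std::vector<float> flash_attn_vec_f32(
        const std::vector<float> & q,        // [head_dim]
        const std::vector<float> & k,        // [n_kv, head_dim], row-major
        const std::vector<float> & v,        // [n_kv, head_dim], row-major
        int n_kv, int head_dim) {
    const float scale = 1.0f / std::sqrt((float) head_dim);

    float m = -INFINITY;                     // running max of the scores
    float s = 0.0f;                          // running sum of exp(score - m)
    std::vector<float> acc(head_dim, 0.0f);  // running weighted sum of V rows

    for (int i = 0; i < n_kv; ++i) {
        float score = 0.0f;                  // score = scale * (q · k_i)
        for (int d = 0; d < head_dim; ++d) {
            score += q[d]*k[i*head_dim + d];
        }
        score *= scale;

        const float m_new = std::max(m, score);
        const float c     = std::exp(m - m_new);     // rescale old accumulator
        const float p     = std::exp(score - m_new); // weight of this V row

        s = s*c + p;
        for (int d = 0; d < head_dim; ++d) {
            acc[d] = acc[d]*c + p*v[i*head_dim + d];
        }
        m = m_new;
    }

    for (int d = 0; d < head_dim; ++d) {
        acc[d] /= s;                         // final softmax normalization
    }
    return acc;
}
```

In the real kernel this loop is parallelized across threads of a warp/block with FP32 accumulators throughout; keeping an FP32 variant alongside the FP16 one matters for GPUs whose FP16 arithmetic is slow or unavailable.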
ggml-cuda.cu
ggml-cuda/common.cuh
ggml-cuda/fattn-common.cuh [new file with mode: 0644]
ggml-cuda/fattn-vec-f16.cu [new file with mode: 0644]
ggml-cuda/fattn-vec-f16.cuh [new file with mode: 0644]
ggml-cuda/fattn-vec-f32.cu [new file with mode: 0644]
ggml-cuda/fattn-vec-f32.cuh [new file with mode: 0644]
ggml-cuda/fattn.cu