git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: add FP32 FlashAttention vector kernel (llama/7188)
author Johannes Gäßler <redacted>
Sun, 12 May 2024 17:40:45 +0000 (19:40 +0200)
committer Georgi Gerganov <redacted>
Tue, 14 May 2024 16:13:20 +0000 (19:13 +0300)
commit e84c498bbefe76bd53c4f94ed44a2d2edc6a40ad
tree 31e907137aa33262bb523c56a2bf216d63d89e63
parent acc32bbc40c39def8ed740321c0a0acee62f54b0
CUDA: add FP32 FlashAttention vector kernel (llama/7188)

* CUDA: add FP32 FlashAttention vector kernel

* fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
src/ggml-cuda.cu
src/ggml-cuda/common.cuh
src/ggml-cuda/fattn-common.cuh [new file with mode: 0644]
src/ggml-cuda/fattn-vec-f16.cu [new file with mode: 0644]
src/ggml-cuda/fattn-vec-f16.cuh [new file with mode: 0644]
src/ggml-cuda/fattn-vec-f32.cu [new file with mode: 0644]
src/ggml-cuda/fattn-vec-f32.cuh [new file with mode: 0644]
src/ggml-cuda/fattn.cu
tests/test-backend-ops.cpp