git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: deduplicate FlashAttention code (llama/7352)
author Johannes Gäßler <redacted>
Sat, 18 May 2024 10:36:25 +0000 (12:36 +0200)
committer Georgi Gerganov <redacted>
Tue, 28 May 2024 11:41:08 +0000 (14:41 +0300)
commit f68d6a9211b2afcf41d48c359819f8441aef5bab
tree ed4263768b7e8af2a89b9bcfd3309c4593911dbb
parent 0d7036940257c60e2a828896be8073a860dec860
src/ggml-cuda/common.cuh
src/ggml-cuda/fattn-common.cuh
src/ggml-cuda/fattn-tile-f16.cu
src/ggml-cuda/fattn-tile-f32.cu
src/ggml-cuda/fattn-vec-f16.cu
src/ggml-cuda/fattn-vec-f32.cu
src/ggml-cuda/fattn.cu
src/ggml-cuda/softmax.cu