CUDA: deduplicate FlashAttention code (#7352)
author Johannes Gäßler <redacted>
Sat, 18 May 2024 10:36:25 +0000 (12:36 +0200)
committer GitHub <redacted>
Sat, 18 May 2024 10:36:25 +0000 (12:36 +0200)
commit 133d99c59980139f5bb75922c8b5fca67d7ba9b8
tree 6a84c2c449dcd23e909db087b4594444b8622c71
parent cb42c294279bc4a0a4e926a7b5a5568049f12fa7
ggml-cuda/common.cuh
ggml-cuda/fattn-common.cuh
ggml-cuda/fattn-tile-f16.cu
ggml-cuda/fattn-tile-f32.cu
ggml-cuda/fattn-vec-f16.cu
ggml-cuda/fattn-vec-f32.cu
ggml-cuda/fattn.cu
ggml-cuda/softmax.cu