git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: attention sinks for mma FlashAttention (llama/15157)
author Johannes Gäßler <redacted>
Fri, 8 Aug 2025 06:19:58 +0000 (08:19 +0200)
committer Georgi Gerganov <redacted>
Thu, 14 Aug 2025 11:17:28 +0000 (14:17 +0300)
commit 43dcc4c5f831d89c4ec96745c6b7b6f010601f3e
tree d83c3f717b5dca3992b85148f2f92277ae6f7e8a
parent e283898fa96368900d732d7e236fcb26fdd73237
src/ggml-cuda/fattn-mma-f16.cuh
src/ggml-cuda/fattn.cu
src/ggml-cuda/ggml-cuda.cu