git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: add attention sinks for tile and wmma (#15178)
author Aman Gupta <redacted>
Sat, 9 Aug 2025 12:00:24 +0000 (20:00 +0800)
committer GitHub <redacted>
Sat, 9 Aug 2025 12:00:24 +0000 (20:00 +0800)
commit 34c9d765bf173c551398f1e7fa4595019bc53bab
tree 254d53d51309ff63ba921557c37e0a58920d7914
parent e54d41befcc1575f4c898c5ff4ef43970cead75f

* CUDA: add attention sinks for tile and wmma

* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
ggml/src/ggml-cuda/fattn-tile-f16.cu
ggml/src/ggml-cuda/fattn-tile-f32.cu
ggml/src/ggml-cuda/fattn-wmma-f16.cu
ggml/src/ggml-cuda/fattn.cu
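
For readers unfamiliar with the term in the commit subject, here is a minimal host-side sketch of what an attention sink does to a softmax, assuming the usual formulation: one extra logit per head takes part in the running maximum and in the denominator, but it has no value vector, so it only absorbs probability mass. The function name `softmax_with_sink` and its signature are hypothetical and do not come from llama.cpp. The CUDA kernels in the files listed above are structured differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of llama.cpp): computes softmax weights over
// `scores` with one additional "sink" logit. The sink contributes to the
// maximum and to the denominator, but no weight is returned for it, so the
// returned weights sum to less than 1.
std::vector<float> softmax_with_sink(const std::vector<float> & scores, float sink) {
    // Subtract the maximum (including the sink) for numerical stability.
    float max_val = sink;
    for (float s : scores) {
        max_val = std::max(max_val, s);
    }

    // The denominator starts with the sink's term.
    float denom = std::exp(sink - max_val);

    std::vector<float> weights(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i) {
        weights[i] = std::exp(scores[i] - max_val);
        denom += weights[i];
    }

    // Normalize the real positions only.
    for (float & w : weights) {
        w /= denom;
    }
    return weights;
}
```

With two scores equal to the sink logit, each real position receives a weight of 1/3 and the remaining third is absorbed by the sink.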