git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: add attention sinks for tile and wmma (llama/15178)
author Aman Gupta <redacted>
Sat, 9 Aug 2025 12:00:24 +0000 (20:00 +0800)
committer Georgi Gerganov <redacted>
Thu, 14 Aug 2025 11:17:28 +0000 (14:17 +0300)
commit 018cc7281ef4ac3f68cfab2721538aa0b28b1cda
tree f04f763a68587f412b9441e2d2e2cd8171468b2d
parent a38c4bc4109aae4a203f062f3a4cb36317de1b57

* CUDA: add attention sinks for tile and wmma

* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
src/ggml-cuda/fattn-tile-f16.cu
src/ggml-cuda/fattn-tile-f32.cu
src/ggml-cuda/fattn-wmma-f16.cu
src/ggml-cuda/fattn.cu