git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: add attention sinks for tile and wmma (llama/15178)
author Aman Gupta <redacted>
Sat, 9 Aug 2025 12:00:24 +0000 (20:00 +0800)
committer Georgi Gerganov <redacted>
Mon, 18 Aug 2025 17:30:45 +0000 (20:30 +0300)
commit 93c7a08019ba3f5ff6addf71dcbcd5b2109481ab
tree 578ab97ef5dfe39baf0f7bb7abefa0ae4b440d47
parent 62566a54365795cd509d8075cd2ea706d491d72f

* CUDA: add attention sinks for tile and wmma

* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
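
For readers unfamiliar with the term, here is a minimal host-side sketch of how an attention sink can be folded into an online softmax. It is not the code from this commit. It assumes the sink is one extra per-head logit that joins the softmax denominator but has no value vector, and that all logits are finite. The names RowState, accumulate, apply_sink, attend and attend_with_sink are hypothetical and do not exist in ggml.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference code, not part of ggml: online softmax state for
// one query row, using scalar values to keep the example short.
struct RowState {
    float max;  // running maximum of the logits seen so far
    float sum;  // running sum of exp(logit - max)
    float acc;  // running sum of exp(logit - max) * value
};

// Standard online-softmax accumulation over the KV positions.
// Assumes every logit is finite (no fully masked entries).
inline RowState accumulate(const std::vector<float>& logits,
                           const std::vector<float>& values) {
    assert(logits.size() == values.size());
    RowState s{-INFINITY, 0.0f, 0.0f};
    for (std::size_t i = 0; i < logits.size(); ++i) {
        const float new_max = std::max(s.max, logits[i]);
        const float scale   = std::exp(s.max - new_max);  // rescales old terms
        const float p       = std::exp(logits[i] - new_max);
        s.sum = s.sum * scale + p;
        s.acc = s.acc * scale + p * values[i];
        s.max = new_max;
    }
    return s;
}

// Assumed sink semantics: the sink adds one term to the denominator and
// nothing to the numerator, so it can only shrink the output.
inline void apply_sink(RowState& s, float sink) {
    const float new_max = std::max(s.max, sink);
    const float scale   = std::exp(s.max - new_max);
    s.sum = s.sum * scale + std::exp(sink - new_max);
    s.acc = s.acc * scale;  // no value is added for the sink
    s.max = new_max;
}

inline float attend(const std::vector<float>& logits,
                    const std::vector<float>& values) {
    const RowState s = accumulate(logits, values);
    return s.acc / s.sum;
}

inline float attend_with_sink(const std::vector<float>& logits,
                              const std::vector<float>& values,
                              float sink) {
    RowState s = accumulate(logits, values);
    apply_sink(s, sink);
    return s.acc / s.sum;
}
```

With logits {0, 0} and values {1, 3}, attend returns 2, while attend_with_sink with a sink of 0 returns 4/3, because the sink takes a third of the probability mass without contributing a value.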
ggml/src/ggml-cuda/fattn-tile-f16.cu
ggml/src/ggml-cuda/fattn-tile-f32.cu
ggml/src/ggml-cuda/fattn-wmma-f16.cu
ggml/src/ggml-cuda/fattn.cu