git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: broadcasting for FlashAttention mask (llama/14500)
author Johannes Gäßler <redacted>
Wed, 2 Jul 2025 11:42:12 +0000 (13:42 +0200)
committer Georgi Gerganov <redacted>
Sat, 12 Jul 2025 13:05:00 +0000 (16:05 +0300)
commit a2bacc13e578a4a5aa6fa3c11312d513ca8d8d16
tree 1893c285fa0081f54ec6eda912c186bd700528d0
parent 7e2e170e8c23a9f69a1ce0c2c24ca08695cf8537
src/ggml-cuda/fattn-common.cuh
src/ggml-cuda/fattn-mma-f16.cuh
src/ggml-cuda/fattn-tile-f16.cu
src/ggml-cuda/fattn-tile-f32.cu
src/ggml-cuda/fattn-vec-f16.cuh
src/ggml-cuda/fattn-vec-f32.cuh
src/ggml-cuda/fattn-wmma-f16.cu