git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: broadcasting for FlashAttention mask (llama/14500)
author Johannes Gäßler <redacted>
Wed, 2 Jul 2025 11:42:12 +0000 (13:42 +0200)
committer Georgi Gerganov <redacted>
Sat, 12 Jul 2025 16:23:56 +0000 (19:23 +0300)
commit 70515ed728976d2eb64936417359ded81eaab2bc
tree b3a5117821840c0e703e97eccb0d4dbddba96e37
parent 1b3e06a400b3356b6fce528b98601f1257ac40f4
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/fattn-tile-f16.cu
ggml/src/ggml-cuda/fattn-tile-f32.cu
ggml/src/ggml-cuda/fattn-vec-f16.cuh
ggml/src/ggml-cuda/fattn-vec-f32.cuh
ggml/src/ggml-cuda/fattn-wmma-f16.cu