git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: broadcasting for FlashAttention mask (#14500)
author Johannes Gäßler <redacted>
Wed, 2 Jul 2025 11:42:12 +0000 (13:42 +0200)
committer Georgi Gerganov <redacted>
Wed, 2 Jul 2025 12:48:33 +0000 (15:48 +0300)
commit 12a81af45f0dbbab24bd819a15f57c03ceb1be90
tree 6ecceb247d82477c1aad83f76089f4ed9ed92b7d
parent 8875523eb311cac832bfda0c581e852292185ae9
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/fattn-tile-f16.cu
ggml/src/ggml-cuda/fattn-tile-f32.cu
ggml/src/ggml-cuda/fattn-vec-f16.cuh
ggml/src/ggml-cuda/fattn-vec-f32.cuh
ggml/src/ggml-cuda/fattn-wmma-f16.cu