whisper : use flash attention (#2152)
author    Georgi Gerganov <redacted>
          Wed, 15 May 2024 06:38:19 +0000 (09:38 +0300)
committer GitHub <redacted>
          Wed, 15 May 2024 06:38:19 +0000 (09:38 +0300)
commit 7094ea5e750266e16c16c7aecac8fc03294ecaa3
tree   1166f219a2d57b2da63273ab840e9c4701c28a84
parent 9d5771ae43d7fc7cca9d31dd924b13a29144e476
whisper : use flash attention (#2152)

* whisper : use flash attention in the encoder

* whisper : add kv_pad

* whisper : remove extra backend instance (huh?)

* whisper : use FA for cross-attention

* whisper : use FA for self-attention

* whisper : simplify encoder FA

* whisper : add flash_attn runtime parameter

* scripts : add bench log

* scripts : add M1 Pro bench log
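The `flash_attn` runtime parameter introduced by this commit is a field of `whisper_context_params`, so callers opt in at context creation time. A minimal sketch of enabling it through the C API (the model path is a placeholder; error handling kept to the essentials):

```cpp
// Sketch: enabling the flash attention path added by this commit.
// Assumes the post-commit whisper.h API; model path is a placeholder.
#include "whisper.h"

int main() {
    struct whisper_context_params cparams = whisper_context_default_params();
    cparams.flash_attn = true; // runtime parameter added by this commit

    struct whisper_context * ctx =
        whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
    if (ctx == NULL) {
        return 1; // model failed to load
    }

    // ... run transcription via whisper_full() as usual ...

    whisper_free(ctx);
    return 0;
}
```

The changed example programs (bench, main, server, stream, etc.) plumb the same boolean through their command-line parsing, so the feature can be toggled per run without rebuilding.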
13 files changed:
examples/bench/bench.cpp
examples/command/command.cpp
examples/lsp/lsp.cpp
examples/main/main.cpp
examples/server/server.cpp
examples/stream/stream.cpp
examples/talk-llama/talk-llama.cpp
examples/talk/talk.cpp
examples/wchess/wchess.cmd/wchess.cmd.cpp
scripts/bench-all-gg.txt [new file with mode: 0644]
scripts/bench-all.sh
whisper.cpp
whisper.h