CPU/CUDA: Gemma 2 FlashAttention support (#8542)
author     Johannes Gäßler <redacted>
           Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)
committer  GitHub <redacted>
           Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)
commit     e11bd856d538e44d24d8cad4b0381fba0984d162
tree       36e1fc06f4a8b05f8c3644e9c797e8f4855b11e0
parent     8f824ffe8ee1feadd14428f1dda1283fa3b933be
CPU/CUDA: Gemma 2 FlashAttention support (#8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel (see the sketch below)

* disable logit softcapping tests on Metal

* remove metal check
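
The key technical change is applying Gemma 2's attention-logit softcapping inside the FlashAttention code paths. Below is a minimal, self-contained C++ sketch of the idea only; it is not the ggml/CUDA kernel code, and the function name, shapes, and values are illustrative. Softcapping bounds each raw attention score s to (-softcap, +softcap) via softcap * tanh(s / softcap); the "apply logit_softcap to scale" trick from the commit message pre-divides the usual 1/sqrt(head_dim) scale by the softcap, so the inner loop only needs one tanh and one extra multiply.

// Hypothetical standalone sketch, not the actual ggml/CUDA kernel.
// Computes softcapped attention logits for one query against all keys.
#include <cmath>
#include <cstdio>
#include <vector>

// q: [head_dim], k: [n_kv][head_dim], scale is typically 1/sqrt(head_dim).
static std::vector<float> softcapped_logits(const std::vector<float> & q,
                                            const std::vector<std::vector<float>> & k,
                                            float scale, float logit_softcap) {
    const float scale_folded = scale / logit_softcap; // fold softcap into the scale up front
    std::vector<float> out(k.size());
    for (size_t i = 0; i < k.size(); ++i) {
        float s = 0.0f;
        for (size_t d = 0; d < q.size(); ++d) {
            s += q[d] * k[i][d];                        // raw dot product Q·K
        }
        s *= scale_folded;                              // == (Q·K)*scale / softcap
        out[i] = logit_softcap * std::tanh(s);          // bounded to (-softcap, +softcap)
    }
    return out;
}

int main() {
    std::vector<float> q = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<std::vector<float>> k = {{4.0f, 3.0f, 2.0f, 1.0f}, {1.0f, 1.0f, 1.0f, 1.0f}};
    // Gemma 2 configures attn_logit_softcapping = 50.0 (assumption taken from the model config).
    for (float s : softcapped_logits(q, k, 1.0f / std::sqrt(4.0f), 50.0f)) {
        std::printf("%f\n", s);
    }
    return 0;
}

A softcap value of 0.0f is treated as "capping disabled", so models other than Gemma 2 keep the previous FlashAttention behaviour unchanged.
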
12 files changed:
ggml/include/ggml.h
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-tile-f16.cu
ggml/src/ggml-cuda/fattn-tile-f32.cu
ggml/src/ggml-cuda/fattn-vec-f16.cuh
ggml/src/ggml-cuda/fattn-vec-f32.cuh
ggml/src/ggml-cuda/fattn-wmma-f16.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-metal.m
ggml/src/ggml.c
src/llama.cpp
tests/test-backend-ops.cpp