git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
author     Johannes Gäßler <redacted>
           Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)
committer  Georgi Gerganov <redacted>
           Wed, 28 Aug 2024 10:22:20 +0000 (13:22 +0300)
commit     24d8534bd8636e2d5ba9e922e286ddf4b5363296
tree       151f4d881157d13c57acaf850417c25933fbc2f2
parent     9b16ddd3a5094d96ef391fb8205361f6ae13beee
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check
ggml/include/ggml.h
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-tile-f16.cu
ggml/src/ggml-cuda/fattn-tile-f32.cu
ggml/src/ggml-cuda/fattn-vec-f16.cuh
ggml/src/ggml-cuda/fattn-vec-f32.cuh
ggml/src/ggml-cuda/fattn-wmma-f16.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-metal.m
ggml/src/ggml.c