CPU/CUDA: Gemma 2 FlashAttention support (#8542)
author     Johannes Gäßler <redacted>
           Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)
committer  GitHub <redacted>
           Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)
commit     e11bd856d538e44d24d8cad4b0381fba0984d162
tree       36e1fc06f4a8b05f8c3644e9c797e8f4855b11e0
parent     8f824ffe8ee1feadd14428f1dda1283fa3b933be
CPU/CUDA: Gemma 2 FlashAttention support (#8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel (see the sketch below)

* disable logit softcapping tests on Metal

* remove metal check
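
The key technical change is applying Gemma 2's attention-logit softcapping inside the FlashAttention code paths. Below is a minimal, self-contained C++ sketch of the idea only; it is not the ggml/CUDA kernel code, and the function name, shapes, and values are illustrative. Softcapping bounds each raw attention score s to (-softcap, +softcap) via softcap * tanh(s / softcap); the "apply logit_softcap to scale" trick from the commit message pre-divides the usual 1/sqrt(head_dim) scale by the softcap, so the inner loop only needs one tanh and one extra multiply.

// Hypothetical standalone sketch, not the actual ggml/CUDA kernel.
// Computes softcapped attention logits for one query against all keys.
#include <cmath>
#include <cstdio>
#include <vector>

// q: [head_dim], k: [n_kv][head_dim], scale is typically 1/sqrt(head_dim).
static std::vector<float> softcapped_logits(const std::vector<float> & q,
                                            const std::vector<std::vector<float>> & k,
                                            float scale, float logit_softcap) {
    const float scale_folded = scale / logit_softcap; // fold softcap into the scale up front
    std::vector<float> out(k.size());
    for (size_t i = 0; i < k.size(); ++i) {
        float s = 0.0f;
        for (size_t d = 0; d < q.size(); ++d) {
            s += q[d] * k[i][d];                        // raw dot product Q·K
        }
        s *= scale_folded;                              // == (Q·K)*scale / softcap
        out[i] = logit_softcap * std::tanh(s);          // bounded to (-softcap, +softcap)
    }
    return out;
}

int main() {
    std::vector<float> q = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<std::vector<float>> k = {{4.0f, 3.0f, 2.0f, 1.0f}, {1.0f, 1.0f, 1.0f, 1.0f}};
    // Gemma 2 configures attn_logit_softcapping = 50.0 (assumption taken from the model config).
    for (float s : softcapped_logits(q, k, 1.0f / std::sqrt(4.0f), 50.0f)) {
        std::printf("%f\n", s);
    }
    return 0;
}

A softcap value of 0.0f is treated as "capping disabled", so models other than Gemma 2 keep the previous FlashAttention behaviour unchanged.
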
12 files changed:
ggml/include/ggml.h
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-tile-f16.cu
ggml/src/ggml-cuda/fattn-tile-f32.cu
ggml/src/ggml-cuda/fattn-vec-f16.cuh
ggml/src/ggml-cuda/fattn-vec-f32.cuh
ggml/src/ggml-cuda/fattn-wmma-f16.cuh
ggml/src/ggml-cuda/fattn.cu
ggml/src/ggml-metal.m
ggml/src/ggml.c
src/llama.cpp
tests/test-backend-ops.cpp