commit     1c5eba6f8e628fb0a98afb27d8aaeb3b0e136451
tree       c681e5cd5c59b58a435684263a26549394c99c7e
parent     72272b83a3878e91251218c981b4c6ec16c33912
author     Andrei <redacted>    Sun, 30 Jun 2024 03:44:08 +0000 (20:44 -0700)
committer  GitHub <redacted>    Sun, 30 Jun 2024 03:44:08 +0000 (23:44 -0400)
llama: Add attention and final logit soft-capping, update scaling factor for Gemma2 (#8197)

* Add attention and final logit soft-capping (see the NumPy sketch after this change list).

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <redacted>
* Add default values for the attention and final logit softcaps

* Add custom kq scaling from Gemma2Attention

* Remove custom pre-attention scaling and use the computed value instead (reflected in the kq scale in the sketch below).

---------

Co-authored-by: slaren <redacted>
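
Soft-capping squashes a tensor of logits into a bounded range with a tanh rather than a hard clamp. A minimal NumPy sketch of what this commit adds, not the actual ggml graph code; the default caps (50.0 for attention, 30.0 for final logits) and the query_pre_attn_scalar-based kq scale are assumptions taken from the upstream Gemma 2 configuration:

```python
import numpy as np

def soft_cap(x: np.ndarray, cap: float) -> np.ndarray:
    # cap * tanh(x / cap): near-identity for |x| << cap, saturates at +/- cap,
    # keeping logits bounded without a hard clamp.
    return cap * np.tanh(x / cap)

def attn_weights(q: np.ndarray, k: np.ndarray,
                 query_pre_attn_scalar: float,
                 attn_cap: float = 50.0) -> np.ndarray:
    # Gemma 2 scales Q @ K^T by query_pre_attn_scalar ** -0.5 (the "computed
    # value" mentioned above, replacing the earlier custom pre-attention
    # scaling), then soft-caps the scores before the softmax.
    kq = (q @ k.T) * query_pre_attn_scalar ** -0.5
    kq = soft_cap(kq, attn_cap)
    kq -= kq.max(axis=-1, keepdims=True)  # numerically stable softmax
    e = np.exp(kq)
    return e / e.sum(axis=-1, keepdims=True)

def cap_final_logits(logits: np.ndarray, final_cap: float = 30.0) -> np.ndarray:
    # Applied once to the output logits before sampling.
    return soft_cap(logits, final_cap)
```

Note that the tanh sits between the Q @ K^T product and the softmax, a step the fused flash-attention kernel does not expose, which is presumably why this change also disables flash attention for Gemma2.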
Files changed:
  convert-hf-to-gguf.py
  gguf-py/gguf/constants.py
  gguf-py/gguf/gguf_writer.py
  src/llama.cpp
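
On the conversion side, the softcap hyperparameters travel from the HF config into GGUF metadata so that src/llama.cpp can read them at load time. A hedged sketch of that plumbing, assuming writer methods named after the keys this commit adds to gguf-py/gguf/constants.py and gguf-py/gguf/gguf_writer.py:

```python
# Hypothetical excerpt from the Gemma2 model class in convert-hf-to-gguf.py.
# The add_*_softcapping writer methods mirror the GGUF keys this commit
# introduces; the exact method and hparam names here are assumptions.
def set_gguf_parameters(self):
    super().set_gguf_parameters()
    self.gguf_writer.add_attn_logit_softcapping(self.hparams["attn_logit_softcapping"])
    self.gguf_writer.add_final_logit_softcapping(self.hparams["final_logit_softcapping"])
```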