llama : grouped-query attention + LLaMAv2 70B support (#2276)
author     Georgi Gerganov <redacted>
           Sun, 23 Jul 2023 12:09:47 +0000 (15:09 +0300)
committer  GitHub <redacted>
           Sun, 23 Jul 2023 12:09:47 +0000 (15:09 +0300)
commit  e76d630df17e235e6b9ef416c45996765d2e36fb
tree    15e0e9648f9b0e398b43e888216a73f84098ff3a
parent  1d0824b2476e7fda09751a0235c9e571b76d6f2c

* CUDA: GQA implementation

* llama : support for GQA and LLaMAv2 70B

ggml-ci

* py : fix hparams parsing (if-else blocks)

ggml-ci

* py : oh boy ..

ggml-ci

* help : fix gqa value for 70B

ggml-ci

---------

Co-authored-by: JohannesGaessler <redacted>
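For context on the commit message above: grouped-query attention (GQA) uses fewer key/value heads than query heads, with each KV head shared by a group of query heads (LLaMA-2 70B uses 64 query heads and 8 KV heads, i.e. a group size of 8). A minimal NumPy sketch of the idea; `gqa_attention` and its tensor shapes are invented for illustration and are not the commit's CUDA/ggml implementation:

```python
import numpy as np

def gqa_attention(q, k, v, n_head, n_head_kv):
    """Grouped-query attention sketch.

    q: (n_head, seq, d)      -- one set of queries per query head
    k, v: (n_head_kv, seq, d) -- fewer KV heads; each is shared by a
                                 group of n_head // n_head_kv query heads
    """
    group = n_head // n_head_kv
    d = q.shape[-1]
    out = np.empty_like(q)
    for h in range(n_head):
        kv = h // group  # the KV head shared by this query head's group
        scores = q[h] @ k[kv].T / np.sqrt(d)
        # numerically stable softmax over the key dimension
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ v[kv]
    return out
```

With `n_head_kv == n_head` this reduces to standard multi-head attention; with `n_head_kv == 1` it is multi-query attention. The practical win is a KV cache that is `n_head / n_head_kv` times smaller, which is what makes the 70B model tractable.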
convert.py
examples/common.cpp
examples/common.h
examples/main/main.cpp
ggml-cuda.cu
llama.cpp
llama.h