llama : add support for StarCoder model architectures (#3187)
author    Meng Zhang <redacted>
          Fri, 15 Sep 2023 19:02:13 +0000 (03:02 +0800)
committer GitHub <redacted>
          Fri, 15 Sep 2023 19:02:13 +0000 (22:02 +0300)
commit    4fe09dfe665c58a753dc9eb638dd4dca1cd35488
tree      8bde812820738105894d6c179c3b3615b5c06481
parent    80291a1d02a07f7f66666fb576c5b1e75aa48b46
llama : add support for StarCoder model architectures (#3187)

* add placeholder of starcoder in gguf / llama.cpp

* support convert starcoder weights to gguf

* convert MQA to MHA
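  (A minimal sketch of the MQA-to-MHA expansion, assuming weights are held as lists of rows; the function name and layout here are illustrative, not the converter's actual code:

  ```python
  # Hypothetical sketch, not from convert-starcoder-hf-to-gguf.py:
  # StarCoder uses multi-query attention (MQA), i.e. one shared K/V head.
  # Expanding it to multi-head attention (MHA) layout means tiling the
  # single K/V projection once per query head.
  def mqa_to_mha(wq, wkv, n_head):
      """wq:  n_head * head_dim rows of the query projection
      wkv: 2 * head_dim rows holding the single shared K and V head
      Returns MHA-layout rows: Q, then K tiled n_head times, then V tiled."""
      head_dim = len(wkv) // 2
      wk, wv = wkv[:head_dim], wkv[head_dim:]
      # Every query head attends over the same K/V head in MQA, so the
      # equivalent MHA weights simply repeat that head n_head times.
      return wq + wk * n_head + wv * n_head
  ```

  Note the trade-off: this replication makes the tensor shapes match the existing MHA code paths but multiplies K/V storage by n_head, which is why a later commit in this series stores MQA directly instead.)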

* fix ffn_down name

* add LLM_ARCH_STARCODER to llama.cpp

* set head_count_kv = 1
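  (head_count_kv = 1 is what makes StarCoder's KV cache, and hence the VRAM estimate fixed later in this series, far smaller than a full-MHA model of the same size. A back-of-the-envelope sketch, assuming StarCoder-15B's published shape of n_embd = 6144, n_head = 48, n_layer = 40 and an f16 cache:

  ```python
  # Illustrative sizing only; llama.cpp's actual allocation logic differs.
  def kv_cache_bytes(n_ctx, n_layer, n_embd, n_head, n_head_kv, bytes_per_el=2):
      # One K and one V vector of head_dim elements per KV head,
      # per layer, per cached token.
      head_dim = n_embd // n_head
      return 2 * n_ctx * n_layer * n_head_kv * head_dim * bytes_per_el

  mqa = kv_cache_bytes(8192, 40, 6144, 48, n_head_kv=1)   # ~160 MiB
  mha = kv_cache_bytes(8192, 40, 6144, 48, n_head_kv=48)  # ~7.5 GiB
  ```

  At an 8k context the MQA cache is 1/48th the size of its MHA equivalent, one KV head instead of 48.)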

* load starcoder weight

* add max_position_embeddings

* set n_positions to max_position_embeddings

* properly load all starcoder params

* fix head count kv

* fix comments

* fix vram calculation for starcoder

* store mqa directly

* add input embeddings handling

* add TBD

* working on CPU; Metal still buggy

* clean up unused code

* metal : fix out-of-bounds access in soft_max kernels

* llama : make starcoder graph build more consistent with others

* refactor: cleanup comments a bit

* add other starcoder models: 3B, 7B, 15B

* support MQA directly

* fix: remove max_position_embeddings, use n_train_ctx

* Update llama.cpp

Co-authored-by: Georgi Gerganov <redacted>
* Update llama.cpp

Co-authored-by: Georgi Gerganov <redacted>
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <redacted>
* fix: switch to space from tab

---------

Co-authored-by: Georgi Gerganov <redacted>
convert-starcoder-hf-to-gguf.py [new file with mode: 0755]
gguf-py/gguf/gguf.py
llama.cpp