llama : add support for StarCoder model architectures (#3187)
author    Meng Zhang <redacted>
          Fri, 15 Sep 2023 19:02:13 +0000 (03:02 +0800)
committer GitHub <redacted>
          Fri, 15 Sep 2023 19:02:13 +0000 (22:02 +0300)
commit    4fe09dfe665c58a753dc9eb638dd4dca1cd35488
tree      8bde812820738105894d6c179c3b3615b5c06481
parent    80291a1d02a07f7f66666fb576c5b1e75aa48b46
llama : add support for StarCoder model architectures (#3187)

* add placeholder of starcoder in gguf / llama.cpp

* support convert starcoder weights to gguf

* convert MQA to MHA
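  (A minimal sketch of the MQA-to-MHA expansion, assuming weights are held as lists of rows; the function name and layout here are illustrative, not the converter's actual code:

  ```python
  # Hypothetical sketch, not from convert-starcoder-hf-to-gguf.py:
  # StarCoder uses multi-query attention (MQA), i.e. one shared K/V head.
  # Expanding it to multi-head attention (MHA) layout means tiling the
  # single K/V projection once per query head.
  def mqa_to_mha(wq, wkv, n_head):
      """wq:  n_head * head_dim rows of the query projection
      wkv: 2 * head_dim rows holding the single shared K and V head
      Returns MHA-layout rows: Q, then K tiled n_head times, then V tiled."""
      head_dim = len(wkv) // 2
      wk, wv = wkv[:head_dim], wkv[head_dim:]
      # Every query head attends over the same K/V head in MQA, so the
      # equivalent MHA weights simply repeat that head n_head times.
      return wq + wk * n_head + wv * n_head
  ```

  Note the trade-off: this replication makes the tensor shapes match the existing MHA code paths but multiplies K/V storage by n_head, which is why a later commit in this series stores MQA directly instead.)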

* fix ffn_down name

* add LLM_ARCH_STARCODER to llama.cpp

* set head_count_kv = 1
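  (head_count_kv = 1 is what makes StarCoder's KV cache, and hence the VRAM estimate fixed later in this series, far smaller than a full-MHA model of the same size. A back-of-the-envelope sketch, assuming StarCoder-15B's published shape of n_embd = 6144, n_head = 48, n_layer = 40 and an f16 cache:

  ```python
  # Illustrative sizing only; llama.cpp's actual allocation logic differs.
  def kv_cache_bytes(n_ctx, n_layer, n_embd, n_head, n_head_kv, bytes_per_el=2):
      # One K and one V vector of head_dim elements per KV head,
      # per layer, per cached token.
      head_dim = n_embd // n_head
      return 2 * n_ctx * n_layer * n_head_kv * head_dim * bytes_per_el

  mqa = kv_cache_bytes(8192, 40, 6144, 48, n_head_kv=1)   # ~160 MiB
  mha = kv_cache_bytes(8192, 40, 6144, 48, n_head_kv=48)  # ~7.5 GiB
  ```

  At an 8k context the MQA cache is 1/48th the size of its MHA equivalent, one KV head instead of 48.)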

* load starcoder weight

* add max_position_embeddings

* set n_positions to max_position_embeddings

* properly load all starcoder params

* fix head count kv

* fix comments

* fix vram calculation for starcoder

* store mqa directly

* add input embeddings handling

* add TBD

* working on CPU; Metal still buggy

* clean up unused code

* metal : fix out-of-bounds access in soft_max kernels

* llama : make starcoder graph build more consistent with others

* refactor: cleanup comments a bit

* add other starcoder models: 3B, 7B, 15B

* support MQA directly

* fix: remove max_position_embeddings, use n_train_ctx

* Update llama.cpp

Co-authored-by: Georgi Gerganov <redacted>
* Update llama.cpp

Co-authored-by: Georgi Gerganov <redacted>
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <redacted>
* fix: switch to space from tab

---------

Co-authored-by: Georgi Gerganov <redacted>
convert-starcoder-hf-to-gguf.py [new file with mode: 0755]
gguf-py/gguf/gguf.py
llama.cpp