llama : differentiate the KV dims in the attention (#4657)
author    postmasters <redacted>
          Tue, 2 Jan 2024 11:51:28 +0000 (03:51 -0800)
committer GitHub <redacted>
          Tue, 2 Jan 2024 11:51:28 +0000 (13:51 +0200)
commit    83e633c27efdf0eb0ba54249e784b0ea760b1007
tree      30711187d9551899c546f9181f00456481873679
parent    32866c5edde402f42ff4233bb89dcfcede34fd22

* Add n_key_dim and n_value_dim

Some models use key/value dimensions that are not derived from `n_embd`.
Also remove `n_embd_head` and `n_embd_gqa`, because it was not clear
which "head" they referred to (key or value).

Fix issue #4648.

* Fix `llm_build_kqv` to use `n_value_gqa`

* Rebase

* Rename variables

* Fix llm_build_kqv to be more generic wrt n_embd_head_k

* Update default values for n_embd_head_k and n_embd_head_v

Co-authored-by: Georgi Gerganov <redacted>
* Fix `llm_load_tensors`: the asserts were not backward-compatible

---------

Co-authored-by: Georgi Gerganov <redacted>
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
llama.cpp