llama : differentiate the KV dims in the attention (#4657)
author    postmasters <redacted>
          Tue, 2 Jan 2024 11:51:28 +0000 (03:51 -0800)
committer GitHub <redacted>
          Tue, 2 Jan 2024 11:51:28 +0000 (13:51 +0200)
commit    83e633c27efdf0eb0ba54249e784b0ea760b1007
tree      30711187d9551899c546f9181f00456481873679
parent    32866c5edde402f42ff4233bb89dcfcede34fd22

* Add n_key_dim and n_value_dim

Some models use key/value dimensions that are not derived from `n_embd`.
Also remove `n_embd_head` and `n_embd_gqa`, because it was not clear
which "head" they referred to (key or value).

Fix issue #4648.

* Fix `llm_build_kqv` to use `n_value_gqa`

* Rebase

* Rename variables

* Fix llm_build_kqv to be more generic wrt n_embd_head_k

* Update default values for n_embd_head_k and n_embd_head_v

Co-authored-by: Georgi Gerganov <redacted>
* Fix `llm_load_tensors`: the asserts were not backward-compatible

---------

Co-authored-by: Georgi Gerganov <redacted>
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
llama.cpp