llama : support for `falcon-mamba` architecture (#9074)
author Younes Belkada <redacted>
Wed, 21 Aug 2024 08:06:36 +0000 (12:06 +0400)
committer GitHub <redacted>
Wed, 21 Aug 2024 08:06:36 +0000 (11:06 +0300)
commit b40eb84895bf723c7b327a1e3bf6e0e2c41877f8
tree 5c803992fcb116cff6bcb8cc1e6e398524028dbf
parent f63f603c879c2232eaeded8c0aeba4244471d720
llama : support for `falcon-mamba` architecture (#9074)

* feat: initial `falcon-mamba` support in llama.cpp

* fix: lint

* refactor: clean up the initial implementation

* Update src/llama.cpp

Co-authored-by: compilade <redacted>
* Update src/llama.cpp

Co-authored-by: compilade <redacted>
* fix: address comments

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <redacted>
* fix: add more cleanup and harmonization

* fix: lint

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <redacted>
* fix: change name

* Apply suggestions from code review

Co-authored-by: compilade <redacted>
* add `in` operator

* fix: add `dt_b_c_rms` in `llm_load_print_meta`

* fix: correct printf format for bool

* fix: correct print format

* Update src/llama.cpp

Co-authored-by: compilade <redacted>
* llama : quantize more Mamba tensors

* llama : use f16 as the fallback of fallback quant types
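The idea behind "fallback of fallback" can be sketched as follows. Block-quantized types require tensor row sizes divisible by their block size; when the requested type does not fit, a simpler quantized type is tried, and F16 serves as the final fallback that always fits. This is a simplified illustration with made-up names (`qtype`, `compatible`, `pick_type`), not the actual llama.cpp quantization logic:

```cpp
#include <cstdint>

// Simplified stand-in for ggml tensor types (illustrative only).
enum class qtype { Q4_0, Q8_0, F16, F32 };

// Hypothetical helper: block-quantized types need the row size (ne0)
// to be a multiple of their block size (32 here); F16/F32 always fit.
static bool compatible(qtype t, int64_t ne0) {
    switch (t) {
        case qtype::Q4_0:
        case qtype::Q8_0: return ne0 % 32 == 0;
        default:          return true;
    }
}

// Pick the requested type if it fits, otherwise fall back to Q8_0,
// and finally to F16 -- the "fallback of fallback" quant type.
static qtype pick_type(qtype requested, int64_t ne0) {
    if (compatible(requested, ne0))   return requested;
    if (compatible(qtype::Q8_0, ne0)) return qtype::Q8_0;
    return qtype::F16;
}
```

Using F16 rather than F32 as the last resort keeps the fallback tensors at half the size while losing essentially no model quality.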

---------

Co-authored-by: compilade <redacted>
README.md
convert_hf_to_gguf.py
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
src/llama.cpp