author Carolinabanana <redacted>
Tue, 9 Apr 2024 08:16:13 +0000 (09:16 +0100)
committer GitHub <redacted>
Tue, 9 Apr 2024 08:16:13 +0000 (11:16 +0300)
commit 5dc9dd7152dedc6046b646855585bd070c91e8c8
tree d2bae3652d91cdd9327e28fa85d167a67e050c53
parent e11a8999b5690f810c2c99c14347f0834e68c524
llama : add Command R Plus support (#6491)

* Add Command R Plus GGUF

* Add Command R Plus GGUF

* Loading works up to LayerNorm2D

* Export new tensors in 1D so they are not quantized (the quantizer skips 1D tensors; see the first sketch below).

* Fix embedding layer based on Noeda's example

* Whitespace

* Add line

* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)

* dranger003: Fix block index overflow in CUDA dequantizing (see the CUDA indexing sketch below).

* Reverted blocked multiplication code as it still has issues and could affect other Llama arches

* export norms as f32

* fix overflow issues during quant and other cleanup (see the 64-bit product sketch below)

* Type convention

Co-authored-by: Georgi Gerganov <redacted>
* dranger003: Fix more int overflow during quant.

---------

Co-authored-by: S <redacted>
Co-authored-by: S <redacted>
Co-authored-by: slaren <redacted>
Co-authored-by: Georgi Gerganov <redacted>
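
Note on the 1D-export trick above: llama.cpp's quantizer only considers tensors with at least two dimensions, so norms written out as single-dimension tensors keep their original float type. Below is a minimal sketch of that rule using a toy stand-in type rather than the real ggml API (toy_tensor, toy_n_dims, and should_quantize are hypothetical names); plain C, compiles as-is:

    #include <stdio.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* hypothetical stand-in for ggml_tensor, illustration only */
    struct toy_tensor {
        const char * name;
        int64_t      ne[4];   /* shape; unused dims set to 1, as in ggml */
    };

    static int toy_n_dims(const struct toy_tensor * t) {
        for (int i = 3; i >= 1; --i) {
            if (t->ne[i] > 1) {
                return i + 1;
            }
        }
        return 1;
    }

    /* the rule the export relies on: 1D tensors (norms, biases) are
     * skipped by the quantizer and stay at their original float type */
    static bool should_quantize(const struct toy_tensor * t) {
        return toy_n_dims(t) >= 2;
    }

    int main(void) {
        const struct toy_tensor norm   = { "attn_norm.weight", { 12288, 1, 1, 1 } };
        const struct toy_tensor weight = { "attn_q.weight",    { 12288, 12288, 1, 1 } };
        printf("%-18s quantize=%d\n", norm.name,   should_quantize(&norm));   /* 0 */
        printf("%-18s quantize=%d\n", weight.name, should_quantize(&weight)); /* 1 */
        return 0;
    }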
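
Note on the CUDA dequantize fix: at Command R Plus size a tensor can hold more than 2^31 elements, so a global index computed in 32-bit int from blockIdx.x * blockDim.x wraps. A minimal CUDA sketch of the fix pattern, assuming a simple f16 -> f32 dequantize kernel (the kernel name and launch shape are illustrative, not the exact ggml-cuda code):

    #include <cuda_fp16.h>
    #include <stdint.h>

    __global__ void dequantize_f16_to_f32(const __half * x, float * y, const int64_t k) {
        // promote to 64 bits BEFORE the multiply: with plain int,
        // blockDim.x * blockIdx.x wraps past ~2^31 elements
        const int64_t i = (int64_t) blockDim.x * blockIdx.x + threadIdx.x;
        if (i >= k) {
            return;
        }
        y[i] = __half2float(x[i]);
    }

    // launch sketch: one thread per element, 256 threads per block
    // dequantize_f16_to_f32<<<(k + 255) / 256, 256>>>(dx, dy, k);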
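
Note on the int overflow during quant: the same theme on the host side. Element counts such as nrows * n_per_row exceed INT32_MAX for a 104B model, so intermediate products have to be promoted to int64_t before multiplying. A self-contained sketch of the bug/fix pattern (variable names are illustrative; the sizes roughly match Command R Plus's 256000-token vocab and 12288-wide embeddings):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        const int64_t nrows     = 256000;  // e.g. token_embd rows (vocab size)
        const int64_t n_per_row = 12288;   // embedding width

        // BUG pattern: the product is formed in 32-bit int and overflows
        // (256000 * 12288 = 3,145,728,000 > INT32_MAX; signed overflow is
        // undefined behavior in C and wraps negative in practice)
        const int bad = (int) nrows * (int) n_per_row;

        // FIX pattern: keep the whole computation in int64_t
        const int64_t good = nrows * n_per_row;

        printf("32-bit product: %d\n",   bad);
        printf("64-bit product: %lld\n", (long long) good);
        return 0;
    }
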
16 files changed:
convert-hf-to-gguf.py
ggml-cuda.cu
ggml-cuda/common.cuh
ggml-cuda/convert.cu
ggml-cuda/convert.cuh
ggml-cuda/dequantize.cuh
ggml-cuda/dmmv.cu
ggml-cuda/quantize.cu
ggml-cuda/quantize.cuh
ggml-quants.c
ggml-quants.h
ggml.c
ggml.h
gguf-py/gguf/constants.py
gguf-py/gguf/tensor_mapping.py
llama.cpp