llama : add Command R Plus support (llama/6491)
author    Carolinabanana <redacted>
          Tue, 9 Apr 2024 08:16:13 +0000 (09:16 +0100)
committer Georgi Gerganov <redacted>
          Tue, 9 Apr 2024 17:16:09 +0000 (20:16 +0300)
commit    4c3ba529aca55889d67a79037733ef0d18cd14aa
tree      18283ebc86fccd577ddf43e2fbf1b962b206e710
parent    eaf47080d3a89e28d0c97a16b81b7b4dcc3ddf11

* Add Command R Plus GGUF

* Loading works up to LayerNorm2D

* Export new tensors in 1D so they are not quantized (see the dimensionality sketch after this list).

* Fix embedding layer based on Noeda's example

* Whitespace

* Add line

* Fix unexpected tokens on MPS. Re-add F16 fix. (Noeda)

* dranger003: Fix block index overflow in CUDA dequantizing (see the 64-bit indexing sketch after this list).

* Revert blocked multiplication code, as it still has issues and could affect other Llama arches

* Export norms as f32

* Fix overflow issues during quant and other cleanup

* Type convention

Co-authored-by: Georgi Gerganov <redacted>
* dranger003: Fix more int overflow during quant (see the int64_t row-loop sketch after this list).
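
The 1D export works because quantization in the llama.cpp/ggml pipeline targets 2D weight matrices; a tensor exported with a single dimension keeps its original type. A minimal sketch of that selection logic, with a hypothetical helper name (the real check lives in the quantization path, not in any single file below):

    #include <stdbool.h>

    // Sketch only: block quantization is applied to 2D weight matrices, so 1D
    // tensors (norms, biases, and the tensors this commit re-exports in 1D)
    // are skipped and stay in f32/f16.
    static bool should_quantize_tensor(int n_dims) {
        return n_dims == 2;
    }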
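
The CUDA dequantizing fix boils down to computing the global element index in 64-bit, so tensors with more than 2^31 elements no longer overflow a 32-bit int. A minimal sketch of the pattern; the kernel name and its trivial body are illustrative, not the actual convert.cu code:

    #include <cstdint>

    // Cast before multiplying: blockDim.x and blockIdx.x are 32-bit, so the
    // product must be widened to int64_t first.
    __global__ void dequantize_copy_f32(const float * x, float * y, const int64_t k) {
        const int64_t i = (int64_t) blockDim.x * blockIdx.x + threadIdx.x;
        if (i >= k) {
            return;
        }
        y[i] = x[i];
    }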
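
The host-side quant overflow fixes follow the same idea: element and row counts are carried as int64_t so products like row * n_per_row cannot wrap. A hedged sketch of the pattern, with a hypothetical function name and a plain copy standing in for the real per-row quantization:

    #include <stdint.h>
    #include <string.h>

    // Illustrative only: 64-bit row counts and offsets avoid int overflow on
    // very large tensors.
    static void quantize_rows_sketch(const float * src, char * dst,
                                     int64_t nrows, int64_t n_per_row) {
        for (int64_t row = 0; row < nrows; ++row) {
            const float * src_row = src + row * n_per_row;               // 64-bit offset
            char        * dst_row = dst + (row * n_per_row) * sizeof(float);
            // Stand-in for quantize_row_*: copy the row verbatim.
            memcpy(dst_row, src_row, n_per_row * sizeof(float));
        }
    }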

---------

Co-authored-by: S <redacted>
Co-authored-by: S <redacted>
Co-authored-by: slaren <redacted>
Co-authored-by: Georgi Gerganov <redacted>
12 files changed:
include/ggml/ggml.h
src/ggml-cuda.cu
src/ggml-cuda/common.cuh
src/ggml-cuda/convert.cu
src/ggml-cuda/convert.cuh
src/ggml-cuda/dequantize.cuh
src/ggml-cuda/dmmv.cu
src/ggml-cuda/quantize.cu
src/ggml-cuda/quantize.cuh
src/ggml-quants.c
src/ggml-quants.h
src/ggml.c