model : add support for Falcon-H1 family (#14534)
* v1
* push more fixes
* another fix
* fix
* more fixes
* minor fix
* more cleaning on python code
* python fixes
* changed precision for multipliers float 32->64
* fixes
* another fix
* fix
* pre-norm -> norm
* fix
* Revert "fix"
This reverts commit 243e4d1a50bd73467d99f6b289b9a1826f83b94b.
* fix
* small fix ffn_norm
* try
* mix instead of max
* fix vocab size
* conflict solve
* fixed multipliers
* falcon-h1 specific vocab resolved
* read arch from gguf.MODEL_ARCH
* mamba_d_ssm added to d_inner find_hparam
* remove unused functions from gguf_writer.py
* override modify_tensors instead of get_tensors
* fix conversion and d_inner
* added some cb functions for debugging purposes
* inp_out_ids moved outside of layers loop
* mup_vec create as float64
* fix rope_theta
* injected mup
* clean ups
* rm extra space
* rm unused MAMBA_CHUNK_SIZE
* rm unused key
* add bos False
* changed ROPE_TYPE
* cleaning debugging stuff
* cleaning debug quant
* fix comment
* some cleanups
* some cleanups
* Update src/llama-model-loader.cpp
* more cleanups
* more cleanups
* d_ssm -> d_inner;
* cleaning unused hparams
* cleanup
* more cleanups
* more cleanups on python conversion;
* minor cleanups
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <redacted>
* remove todo
* added falcon-h1
* tensor not required
* clean
* remove unneeded attributes
* more cleanups and fixed conversion
* remove final_norm
* flake8 fixes
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* flake8 fixes
* Update src/llama-hparams.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* Update src/llama-arch.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <redacted>
* added hashes
* Update src/llama-arch.cpp
Co-authored-by: Georgi Gerganov <redacted>
* Update src/llama-vocab.cpp
Co-authored-by: Georgi Gerganov <redacted>
* update the update file
* Revert "update the update file"
This reverts commit 082ab4ad2a3927384d878666a5f8cae4eb15f577.
* fix: address suggestions
* fix: update convert_hf_to_gguf.py
* Update gguf-py/gguf/constants.py
Co-authored-by: Sigbjørn Skjæret <redacted>
* Update src/llama-model-loader.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* d_inner fixed
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* reshaping ssm_norm for 34B
* removing generate_mup
* remove duplicate metadata keys
* rm comment
* final comment
* fix unused args
* fix constants
* fix bad merge
* Update src/llama-model.cpp
Co-authored-by: compilade <redacted>
* falcon-h1: remove unused ssm_in_b and bad merge
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* falcon-h1: fix last comment
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <redacted>
* falcon-h1: revert add_add_bos(False)
* falcon-h1: fix tied weights
* falcon-h1: remove whitespace
* falcon-h1: fix wrong size param
* falcon-h1: fix whitespace issues
---------
Co-authored-by: younesbelkada <redacted>
Co-authored-by: Younes B <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Sigbjørn Skjæret <redacted>
Co-authored-by: compilade <redacted>