* feat(gguf-py): Add Granite model and params to gguf-py
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <redacted>
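A minimal sketch of what this commit adds on the gguf-py side, mirroring the layout of gguf-py/gguf/constants.py rather than copying the patch. The enum and name-map mechanics follow the existing conventions there; the exact key strings and their grouping are assumptions, and note that the scalar params are introduced here under their *_multiplier names and only renamed to *_scale by a later commit in this series.

```python
from enum import IntEnum, auto

# Illustrative mirror of gguf-py/gguf/constants.py (not the real module).
class MODEL_ARCH(IntEnum):
    LLAMA   = auto()
    GRANITE = auto()  # new architecture enum value for Granite

MODEL_ARCH_NAMES: dict[MODEL_ARCH, str] = {
    MODEL_ARCH.LLAMA:   "llama",
    MODEL_ARCH.GRANITE: "granite",
}

# Granite-specific scalar hyperparameters stored as per-architecture KV
# entries in the GGUF header, e.g. "granite.embedding_multiplier".
# Key names here are assumptions; they are renamed to *_scale later on.
class Keys:
    class LLM:
        EMBEDDING_MULTIPLIER = "{arch}.embedding_multiplier"
        RESIDUAL_MULTIPLIER  = "{arch}.residual_multiplier"
        ATTENTION_MULTIPLIER = "{arch}.attention_multiplier"
        LOGITS_SCALING       = "{arch}.logits_scaling"
```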
* feat(convert_hf_to_gguf): Add registration and param setup for Granite
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <redacted>
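The conversion-side registration follows the usual pattern in convert_hf_to_gguf.py. This is an excerpt-style sketch that assumes the surrounding script context (`Model`, `LlamaModel`, the `gguf` import); the HF config field names are assumptions based on the Granite config, and the base class is shown as it ends up after the later "Use LlamaModel as base" fix in this series.

```python
# Excerpt-style sketch: assumes the surrounding convert_hf_to_gguf.py context.
@Model.register("GraniteForCausalLM")
class GraniteModel(LlamaModel):
    """Conversion for the Granite architecture; LlamaModel supplies the
    tensor mapping and vocab handling that Granite shares with llama."""
    model_arch = gguf.MODEL_ARCH.GRANITE

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        # Granite carries extra scalar multipliers in its HF config; write
        # them into the GGUF header alongside the usual llama params.
        # (Field names and key strings are assumptions for illustration.)
        self.gguf_writer.add_float32("granite.embedding_multiplier",
                                     self.hparams["embedding_multiplier"])
        self.gguf_writer.add_float32("granite.residual_multiplier",
                                     self.hparams["residual_multiplier"])
        self.gguf_writer.add_float32("granite.attention_multiplier",
                                     self.hparams["attention_multiplier"])
        self.gguf_writer.add_float32("granite.logits_scaling",
                                     self.hparams["logits_scaling"])
```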
* feat(llama.cpp): Add config parsing for Granite multiplier params
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <redacted>
* feat(llama.cpp): First pass at full port of granite deviations from llama
Something is still not working right: the results are mostly terrible, but the
model occasionally produces relevant output at this point, so _something_ is
working.
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <redacted>
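For reference, the Granite deviations being ported are small scalar tweaks on top of an otherwise llama-shaped forward pass. A rough, self-contained sketch of where the four multipliers enter; dimensions, weights, and values are illustrative only, and this is not llama.cpp code:

```python
import numpy as np

# Toy dimensions and the four Granite scalars (illustrative values only).
n_vocab, n_embd = 32, 8
embedding_multiplier, attention_multiplier = 12.0, 0.0078125
residual_multiplier, logits_scaling        = 0.22, 8.0

rng      = np.random.default_rng(0)
tok_embd = rng.standard_normal((n_vocab, n_embd))
wq, wk, wv, wo = (rng.standard_normal((n_embd, n_embd)) for _ in range(4))
w_out    = rng.standard_normal((n_embd, n_vocab))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

tokens = np.array([1, 5, 7])

# 1) token embeddings are scaled by the embedding multiplier
x = embedding_multiplier * tok_embd[tokens]

# 2) attention uses the fixed attention multiplier as its softmax scale
#    instead of llama's 1/sqrt(head_dim)
q, k, v = x @ wq, x @ wk, x @ wv
attn    = softmax(attention_multiplier * (q @ k.T)) @ v

# 3) residual connections are damped by the residual multiplier
#    (the FFN block gets the same treatment, omitted here)
x = x + residual_multiplier * (attn @ wo)

# 4) final logits are divided by the logit scaling factor
logits = (x @ w_out) / logits_scaling
print(logits.shape)  # (3, 32)
```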
* fix(llama.cpp): Determine granite language 3b instruct by vocab size
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <redacted>
* fix(convert_hf_to_gguf): Use LlamaModel as base for GraniteModel
The defaults in LlamaModel are needed for Granite as well.
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <redacted>
* fix(llama.cpp): Switch Granite param names to use _scale for consistency
Other scalar multipliers are called *_scale, so this provides a more
consistent naming convention.
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <redacted>
* fix(convert_hf_to_gguf/gguf-py): _multiplier -> _scale
The transformers hyperparameter names using _multiplier are now mapped to
their _scale equivalents during conversion.
Branch: GraniteLM
Signed-off-by: Gabe Goodhart <redacted>
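In conversion terms the rename is just a key mapping. A small sketch of the idea; the HF field names and GGUF key strings are assumptions, and the real converter writes the values through the GGUFWriter rather than returning a plain dict:

```python
# Hypothetical illustration of the rename: transformers-side *_multiplier
# hyperparameters map onto *_scale keys in the GGUF metadata.
HF_TO_GGUF_SCALE_KEYS = {
    "embedding_multiplier": "granite.embedding_scale",
    "residual_multiplier":  "granite.residual_scale",
    "attention_multiplier": "granite.attention.scale",
    "logits_scaling":       "granite.logit_scale",
}

def scale_metadata(hf_config: dict) -> dict:
    """Collect the Granite scalars from an HF config dict under their
    GGUF *_scale key names."""
    return {gguf_key: float(hf_config[hf_key])
            for hf_key, gguf_key in HF_TO_GGUF_SCALE_KEYS.items()}

print(scale_metadata({
    "embedding_multiplier": 12.0,
    "residual_multiplier": 0.22,
    "attention_multiplier": 0.0078125,
    "logits_scaling": 8.0,
}))
```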
* fix(llama.cpp): Use separate switch clause for granite in llm_load_hparams