convert : support Qwen3.5/Qwen3.5 MoE NVFP4 and add input scales (#20505)
author Michael Wand <redacted>
Thu, 26 Mar 2026 15:52:06 +0000 (08:52 -0700)
committer GitHub <redacted>
Thu, 26 Mar 2026 15:52:06 +0000 (16:52 +0100)
commit f8d4abae86740bed849c1d2a664dc4f56e35ff0a
tree 68f8a36beece4d8b8e380c813a31d867cccbeb21
parent 3d5acab3e774c3d30748d1e60093f19f0c80506e

* convert : fix Qwen3.5 NVFP4 conversion

* Addressed Copilot review comments and rebased

* move into _LinearAttentionVReorderBase and simplify

* --flake

* new_name not needed

* Added input_scale to gguf

* Fixed input_scale addition as tensor

* Added input scale to loader and named _in_s

* Update convert_hf_to_gguf.py

Re-removed input_scale from aux cleanup

Co-authored-by: Sigbjørn Skjæret <redacted>
---------

Co-authored-by: Sigbjørn Skjæret <redacted>
convert_hf_to_gguf.py
src/llama-model.cpp
src/llama-model.h
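For context, NVFP4 stores weights as 4-bit E2M1 codes combined with per-block and per-tensor scales, and the input scales added by this commit are separate calibration factors applied to activations. A minimal sketch of the dequantization arithmetic, assuming the standard E2M1 value table; the function and variable names are illustrative only, not llama.cpp's actual implementation:

```python
# Magnitudes of the eight non-negative FP4 (E2M1) codes: 2 exponent bits, 1 mantissa bit.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def dequantize_nvfp4(code, block_scale, tensor_scale):
    """Decode one 4-bit NVFP4 code: bit 3 is the sign, bits 0-2 index the
    E2M1 magnitude table. The result is scaled by a per-block scale and a
    per-tensor scale, both stored alongside the quantized weights.
    (Illustrative sketch; input_scale is a further per-tensor factor applied
    to activations, not to the stored weights.)"""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * FP4_VALUES[code & 0x7] * block_scale * tensor_scale

# Example: code 0x7 (magnitude 6.0) with block scale 2.0 yields 12.0;
# code 0xF is the same magnitude with the sign bit set.
print(dequantize_nvfp4(0x7, 2.0, 1.0))  # 12.0
print(dequantize_nvfp4(0xF, 1.0, 0.5))  # -3.0
```

This is why the conversion must carry the scales through to the GGUF file: without the block and tensor scales (and the input scales for activations), the raw 4-bit codes alone cannot be mapped back to usable weight values.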