model: add support for qwen3vl series (llama/16780)
* support qwen3vl series.
Co-authored-by: Thireus ☠ <redacted>
Co-authored-by: yairpatch <redacted>
Co-authored-by: LETS-BEE <redacted>
* bugfix: fix the arch check for qwen3vl-moe.
* use build_ffn
* optimize deepstack structure
* optimize deepstack feature saving
* Revert "optimize deepstack feature saving" as a temporary fix
This reverts commit f321b9fdf13e59527408152e73b1071e19a87e71.
* code cleanup
* use fused qkv in clip
* clean up / remove is_deepstack_layers for simplification
* add test model
* move test model to "big" section
* fix imrope check
* remove trailing whitespace
* fix rope failure
* metal : add imrope support
* add imrope support for sycl
* vulkan: add imrope w/o check
* fix vulkan
* webgpu: add imrope w/o check
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <redacted>
* fix tensor mapping
---------
Co-authored-by: Thireus ☠ <redacted>
Co-authored-by: yairpatch <redacted>
Co-authored-by: LETS-BEE <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Sigbjørn Skjæret <redacted>