git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784)
author Xuan-Son Nguyen <redacted>
Tue, 27 May 2025 12:06:10 +0000 (14:06 +0200)
committer GitHub <redacted>
Tue, 27 May 2025 12:06:10 +0000 (14:06 +0200)
commit bc583e3c63c04a11d287c108ea9e6a515ead0423
tree b8533edcfe808a53f288667164e7c28d7bd9fb6e
parent 72b090da2c50e540143fd312a2f9aa5f151e6136

* mtmd : allow multiple modalities at the same time

* refactor mtmd tokenizer

* fix compile

* ok, missing SinusoidsPositionEmbedding

* first working version

* fix style

* more strict validate of n_embd

* refactor if..else to switch

* fix regression

* add test for 3B

* update docs

* fix tokenizing with add_special

* add more tests

* fix test case "huge"

* rm redundant code

* set_position_mrope_1d rm n_tokens
12 files changed:
convert_hf_to_gguf.py
docs/multimodal.md
gguf-py/gguf/constants.py
gguf-py/gguf/tensor_mapping.py
tools/mtmd/clip-impl.h
tools/mtmd/clip.cpp
tools/mtmd/clip.h
tools/mtmd/mtmd-cli.cpp
tools/mtmd/mtmd-helper.cpp
tools/mtmd/mtmd.cpp
tools/mtmd/test-2.mp3 [new file with mode: 0644]
tools/mtmd/tests.sh