git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
llava : Add Granite Vision Support (#11794)
author Alex Brooks <redacted>
Mon, 24 Feb 2025 16:09:51 +0000 (09:09 -0700)
committer GitHub <redacted>
Mon, 24 Feb 2025 16:09:51 +0000 (17:09 +0100)
commit 7a2c913e66353362d7f28d612fd3c9d51a831eda
tree e46e848ee84e39ecc9394f2a01e2e866b6d9ba0b
parent 08d5986290cc42d2c52739e046642b8252f97e4b
* Add super wip scripts for multimodal granite gguf

Signed-off-by: Alex-Brooks <redacted>
* Add example for converting mmgranite to gguf

Signed-off-by: Alex-Brooks <redacted>
* remove hardcoded path

Signed-off-by: Alex-Brooks <redacted>
* Add vision feature layer to gguf params

Signed-off-by: Alex-Brooks <redacted>
* Clean up llava surgery and remove name substitution hacks

Signed-off-by: Alex-Brooks <redacted>
* Add transformers llava next tensor name mapping

Signed-off-by: Alex-Brooks <redacted>
* Make siglip / openclip mutually exclusive

Signed-off-by: Alex-Brooks <redacted>
* Fix projector linear substitution

Signed-off-by: Alex-Brooks <redacted>
* Fix linear 2 substitution index

Signed-off-by: Alex-Brooks <redacted>
* Increase max flattened gridpoints to 64

Signed-off-by: Alex-Brooks <redacted>
* Fix hardcoded concat for multiple feature layers

Signed-off-by: Alex-Brooks <redacted>
* Pull vision feature layers out of gguf keys

Signed-off-by: Alex-Brooks <redacted>
* fix num gridpoints and use all layers

Signed-off-by: Alex-Brooks <redacted>
* Avoid dropping last image encoder layer in llava models

Signed-off-by: Alex-Brooks <redacted>
* Use 10 for max number of patches

Signed-off-by: Alex-Brooks <redacted>
* Standardize vision feature layers

Signed-off-by: Alex-Brooks <redacted>
* Cleanup logs

Signed-off-by: Alex-Brooks <redacted>
* Update comment for vision feature layer init

Signed-off-by: Alex-Brooks <redacted>
* Update notes for alternative to legacy llm conversion script

Signed-off-by: Alex-Brooks <redacted>
* Fix notes rendering

Signed-off-by: Alex-Brooks <redacted>
* Add v prefix to vision feature layer log

Signed-off-by: Alex-Brooks <redacted>
* Use current defaults for feature layer

Signed-off-by: Alex-Brooks <redacted>
* Use constant for max gridpoints / feat layers, style fixes

Signed-off-by: Alex-Brooks <redacted>
* clarify non-negative feature layers

Signed-off-by: Alex-Brooks <redacted>
* Remove CLIP_API from func signature

Signed-off-by: Alex-Brooks <redacted>
* Use MAX_IMAGE_FEATURE_LAYERS const in layer calc

Signed-off-by: Alex-Brooks <redacted>
* Clarify feature layers are non negative ints and not uint

Signed-off-by: Alex-Brooks <redacted>
* Fix condition for reading feature layers

Signed-off-by: Alex-Brooks <redacted>
* pop last llava layer when feature layers are unset

Signed-off-by: Alex-Brooks <redacted>
* Fix unset vision layer 0

Signed-off-by: Alex-Brooks <redacted>
* Update examples/llava/clip.cpp

Co-authored-by: Xuan-Son Nguyen <redacted>
* Reenable assertion for out of bounds get_rows

Signed-off-by: Alex-Brooks <redacted>
* Use std vector for gridpoints and feature layers

Signed-off-by: Alex-Brooks <redacted>
* Calculate max feature layer at load time

Signed-off-by: Alex-Brooks <redacted>
* Include base patch for granite vision allocation

Signed-off-by: Alex-Brooks <redacted>
* Fix trailing whitespace

Signed-off-by: Alex-Brooks <redacted>
* Add max num patches = 10 back for minicpmv

Signed-off-by: Alex-Brooks <redacted>
* Use unordered set to store feature layers

Co-authored-by: Xuan-Son Nguyen <redacted>
Signed-off-by: Alex-Brooks <redacted>
* Use max feature layer for postnorm

Signed-off-by: Alex-Brooks <redacted>
* Apply suggestions from code review

---------

Signed-off-by: Alex-Brooks <redacted>
Co-authored-by: Xuan-Son Nguyen <redacted>
examples/llava/README.md
examples/llava/clip.cpp
examples/llava/clip.h
examples/llava/convert_image_encoder_to_gguf.py
examples/llava/llava.cpp
examples/llava/llava_surgery_v2.py