llava : add explicit instructions for llava-1.6 (#5611)

author Daniel Bevenius <redacted>

Tue, 20 Feb 2024 17:30:27 +0000 (18:30 +0100)

committer GitHub <redacted>

Tue, 20 Feb 2024 17:30:27 +0000 (19:30 +0200)
diff --git a/examples/llava/README.md b/examples/llava/README.md

index e42db6e5ad3cfdb59fab06233306fdd53c66a504..25ea967153510307f993e6b1e147266569b71ee2 100644
--- a/examples/llava/README.md
+++ b/examples/llava/README.md
@@ -59,14 +59,40 @@ python ./convert.py ../llava-v1.5-7b --skip-unknown
  Now both the LLaMA part and the image encoder are in the `llava-v1.5-7b` directory.
  
  ## LLaVA 1.6 gguf conversion
-
-1) Backup your pth/safetensor model files as llava-surgery modifies them
-2) Use `python llava-surgery-v2.py -C -m /path/to/hf-model` which also supports llava-1.5 variants pytorch as well as safetensor models:
+1) First clone a LLaVA 1.6 model:
+```console
+git clone https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b
+```
+2) Back up your pth/safetensors model files, as llava-surgery modifies them
+3) Run `llava-surgery-v2.py`, which also supports llava-1.5 variants and handles both PyTorch and safetensors models:
+```console
+python examples/llava/llava-surgery-v2.py -C -m ../llava-v1.6-vicuna-7b/
+```
  - you will find a llava.projector and a llava.clip file in your model directory
-3) Copy the llava.clip file into a subdirectory (like vit), rename it to pytorch_model.bin and add a fitting vit configuration to the directory (https://huggingface.co/cmp-nct/llava-1.6-gguf/blob/main/config_vit.json) and rename it to config.json.
-4) Create the visual gguf model: `python ./examples/llava/convert-image-encoder-to-gguf.py -m ../path/to/vit --llava-projector ../path/to/llava.projector --output-dir ../path/to/output --clip-model-is-vision`
+4) Copy the llava.clip file into a subdirectory (like `vit`) as pytorch_model.bin, copy llava.projector next to it, and add a fitting ViT configuration as config.json:
+```console
+mkdir vit
+cp ../llava-v1.6-vicuna-7b/llava.clip vit/pytorch_model.bin
+cp ../llava-v1.6-vicuna-7b/llava.projector vit/
+curl -s -q https://huggingface.co/cmp-nct/llava-1.6-gguf/raw/main/config_vit.json -o vit/config.json
+```
+
+5) Create the visual gguf model:
+```console
+python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision
+```
  - This is similar to llava-1.5; the difference is that we tell the encoder (via `--clip-model-is-vision`) that we are working with only the vision-model part of CLIP
-5) Everything else as usual: convert.py the hf model, quantize as needed
+
+6) Then convert the model to gguf format:
+```console
+python ./convert.py ../llava-v1.6-vicuna-7b/
+```
+
+7) Finally, run llava-cli with the 1.6 model:
+```console
+./llava-cli -m ../llava-v1.6-vicuna-7b/ggml-model-f16.gguf --mmproj vit/mmproj-model-f16.gguf --image some-image.jpg -c 4096
+```
+
  **note** llava-1.6 needs more context than llava-1.5: at least 3000 tokens (just run it with -c 4096)
  **note** llava-1.6 greatly benefits from batched prompt processing (defaults work)
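Before running step 5, it can help to confirm that the `vit` directory built in step 4 has the layout the conversion command expects. The following is a minimal sketch, assuming the file names used in step 4 (`pytorch_model.bin`, `config.json`, `llava.projector`). The helper `check_vit_dir` is hypothetical and not part of the llama.cpp scripts. It only checks that the files exist and that `config.json` parses as JSON. It does not validate the weights themselves.

```python
import json
from pathlib import Path

# File names taken from step 4; adjust if your layout differs.
REQUIRED_FILES = ("pytorch_model.bin", "config.json", "llava.projector")


def check_vit_dir(vit_dir):
    """Return a list of problems found in vit_dir (an empty list means OK).

    Hypothetical helper: checks that the step 4 files exist and that
    config.json is valid JSON; it does not inspect the model weights.
    """
    vit = Path(vit_dir)
    if not vit.is_dir():
        return [f"{vit} is not a directory"]

    problems = []
    for name in REQUIRED_FILES:
        if not (vit / name).is_file():
            problems.append(f"missing {name}")

    # Only try to parse config.json if it is actually present.
    config = vit / "config.json"
    if config.is_file():
        try:
            json.loads(config.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"config.json is not valid JSON: {exc}")
    return problems
```

For example, `check_vit_dir("vit")` returning an empty list suggests the directory is ready for `convert-image-encoder-to-gguf.py -m vit ...`. A non-empty list names the missing or malformed files.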
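When scripting step 7, the context-size note can be enforced before launching llava-cli. The following is a minimal sketch, assuming the paths from steps 6 and 7 and the 3000-token minimum stated in the note. `build_llava_cli_cmd` and `MIN_CTX_LLAVA_16` are hypothetical names, and the sketch uses only the `-m`, `--mmproj`, `--image`, and `-c` flags that appear in step 7.

```python
import shlex

# Minimum context for llava-1.6, per the note in the README diff.
MIN_CTX_LLAVA_16 = 3000


def build_llava_cli_cmd(model, mmproj, image, ctx=4096, binary="./llava-cli"):
    """Return the step 7 command as an argument list.

    Hypothetical helper: raises ValueError if ctx is below the
    documented llava-1.6 minimum, instead of starting a run that
    does not have enough context.
    """
    if ctx < MIN_CTX_LLAVA_16:
        raise ValueError(
            f"llava-1.6 needs a context of at least {MIN_CTX_LLAVA_16}, got {ctx}"
        )
    return [binary, "-m", model, "--mmproj", mmproj, "--image", image, "-c", str(ctx)]


cmd = build_llava_cli_cmd(
    "../llava-v1.6-vicuna-7b/ggml-model-f16.gguf",
    "vit/mmproj-model-f16.gguf",
    "some-image.jpg",
)
# Prints the same command line as step 7 (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

Returning a list rather than a single string lets it be passed straight to `subprocess.run(cmd, check=True)` without shell quoting issues.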