Add quantize script for batch quantization (#92)
author    Pavol Rusnak <redacted>
          Mon, 13 Mar 2023 16:15:20 +0000 (17:15 +0100)
committer GitHub <redacted>
          Mon, 13 Mar 2023 16:15:20 +0000 (18:15 +0200)
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <redacted>
README.md
quantize.sh [new file with mode: 0755]

index 3a6d757d636e0b0e40b391596122801d1cf36d85..65be1a687dbd6b26e5443fe2f37d55b22b8d30a9 100644 (file)
--- a/README.md
+++ b/README.md
@@ -145,44 +145,16 @@ python3 -m pip install torch numpy sentencepiece
 python3 convert-pth-to-ggml.py models/7B/ 1
 
 # quantize the model to 4-bits
-./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
+./quantize.sh 7B
 
 # run the inference
 ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
 ```
 
-For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format
-will create 2 ggml files, instead of one:
-
-```bash
-ggml-model-f16.bin
-ggml-model-f16.bin.1
-```
-
-You need to quantize each of them separately like this:
-
-```bash
-./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
-./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
-```
-
-Everything else is the same. Simply run:
-
-```bash
-./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
-```
-
-The number of files generated for each model is as follows:
-
-```
-7B  -> 1 file
-13B -> 2 files
-30B -> 4 files
-65B -> 8 files
-```
-
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
 
+TODO: add model disk/mem requirements
+
 ### Interactive mode
 
 If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
diff --git a/quantize.sh b/quantize.sh
new file mode 100755 (executable)
index 0000000..6194649
--- /dev/null
+++ b/quantize.sh
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+
+if ! [[ "$1" =~ ^[0-9]{1,2}B$ ]]; then
+    echo
+    echo "Usage: quantize.sh 7B|13B|30B|65B [--remove-f16]"
+    echo
+    exit 1
+fi
+
+for i in models/"$1"/ggml-model-f16.bin*; do
+    ./quantize "$i" "${i/f16/q4_0}" 2
+    if [[ "$2" == "--remove-f16" ]]; then
+        rm "$i"
+    fi
+done
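
The script leans on two bash features: an extended-regex match (`[[ … =~ … ]]`) to validate the model-size argument, and the `${var/pattern/replacement}` parameter expansion to derive each output filename from its f16 input. A standalone sketch of both (not part of the commit; the variable names here are illustrative):

```shell
#!/usr/bin/env bash

# 1. The size check: an ERE accepting one- or two-digit sizes ending in B,
#    e.g. 7B, 13B, 30B, 65B.
size="13B"
if [[ "$size" =~ ^[0-9]{1,2}B$ ]]; then
    echo "valid size: $size"
fi

# 2. The output name: ${var/pattern/replacement} substitutes the first
#    occurrence of the pattern, so each f16 shard maps to its q4_0 twin.
input="models/13B/ggml-model-f16.bin.1"
echo "${input/f16/q4_0}"   # models/13B/ggml-model-q4_0.bin.1
```

Because the substitution preserves any `.1`, `.2`, … suffix, the same loop body handles the multi-file shards of the 13B/30B/65B models without special-casing.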