readme : add instructions on converting to GGML + "--no-config" to wget (#874)

author Clifford Heath <redacted>

Mon, 8 May 2023 17:58:36 +0000 (03:58 +1000)

committer GitHub <redacted>

Mon, 8 May 2023 17:58:36 +0000 (20:58 +0300)
diff --git a/README.md b/README.md
index e74cc371f07a319fc33c79e85e0b8a08eac74d89..de942ecffe83fda38677fc8b832a2071cead177f 100644
--- a/README.md
+++ b/README.md
@@ -71,6 +71,8 @@ Then, download one of the Whisper models converted in [ggml format](models). For
  bash ./models/download-ggml-model.sh base.en
  ```
  
+If you wish to convert the Whisper models to ggml format yourself, instructions are in [models/README.md](models/README.md).
+
  Now build the [main](examples/main) example and transcribe an audio file like this:
  
  ```bash
diff --git a/models/README.md b/models/README.md
index ab0dde7ccc84e22148787a4f9b24ec7a376d8e59..c62f0361ea3b7ef752441c9eb08f3cafb29198f3 100644
--- a/models/README.md
+++ b/models/README.md
@@ -1,15 +1,17 @@
  ## Whisper model files in custom ggml format
  
  The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
-have been converted to custom `ggml` format in order to be able to load them in C/C++. The conversion has been performed
-using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script. You can either obtain the original models and generate
-the `ggml` files yourself using the conversion script, or you can use the [download-ggml-model.sh](download-ggml-model.sh)
-script to download the already converted models. Currently, they are hosted on the following locations:
+are converted to custom `ggml` format in order to be able to load them in C/C++.
+Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
+
+You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
+or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
+Currently, they are hosted on the following locations:
  
  - https://huggingface.co/ggerganov/whisper.cpp
  - https://ggml.ggerganov.com
  
-Sample usage:
+Sample download:
  
  ```java
  $ ./download-ggml-model.sh base.en
@@ -21,6 +23,16 @@ You can now use it like this:
    $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
  ```
  
+To convert the files yourself, use the convert-pt-to-ggml.py script. Here is an example usage.
+The original PyTorch files are assumed to have been downloaded into ~/.cache/whisper
+Change `~/path/to/repo/whisper/` to the location for your copy of the Whisper source:
+```
+mkdir models/whisper-medium
+python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
+mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
+rmdir models/whisper-medium
+```
+
  A third option to obtain the model files is to download them from Hugging Face:
  
  https://huggingface.co/ggerganov/whisper.cpp/tree/main
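
Note on the conversion steps added to models/README.md above: the four commands can be wrapped in a small helper so the model name is given only once. This is a minimal sketch, not part of the commit. The function name `convert_model` and the `DRY_RUN` and `WHISPER_CACHE` variables are hypothetical; the paths and the `convert-pt-to-ggml.py` arguments are taken from the README text, and the helper assumes it is run from the repository root.

```bash
#!/bin/sh
# Hypothetical wrapper around the conversion commands from models/README.md.
set -eu

# Run a command, or only print it when DRY_RUN=1 (DRY_RUN is an assumed name).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# convert_model <model-name> <path-to-whisper-source>
convert_model() {
    model="$1"          # e.g. medium
    whisper_src="$2"    # your copy of the Whisper source
    # The README assumes the PyTorch files are in ~/.cache/whisper;
    # WHISPER_CACHE is an assumed override for that location.
    pt_file="${WHISPER_CACHE:-$HOME/.cache/whisper}/$model.pt"
    out_dir="models/whisper-$model"

    run mkdir "$out_dir"
    run python models/convert-pt-to-ggml.py "$pt_file" "$whisper_src" "./$out_dir"
    run mv "./$out_dir/ggml-model.bin" "models/ggml-$model.bin"
    run rmdir "$out_dir"
}
```

Calling `DRY_RUN=1 convert_model medium ~/path/to/repo/whisper/` prints the four commands without executing them, which is a cheap way to check the paths before running the real conversion.
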
diff --git a/models/download-ggml-model.sh b/models/download-ggml-model.sh
index 749b409c4ffe983863b991f4ea3873ac3b36bd67..e5c59a73ebe15bfdab35e682fb511dd292bd9e85 100755
--- a/models/download-ggml-model.sh
+++ b/models/download-ggml-model.sh
@@ -62,7 +62,7 @@ if [ -f "ggml-$model.bin" ]; then
  fi
  
  if [ -x "$(command -v wget)" ]; then
-    wget --quiet --show-progress -O ggml-$model.bin $src/$pfx-$model.bin
+    wget --no-config --quiet --show-progress -O ggml-$model.bin $src/$pfx-$model.bin
  elif [ -x "$(command -v curl)" ]; then
      curl -L --output ggml-$model.bin $src/$pfx-$model.bin
  else
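
Note on the hunk above: `--no-config` tells wget not to read any settings file, so the download presumably no longer depends on what a user has in their wgetrc. The sketch below isolates the same wget-then-curl fallback into two functions so the command line can be inspected without touching the network. The function names `fetch_cmd` and `pick_downloader` are hypothetical and not part of download-ggml-model.sh; the flags are copied from the diff.

```bash
#!/bin/sh
# Print the download command the script would use for a given tool.
# fetch_cmd <wget|curl> <url> <output-file>
fetch_cmd() {
    tool="$1"; url="$2"; out="$3"
    case "$tool" in
        wget) echo "wget --no-config --quiet --show-progress -O $out $url" ;;
        curl) echo "curl -L --output $out $url" ;;
        *)    echo "unsupported downloader: $tool" >&2; return 1 ;;
    esac
}

# Print the first available downloader, preferring wget as the script does.
pick_downloader() {
    for candidate in wget curl; do
        if [ -x "$(command -v "$candidate")" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, `fetch_cmd "$(pick_downloader)" "$src/$pfx-$model.bin" "ggml-$model.bin"` prints the command that would be run for the current machine, using the script's own `$src`, `$pfx`, and `$model` variables.
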