From: Georgi Gerganov
Date: Thu, 24 Nov 2022 15:54:41 +0000 (+0200)
Subject: models : add instructions for using HF fine-tuned models
X-Git-Tag: upstream/1.7.4~1790
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=a2ecd54455c591c0d704e7162ed57a7396acea3b;p=pkg%2Fggml%2Fsources%2Fwhisper.cpp

models : add instructions for using HF fine-tuned models
---
diff --git a/models/README.md b/models/README.md
index 26353018..7d6e451c 100644
--- a/models/README.md
+++ b/models/README.md
@@ -41,5 +41,24 @@ https://huggingface.co/datasets/ggerganov/whisper.cpp/tree/main
 
 ## Model files for testing purposes
 
-The model files pefixed with `for-tests-` are empty (i.e. do not contain any weights) and are used by the CI for testing purposes.
-They are directly included in this repository for convenience and the Github Actions CI uses them to run various sanitizer tests.
+The model files prefixed with `for-tests-` are empty (i.e. do not contain any weights) and are used by the CI for
+testing purposes. They are directly included in this repository for convenience and the GitHub Actions CI uses them to
+run various sanitizer tests.
+
+## Fine-tuned models
+
+There are community efforts for creating fine-tuned Whisper models using extra training data. For example, this
+[blog post](https://huggingface.co/blog/fine-tune-whisper) describes a method for fine-tuning using the Hugging Face (HF)
+Transformers implementation of Whisper. The produced models are in a slightly different format compared to the original
+OpenAI format. To read the HF models you can use the [convert-h5-to-ggml.py](convert-h5-to-ggml.py) script like this:
+
+```
+git clone https://github.com/openai/whisper
+git clone https://github.com/ggerganov/whisper.cpp
+
+# clone HF fine-tuned model (this is just an example)
+git clone https://huggingface.co/openai/whisper-base.en
+
+# convert the model to ggml
+python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-base.en/ ./whisper .
+```
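The clone and conversion steps must agree on the model directory: `git clone` names the checkout after the last component of the URL (minus any trailing `.git`), so the first argument passed to the conversion script has to match the name of the cloned HF model repository. The following is a minimal Python sketch that derives the conversion command from the model URL. It assumes the positional argument order shown in the README example (HF model directory, OpenAI `whisper` checkout, output directory). `clone_dir` and `convert_command` are hypothetical helpers written for illustration; they are not part of whisper.cpp.

```python
import shlex
from urllib.parse import urlparse


def clone_dir(url: str) -> str:
    """Default directory that `git clone URL` creates: last path component, minus ".git"."""
    name = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return name[: -len(".git")] if name.endswith(".git") else name


def convert_command(model_url: str,
                    whisper_repo: str = "./whisper",
                    whisper_cpp: str = "./whisper.cpp",
                    out_dir: str = ".") -> list[str]:
    """Build the conversion command (hypothetical helper).

    Assumes the argument order from the README example:
    <HF model dir> <OpenAI whisper checkout> <output dir>.
    """
    model_dir = f"./{clone_dir(model_url)}/"  # must match the directory created by `git clone`
    script = f"{whisper_cpp}/models/convert-h5-to-ggml.py"
    return ["python3", script, model_dir, whisper_repo, out_dir]


if __name__ == "__main__":
    cmd = convert_command("https://huggingface.co/openai/whisper-base.en")
    print(shlex.join(cmd))  # shell-safe string; pass `cmd` to subprocess.run to execute it
```

Run as a script, this prints `python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-base.en/ ./whisper .`. Deriving the directory from the URL, rather than typing it twice, keeps the two steps consistent when switching to a different fine-tuned model.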