- [X] Example of inference with multiple LLMs: [foldl/chatllm.cpp](https://github.com/foldl/chatllm.cpp)
- [X] SeamlessM4T inference *(in development)* https://github.com/facebookresearch/seamless_communication/tree/main/ggml
-## GPT inference (example)
-
-With ggml you can efficiently run [GPT-2](examples/gpt-2) and [GPT-J](examples/gpt-j) inference on the CPU.
-
-Here is how to run the example programs:
+## Setting up the Python environment and building the examples
```bash
-# Build ggml + examples
git clone https://github.com/ggerganov/ggml
cd ggml
+# Install python dependencies in a virtual environment
+python3.10 -m venv ggml_env
+source ./ggml_env/bin/activate
+pip install -r requirements.txt
+# Build the examples
mkdir build && cd build
cmake ..
-make -j4 gpt-2-backend gpt-j
+cmake --build . --config Release -j 8
+```
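+
+If you only need a subset of the example binaries, you can restrict the build to specific targets. A minimal sketch, reusing the target names from the old `make` instructions (multiple `--target` names require CMake 3.15+):
+```bash
+# build only the GPT-2 and GPT-J example binaries
+cmake --build . --config Release -j 8 --target gpt-2-backend gpt-j
+```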
+
+## GPT inference (example)
+
+With ggml you can efficiently run [GPT-2](examples/gpt-2) and [GPT-J](examples/gpt-j) inference on the CPU.
+Here is how to run the example programs:
+
+```bash
# Run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
../examples/gpt-j/download-ggml-model.sh 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
-# Install Python dependencies
-python3 -m pip install -r ../requirements.txt
-
# Run the Cerebras-GPT 111M model
# Download from: https://huggingface.co/cerebras
python3 ../examples/gpt-2/convert-cerebras-to-ggml.py /path/to/Cerebras-GPT-111M/
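# run the converted model (output file name ggml-model-f16.bin is assumed)
./bin/gpt-2-backend -m /path/to/Cerebras-GPT-111M/ggml-model-f16.bin -p "This is an example"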
Sample output:
-```
+```bash
$ ./bin/gpt-2 -h
usage: ./bin/gpt-2 [options]
Here is the entire process for the GPT-2 117M model (download from official site + conversion):
-```
+```bash
cd ggml/build
../examples/gpt-2/download-model.sh 117M
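
# convert the downloaded TensorFlow checkpoint to ggml format
# (assumed invocation of examples/gpt-2/convert-ckpt-to-ggml.py; the trailing 1 selects f16)
python3 ../examples/gpt-2/convert-ckpt-to-ggml.py models/gpt-2-117M/ 1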
Use the [convert-cerebras-to-ggml.py](convert-cerebras-to-ggml.py) script to convert the model to `ggml` format:
-```
+```bash
cd ggml/build
git clone https://huggingface.co/cerebras/Cerebras-GPT-111M models/Cerebras-GPT-111M
python ../examples/gpt-2/convert-cerebras-to-ggml.py models/Cerebras-GPT-111M/
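
# run the converted model (output file name ggml-model-f16.bin is assumed)
./bin/gpt-2 -m models/Cerebras-GPT-111M/ggml-model-f16.bin -p "This is an example"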
Here is how to get the 117M ggml model:
-```
+```bash
cd ggml/build
../examples/gpt-2/download-ggml-model.sh 117M
Keep in mind that for smaller models, this will render them completely useless.
You generally want to quantize larger models.
-```
+```bash
# quantize GPT-2 F16 to Q4_0 (faster but less precise)
./bin/gpt-2-quantize models/gpt-2-1558M/ggml-model-f16.bin models/gpt-2-1558M/ggml-model-q4_0.bin 2
./bin/gpt-2 -m models/gpt-2-1558M/ggml-model-q4_0.bin -p "This is an example"
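
# alternatively, quantize to Q4_1 (type id 3 assumed; slightly larger but usually more accurate)
./bin/gpt-2-quantize models/gpt-2-1558M/ggml-model-f16.bin models/gpt-2-1558M/ggml-model-q4_1.bin 3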
Sample output:
-```
+```bash
$ gpt-2-batched -np 5 -m models/gpt-2-117M/ggml-model.bin -p "Hello my name is" -n 50
main: seed = 1697037431
Here is a sample run with prompt `int main(int argc, char ** argv) {`:
-```
+```bash
$ time ./bin/gpt-j -p "int main(int argc, char ** argv) {"
gptj_model_load: loading model from 'models/gpt-j-6B/ggml-model.bin' - please wait ...
Here is another run, just for fun:
-```
+```bash
time ./bin/gpt-j -n 500 -t 8 -p "Ask HN: Inherited the worst code and tech team I have ever seen. How to fix it?
"
If you want to give this a try and you are on Linux or Mac OS, simply follow these instructions:
```bash
-# Clone the ggml library and build the gpt-j example
-git clone https://github.com/ggerganov/ggml
-cd ggml
-mkdir build && cd build
-cmake ..
-make -j4 gpt-j
-
# Download the ggml-compatible GPT-J 6B model (requires 12GB disk space)
../examples/gpt-j/download-ggml-model.sh 6B
- Obtain the Magika model in H5 format
- Pinned version: https://github.com/google/magika/blob/4460acb5d3f86807c3b53223229dee2afa50c025/assets_generation/models/standard_v1/model.h5
- Use `convert.py` to convert the model to gguf format:
-```sh
+```bash
$ python examples/magika/convert.py /path/to/model.h5
```
- Invoke the program with the model file and a list of files to identify:
-```sh
+```bash
$ build/bin/magika model.h5.gguf examples/sam/example.jpg examples/magika/convert.py README.md src/ggml.c /bin/gcc write.exe jfk.wav
examples/sam/example.jpg : jpeg (100.00%) pptx (0.00%) smali (0.00%) shell (0.00%) sevenzip (0.00%)
examples/magika/convert.py : python (99.99%) javascript (0.00%) txt (0.00%) asm (0.00%) scala (0.00%)
These are simple examples of how to use GGML for inference.
The first example uses a convolutional neural network (CNN), the second one uses a fully connected neural network.
-## Python environment setup and build the examples
-
-```bash
-git clone https://github.com/ggerganov/ggml
-cd ggml
-# Install python dependencies in a virtual environment
-python3 -m venv ggml_env
-source ./ggml_env/bin/activate
-pip install -r requirements.txt
-# Build the examples
-mkdir build && cd build
-cmake ..
-make -j4 mnist-cnn mnist
-```
-
## MNIST with CNN
This implementation achieves ~99% accuracy on the MNIST test set.
### Training the model
+Set up the Python environment and build the examples according to the main README.
Use the `mnist-cnn.py` script to train the model and convert it to GGUF format:
-```
+```bash
$ python3 ../examples/mnist/mnist-cnn.py train mnist-cnn-model
...
Keras model saved to 'mnist-cnn-model'
Convert the model to GGUF format:
-```
+```bash
$ python3 ../examples/mnist/mnist-cnn.py convert mnist-cnn-model
...
Model converted and saved to 'mnist-cnn-model.gguf'
from tensorflow import keras
from tensorflow.keras import layers
+
def train(model_name):
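+ # default to the native Keras format when no known extension is given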
+ if not model_name.endswith(".keras") and not model_name.endswith(".h5"):
+ model_name += ".keras"
+
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)
model.save(model_name)
print("Keras model saved to '" + model_name + "'")
+
def convert(model_name):
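+ # normalize the model file extension before loading, mirroring train()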
+ if not model_name.endswith(".keras") and not model_name.endswith(".h5"):
+ model_name += ".keras"
+
model = keras.models.load_model(model_name)
- gguf_model_name = model_name + ".gguf"
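+ # strip the Keras extension so the GGUF file gets a clean name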
+ if model_name.endswith(".keras"):
+ gguf_model_name = model_name[:-6] + ".gguf"
+ elif model_name.endswith(".h5"):
+ gguf_model_name = model_name[:-3] + ".gguf"
+ else:
+ gguf_model_name = model_name + ".gguf"
+
gguf_writer = gguf.GGUFWriter(gguf_model_name, "mnist-cnn")
kernel1 = model.layers[0].weights[0].numpy()
gguf_writer.close()
print("Model converted and saved to '{}'".format(gguf_model_name))
+
if __name__ == '__main__':
if len(sys.argv) < 3:
print("Usage: %s <train|convert> <model_name>".format(sys.argv[0]))
- [ ] GPU support
## Quick start
-```bash
-git clone https://github.com/ggerganov/ggml
-cd ggml
-
-# Install Python dependencies
-python3 -m pip install -r requirements.txt
+Set up the Python environment and build the examples according to the main README.
+```bash
# Download PTH model
wget -P examples/sam/ https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
# Convert PTH model to ggml
python examples/sam/convert-pth-to-ggml.py examples/sam/sam_vit_b_01ec64.pth examples/sam/ 1
-# Build ggml + examples
-mkdir build && cd build
-cmake .. && make -j4
-
# run inference
./bin/sam -t 16 -i ../examples/sam/example.jpg -m ../examples/sam/ggml-model-f16.bin
```
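
The trailing argument to `convert-pth-to-ggml.py` selects the output precision. A minimal sketch of the f32 variant, assuming `1` = f16 and `0` = f32 as in other ggml conversion scripts, and that the output file name follows the same pattern:

```bash
# convert to f32 instead of f16 (argument meaning and output file name are assumptions)
python examples/sam/convert-pth-to-ggml.py examples/sam/sam_vit_b_01ec64.pth examples/sam/ 0
./bin/sam -t 16 -i ../examples/sam/example.jpg -m ../examples/sam/ggml-model-f32.bin
```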
transformers>=4.35.2,<5.0.0
gguf>=0.1.0
keras==2.15.0
+tensorflow==2.15.0
--extra-index-url https://download.pytorch.org/whl/cpu
torch~=2.2.1