This was hacked in an evening - I have no idea if it works correctly.
So far, I've tested just the 7B model. The generated text starts coherently, but typically degrades significantly after ~30-40 tokens.
Here is a "typical" run:
```java
make -j && ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
If you are a fan of the original Star Wars trilogy, then you'll want to see this.
If you don't know your Star Wars lore, this will be a huge eye-opening and you will be a little confusing.
Awesome movie. [end of text]
main: mem per token = 14434244 bytes