readme : update models, cuda + ppl instructions (#3510)

author BarfingLemurs <redacted>

Fri, 6 Oct 2023 19:13:36 +0000 (15:13 -0400)

committer GitHub <redacted>

Fri, 6 Oct 2023 19:13:36 +0000 (22:13 +0300)
diff --git a/README.md b/README.md
index e436818fa92c48b370d2c3f6f774bc463bcd73c7..0562795620e69d31d5fe0b562007eda06baf87b3 100644
--- a/README.md
+++ b/README.md
@@ -95,6 +95,7 @@ as the main playground for developing new features for the [ggml](https://github
  - [X] [Aquila-7B](https://huggingface.co/BAAI/Aquila-7B) / [AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
  - [X] [Starcoder models](https://github.com/ggerganov/llama.cpp/pull/3187)
  - [X] [Mistral AI v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+- [X] [Refact](https://huggingface.co/smallcloudai/Refact-1_6B-fim)
  
  **Bindings:**
  
@@ -377,7 +378,7 @@ Building the program with BLAS support may lead to some performance improvements
  
  - #### cuBLAS
  
-  This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
+  This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. `apt install nvidia-cuda-toolkit`) or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).
    - Using `make`:
      ```bash
      make LLAMA_CUBLAS=1
@@ -613,6 +614,18 @@ For more information, see [https://huggingface.co/docs/transformers/perplexity](
  The perplexity measurements in table above are done against the `wikitext2` test dataset (https://paperswithcode.com/dataset/wikitext-2), with context length of 512.
  The time per token is measured on a MacBook M1 Pro 32GB RAM using 4 and 8 threads.
  
+#### How to run
+
+1. Download/extract: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip?ref=salesforce-research
+2. Run `./perplexity -m models/7B/ggml-model-q4_0.gguf -f wiki.test.raw`
+3. Output:
+```
+perplexity : calculating perplexity over 655 chunks
+24.43 seconds per pass - ETA 4.45 hours
+[1]4.5970,[2]5.1807,[3]6.0382,...
+```
+And after 4.45 hours, you will have the final perplexity.
+
  ### Interactive mode
  
  If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
@@ -775,18 +788,6 @@ If your issue is with model generation quality, then please at least scan the fo
      - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
      - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
  
-#### How to run
-
-1. Download/extract: https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip?ref=salesforce-research
-2. Run `./perplexity -m models/7B/ggml-model-q4_0.gguf -f wiki.test.raw`
-3. Output:
-```
-perplexity : calculating perplexity over 655 chunks
-24.43 seconds per pass - ETA 4.45 hours
-[1]4.5970,[2]5.1807,[3]6.0382,...
-```
-And after 4.45 hours, you will have the final perplexity.
-
  ### Android
  
  #### Building the Project using Android NDK
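
Note on the relocated "How to run" steps: the sample output in that hunk can be read programmatically. The sketch below is not part of this commit or of llama.cpp. It assumes each `[n]value` entry is the running perplexity estimate after chunk `n`, and that the ETA is roughly the chunk count times the seconds per pass. The helper names `parse_running_perplexity` and `eta_hours` are hypothetical.

```python
import re

# Matches progress entries such as "[3]6.0382" in the sample output.
CHUNK_RE = re.compile(r"\[(\d+)\](\d+(?:\.\d+)?)")


def parse_running_perplexity(log_text):
    """Return (chunk_index, value) pairs in order of appearance.

    Assumption: each value is the running estimate after that chunk.
    """
    return [(int(i), float(v)) for i, v in CHUNK_RE.findall(log_text)]


def eta_hours(n_chunks, seconds_per_pass):
    """Rough total runtime in hours, assuming one pass per chunk."""
    return n_chunks * seconds_per_pass / 3600.0


# Sample output copied from the README hunk in this diff.
sample = """perplexity : calculating perplexity over 655 chunks
24.43 seconds per pass - ETA 4.45 hours
[1]4.5970,[2]5.1807,[3]6.0382,...
"""

pairs = parse_running_perplexity(sample)
last_chunk, last_value = pairs[-1]
print(f"chunks seen: {len(pairs)}, latest estimate: {last_value} (chunk {last_chunk})")
print(f"estimated total time: {eta_hours(655, 24.43):.2f} hours")
```

The printed seconds-per-pass figure is rounded, so an ETA recomputed from it can differ from the tool's own figure in the last digit.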