update main readme (#8333)

author Xuan Son Nguyen <redacted>

Sat, 6 Jul 2024 17:01:23 +0000 (19:01 +0200)

committer GitHub <redacted>

Sat, 6 Jul 2024 17:01:23 +0000 (19:01 +0200)
diff --git a/README.md b/README.md
index a4bbf00bdae9443e98a7850a31892bfd431c6acb..800b499e9d516141f5142aa6adf494afd52508ed 100644
--- a/README.md
+++ b/README.md
@@ -391,28 +391,21 @@ The `grammars/` folder contains a handful of sample grammars. To write your own,
  
  For authoring more complex JSON grammars, you can also check out https://grammar.intrinsiclabs.ai/, a browser app that lets you write TypeScript interfaces which it compiles to GBNF grammars that you can save for local use. Note that the app is built and maintained by members of the community, please file any issues or FRs on [its repo](http://github.com/intrinsiclabsai/gbnfgen) and not this one.
  
-### Obtaining and using the Facebook LLaMA 2 model
+## Build
  
-- Refer to [Facebook's LLaMA download page](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) if you want to access the model data.
-- Alternatively, if you want to save time and space, you can download already converted and quantized models from [TheBloke](https://huggingface.co/TheBloke), including:
-  - [LLaMA 2 7B base](https://huggingface.co/TheBloke/Llama-2-7B-GGUF)
-  - [LLaMA 2 13B base](https://huggingface.co/TheBloke/Llama-2-13B-GGUF)
-  - [LLaMA 2 70B base](https://huggingface.co/TheBloke/Llama-2-70B-GGUF)
-  - [LLaMA 2 7B chat](https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF)
-  - [LLaMA 2 13B chat](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF)
-  - [LLaMA 2 70B chat](https://huggingface.co/TheBloke/Llama-2-70B-chat-GGUF)
+Please refer to [Build llama.cpp locally](./docs/build.md)
  
-### Seminal papers and background on the models
+## Supported backends
  
-If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
-- LLaMA:
-    - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
-    - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
-- GPT-3
-    - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
-- GPT-3.5 / InstructGPT / ChatGPT:
-    - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
-    - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
+| Backend | Target devices |
+| --- | --- |
+| [Metal](./docs/build.md#metal-build) | Apple Silicon |
+| [BLAS](./docs/build.md#blas-build) | All |
+| [BLIS](./docs/backend/BLIS.md) | All |
+| [SYCL](./docs/backend/SYCL.md) | Intel and Nvidia GPU |
+| [CUDA](./docs/build.md#cuda) | Nvidia GPU |
+| [hipBLAS](./docs/build.md#hipblas) | AMD GPU |
+| [Vulkan](./docs/build.md#vulkan) | GPU |
  
  ## Tools
  
@@ -460,3 +453,15 @@ To learn more how to measure perplexity using llama.cpp, [read this documentatio
  - [Build on Android](./docs/android.md)
  - [Performance troubleshooting](./docs/token_generation_performance_tips.md)
  - [GGML tips & tricks](https://github.com/ggerganov/llama.cpp/wiki/GGML-Tips-&-Tricks)
+
+**Seminal papers and background on the models**
+
+If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
+- LLaMA:
+    - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
+    - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
+- GPT-3
+    - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+- GPT-3.5 / InstructGPT / ChatGPT:
+    - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
+    - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
diff --git a/docs/build.md b/docs/build.md
index cf42a4eb93e1948f3ecedf7d10aa4b9a24f01791..bf41bfdf9c2f88096003a732aa179fec95c8f6f3 100644
--- a/docs/build.md
+++ b/docs/build.md
@@ -85,7 +85,7 @@ Building the program with BLAS support may lead to some performance improvements
  
  ### Accelerate Framework:
  
-  This is only available on Mac PCs and it's enabled by default. You can just build using the normal instructions.
+This is only available on Mac PCs and it's enabled by default. You can just build using the normal instructions.
  
  ### OpenBLAS: