docs: fix links in development docs [no ci] (#8481)

author NikolaiLyssogor <redacted>

Mon, 15 Jul 2024 11:46:39 +0000 (04:46 -0700)

committer GitHub <redacted>

Mon, 15 Jul 2024 11:46:39 +0000 (14:46 +0300)
diff --git a/docs/development/HOWTO-add-model.md b/docs/development/HOWTO-add-model.md
index 2712b66c175339e6560110454753c244c6639f83..04c5ccbbe60c3ded50ec722323ac59181366c8b7 100644
--- a/docs/development/HOWTO-add-model.md
+++ b/docs/development/HOWTO-add-model.md
@@ -9,15 +9,15 @@ Adding a model requires few steps:
  After following these steps, you can open PR.
  
  Also, it is important to check that the examples and main ggml backends (CUDA, METAL, CPU) are working with the new architecture, especially:
-- [main](../examples/main)
-- [imatrix](../examples/imatrix)
-- [quantize](../examples/quantize)
-- [server](../examples/server)
+- [main](/examples/main/)
+- [imatrix](/examples/imatrix/)
+- [quantize](/examples/quantize/)
+- [server](/examples/server/)
  
  ### 1. Convert the model to GGUF
  
  This step is done in python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
-Depending on the model architecture, you can use either [convert_hf_to_gguf.py](../convert_hf_to_gguf.py) or [examples/convert_legacy_llama.py](../examples/convert_legacy_llama.py) (for `llama/llama2` models in `.pth` format).
+Depending on the model architecture, you can use either [convert_hf_to_gguf.py](/convert_hf_to_gguf.py) or [examples/convert_legacy_llama.py](/examples/convert_legacy_llama.py) (for `llama/llama2` models in `.pth` format).
  
  The convert script reads the model configuration, tokenizer, tensor names+data and converts them to GGUF metadata and tensors.
  
@@ -31,7 +31,7 @@ class MyModel(Model):
      model_arch = gguf.MODEL_ARCH.GROK
  ```
  
-2. Define the layout of the GGUF tensors in [constants.py](../gguf-py/gguf/constants.py)
+2. Define the layout of the GGUF tensors in [constants.py](/gguf-py/gguf/constants.py)
  
  Add an enum entry in `MODEL_ARCH`, the model human friendly name in `MODEL_ARCH_NAMES` and the GGUF tensor names in `MODEL_TENSORS`.
  
@@ -54,7 +54,7 @@ Example for `falcon` model:
  
  As a general rule, before adding a new tensor name to GGUF, be sure the equivalent naming does not already exist.
  
-Once you have found the GGUF tensor name equivalent, add it to the [tensor_mapping.py](../gguf-py/gguf/tensor_mapping.py) file.
+Once you have found the GGUF tensor name equivalent, add it to the [tensor_mapping.py](/gguf-py/gguf/tensor_mapping.py) file.
  
  If the tensor name is part of a repetitive layer/block, the key word `bid` substitutes it.
  
@@ -100,7 +100,7 @@ Have a look at existing implementation like `build_llama`, `build_dbrx` or `buil
  
  When implementing a new graph, please note that the underlying `ggml` backends might not support them all, support for missing backend operations can be added in another PR.
  
-Note: to debug the inference graph: you can use [llama-eval-callback](../examples/eval-callback).
+Note: to debug the inference graph: you can use [llama-eval-callback](/examples/eval-callback/).
  
  ## GGUF specification
  
diff --git a/docs/development/token_generation_performance_tips.md b/docs/development/token_generation_performance_tips.md
index c0840cad57fb3bc59851c047051f9e2620c6c874..41b7232c976b3a86fe7509c078bcedafd014a9e1 100644
--- a/docs/development/token_generation_performance_tips.md
+++ b/docs/development/token_generation_performance_tips.md
@@ -1,7 +1,7 @@
  # Token generation performance troubleshooting
  
  ## Verifying that the model is running on the GPU with CUDA
-Make sure you compiled llama with the correct env variables according to [this guide](../README.md#CUDA), so that llama accepts the `-ngl N` (or `--n-gpu-layers N`) flag. When running llama, you may configure `N` to be very large, and llama will offload the maximum possible number of layers to the GPU, even if it's less than the number you configured. For example:
+Make sure you compiled llama with the correct env variables according to [this guide](/docs/build.md#cuda), so that llama accepts the `-ngl N` (or `--n-gpu-layers N`) flag. When running llama, you may configure `N` to be very large, and llama will offload the maximum possible number of layers to the GPU, even if it's less than the number you configured. For example:
  ```shell
  ./llama-cli -m "path/to/model.gguf" -ngl 200000 -p "Please sir, may I have some "
  ```