- `[llm].tensor_data_layout: string`: When a model is converted to GGUF, tensors may be rearranged to improve performance. This key describes the layout of the tensor data. This is not required; if not present, it is assumed to be `reference`.
  - `reference`: tensors are laid out in the same order as the original model
  - further options can be found for each architecture in their respective sections
+- `[llm].expert_count: uint32`: Number of experts in MoE models (optional for non-MoE arches).
+- `[llm].expert_used_count: uint32`: Number of experts used during each token evaluation (optional for non-MoE arches).
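For illustration only, the sketch below writes these two keys with the `gguf` Python package's generic `add_uint32` helper; the file name, architecture, and counts are placeholder values, and the exact writer API may vary between package versions:

```python
# Hedged sketch: emitting the MoE expert keys when writing a GGUF file.
# Assumes the `gguf` Python package's GGUFWriter API; all values are placeholders.
from gguf import GGUFWriter

writer = GGUFWriter("model.gguf", arch="llama")
writer.add_uint32("llama.expert_count", 8)       # total experts per FFN block
writer.add_uint32("llama.expert_used_count", 2)  # experts evaluated per token

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```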
#### Attention
.swapaxes(1, 2)
.reshape(weights.shape))
```
+- `llama.expert_count`
+- `llama.expert_used_count`
##### MPT
- `ffn_up`: Feed-forward network "up" layer
- `ffn_gate`: Feed-forward network "gate" layer
- `ffn_down`: Feed-forward network "down" layer
+- `ffn_gate_inp`: Expert-routing layer for the Feed-forward network in MoE models (see the routing sketch after this list)
+- `ffn_gate_exp`: Feed-forward network "gate" layer per expert in MoE models
+- `ffn_down_exp`: Feed-forward network "down" layer per expert in MoE models
+- `ffn_up_exp`: Feed-forward network "up" layer per expert in MoE models
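To make the roles of these per-expert tensors concrete, here is a hedged NumPy sketch of Mixtral-style top-k routing: `ffn_gate_inp` scores the experts, the top `expert_used_count` of them run an ordinary gated feed-forward pass with their own `ffn_gate`/`ffn_up`/`ffn_down` weights, and the outputs are blended using the normalized router weights. The shapes, the SiLU activation, and the softmax-over-top-k weighting are illustrative assumptions, not part of this specification.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def moe_ffn(x, ffn_gate_inp, ffn_gate_exp, ffn_up_exp, ffn_down_exp, expert_used_count):
    """Sketch of one MoE feed-forward block applied to a single token vector `x`.

    Assumed shapes (illustrative only):
      ffn_gate_inp : (n_expert, n_embd)        -- expert-routing layer
      ffn_gate_exp : (n_expert, n_ff, n_embd)  -- per-expert "gate" weights
      ffn_up_exp   : (n_expert, n_ff, n_embd)  -- per-expert "up" weights
      ffn_down_exp : (n_expert, n_embd, n_ff)  -- per-expert "down" weights
    """
    # The routing layer produces one logit per expert; keep the top-k experts.
    logits = ffn_gate_inp @ x
    top = np.argsort(logits)[::-1][:expert_used_count]

    # Normalize the selected logits into blending weights (softmax over top-k).
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()

    # Each selected expert runs an ordinary gated FFN; outputs are blended.
    out = np.zeros_like(x)
    for w, e in zip(weights, top):
        out += w * (ffn_down_exp[e] @ (silu(ffn_gate_exp[e] @ x) * (ffn_up_exp[e] @ x)))
    return out
```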
## Version History