From: ariez-xyz
Date: Wed, 13 Dec 2023 12:01:31 +0000 (+0100)
Subject: gguf : document Mixtral changes in spec (#646)
X-Git-Tag: upstream/0.0.1642~1176
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=a027a92c1db83ec3cd157cb4e36c6d88cf0615bc;p=pkg%2Fggml%2Fsources%2Fggml

gguf : document Mixtral changes in spec (#646)

* add new tensor names

* add new keys

* fix tensor names

* gguf : change wording a bit

---------

Co-authored-by: Georgi Gerganov
---

diff --git a/docs/gguf.md b/docs/gguf.md
index 794a5292..1537170f 100644
--- a/docs/gguf.md
+++ b/docs/gguf.md
@@ -285,6 +285,8 @@ In the following, `[llm]` is used to fill in for the name of a specific LLM arch
 - `[llm].tensor_data_layout: string`: When a model is converted to GGUF, tensors may be rearranged to improve performance. This key describes the layout of the tensor data. This is not required; if not present, it is assumed to be `reference`.
   - `reference`: tensors are laid out in the same order as the original model
   - further options can be found for each architecture in their respective sections
+- `[llm].expert_count: uint32`: Number of experts in MoE models (optional for non-MoE arches).
+- `[llm].expert_used_count: uint32`: Number of experts used during each token evaluation (optional for non-MoE arches).
 
 #### Attention
 
@@ -341,6 +343,8 @@ The following sections describe the metadata for each model architecture. Each k
             .swapaxes(1, 2)
             .reshape(weights.shape))
     ```
+- `llama.expert_count`
+- `llama.expert_used_count`
 
 ##### MPT
 
@@ -553,6 +557,10 @@ where N signifies the block number a layer belongs to, and where `BB` could be:
 - `ffn_up`: Feed-forward network "up" layer
 - `ffn_gate`: Feed-forward network "gate" layer
 - `ffn_down`: Feed-forward network "down" layer
+- `ffn_gate_inp`: Expert-routing layer for the feed-forward network in MoE models
+- `ffn_gate_exp`: Feed-forward network "gate" layer per expert in MoE models
+- `ffn_down_exp`: Feed-forward network "down" layer per expert in MoE models
+- `ffn_up_exp`: Feed-forward network "up" layer per expert in MoE models
 
 ## Version History
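To make the semantics of the two new keys concrete: of `[llm].expert_count` experts per layer, only `[llm].expert_used_count` are evaluated for each token (8 and 2 respectively for Mixtral). The sketch below is a minimal numpy illustration of that top-k routing, not code from the spec or from llama.cpp; the variable names and the placeholder activations are purely illustrative.

```python
import numpy as np

# Illustrative sketch of the routing described by the two new keys:
# expert_count experts exist per layer; expert_used_count are active per token.
expert_count = 8       # [llm].expert_count (Mixtral: 8)
expert_used_count = 2  # [llm].expert_used_count (Mixtral: 2)

rng = np.random.default_rng(0)
router_logits = rng.normal(size=expert_count)  # output of the ffn_gate_inp router for one token

# Select the expert_used_count experts with the highest router logits...
top_experts = np.argsort(router_logits)[-expert_used_count:]

# ...and renormalize their routing weights with a softmax over the selected logits.
weights = np.exp(router_logits[top_experts])
weights /= weights.sum()

# Each selected expert's ffn_{gate,up,down}_exp path is evaluated, and the
# outputs are combined as a weighted sum (placeholder activations shown here).
expert_outputs = rng.normal(size=(expert_count, 16))
combined = (weights[:, None] * expert_outputs[top_experts]).sum(axis=0)
print(top_experts, combined.shape)
```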
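On the writing side, the new keys are ordinary `uint32` metadata. The following is a hedged sketch using the `gguf` Python package that ships with llama.cpp (gguf-py); it assumes `GGUFWriter` and its `add_uint32` helper behave as in gguf-py around the time of this commit, and the file name and values are illustrative only.

```python
from gguf import GGUFWriter  # gguf-py, distributed with llama.cpp

# Hypothetical sketch: record MoE metadata for a llama-architecture model.
writer = GGUFWriter("mixtral-example.gguf", "llama")  # illustrative path
writer.add_uint32("llama.expert_count", 8)        # total experts per layer
writer.add_uint32("llama.expert_used_count", 2)   # experts evaluated per token

# (Tensor data would be added here with writer.add_tensor(...).)
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```

The generic `add_uint32` with the literal key string is used here rather than any architecture-specific convenience method, since the key names come directly from the spec text above.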