- `[llm].tensor_data_layout: string`: When a model is converted to GGUF, tensors may be rearranged to improve performance. This key describes the layout of the tensor data. This is not required; if not present, it is assumed to be `reference`.
  - `reference`: tensors are laid out in the same order as the original model
  - further options can be found for each architecture in their respective sections
+- `[llm].expert_count: uint32`: Number of experts in MoE models (optional for non-MoE arches).
+- `[llm].expert_used_count: uint32`: Number of experts used during each token evaluation (optional for non-MoE arches).
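For illustration only, the sketch below writes these two keys with the `gguf` Python package's generic `add_uint32` helper; the file name, architecture, and counts are placeholder values, and the exact writer API may vary between package versions:

```python
# Hedged sketch: emitting the MoE expert keys when writing a GGUF file.
# Assumes the `gguf` Python package's GGUFWriter API; all values are placeholders.
from gguf import GGUFWriter

writer = GGUFWriter("model.gguf", arch="llama")
writer.add_uint32("llama.expert_count", 8)       # total experts per FFN block
writer.add_uint32("llama.expert_used_count", 2)  # experts evaluated per token

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```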
#### Attention
.swapaxes(1, 2)
.reshape(weights.shape))
```
+- `llama.expert_count`
+- `llama.expert_used_count`
##### MPT
- `ffn_up`: Feed-forward network "up" layer
- `ffn_gate`: Feed-forward network "gate" layer
- `ffn_down`: Feed-forward network "down" layer
+- `ffn_gate_inp`: Expert-routing layer for the Feed-forward network in MoE models (see the routing sketch after this list)
+- `ffn_gate_exp`: Feed-forward network "gate" layer per expert in MoE models
+- `ffn_down_exp`: Feed-forward network "down" layer per expert in MoE models
+- `ffn_up_exp`: Feed-forward network "up" layer per expert in MoE models
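To make the roles of these per-expert tensors concrete, here is a hedged NumPy sketch of Mixtral-style top-k routing: `ffn_gate_inp` scores the experts, the top `expert_used_count` of them run an ordinary gated feed-forward pass with their own `ffn_gate`/`ffn_up`/`ffn_down` weights, and the outputs are blended using the normalized router weights. The shapes, the SiLU activation, and the softmax-over-top-k weighting are illustrative assumptions, not part of this specification.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def moe_ffn(x, ffn_gate_inp, ffn_gate_exp, ffn_up_exp, ffn_down_exp, expert_used_count):
    """Sketch of one MoE feed-forward block applied to a single token vector `x`.

    Assumed shapes (illustrative only):
      ffn_gate_inp : (n_expert, n_embd)        -- expert-routing layer
      ffn_gate_exp : (n_expert, n_ff, n_embd)  -- per-expert "gate" weights
      ffn_up_exp   : (n_expert, n_ff, n_embd)  -- per-expert "up" weights
      ffn_down_exp : (n_expert, n_embd, n_ff)  -- per-expert "down" weights
    """
    # The routing layer produces one logit per expert; keep the top-k experts.
    logits = ffn_gate_inp @ x
    top = np.argsort(logits)[::-1][:expert_used_count]

    # Normalize the selected logits into blending weights (softmax over top-k).
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()

    # Each selected expert runs an ordinary gated FFN; outputs are blended.
    out = np.zeros_like(x)
    for w, e in zip(weights, top):
        out += w * (ffn_down_exp[e] @ (silu(ffn_gate_exp[e] @ x) * (ffn_up_exp[e] @ x)))
    return out
```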
## Version History