gguf.md: add sharding to naming convention (#826)

author Brian <redacted>

Sun, 19 May 2024 08:05:26 +0000 (18:05 +1000)

committer GitHub <redacted>

Sun, 19 May 2024 08:05:26 +0000 (11:05 +0300)
author Brian <redacted>
Sun, 19 May 2024 08:05:26 +0000 (18:05 +1000)
committer GitHub <redacted>
Sun, 19 May 2024 08:05:26 +0000 (11:05 +0300)
diff --git a/docs/gguf.md b/docs/gguf.md

index f4e86d22e890f46feb41bc83b50e6a20751a66d3..d07ad276f5083eb00c35c2c6b1b287155d5d5428 100644 (file)
--- a/docs/gguf.md
+++ b/docs/gguf.md
@@ -20,13 +20,16 @@ The key difference between GGJT and GGUF is the use of a key-value structure for
  
  ### GGUF Naming Convention
  
-GGUF follow a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<EncodingScheme>.gguf`
+GGUF follow a naming convention of `<Model>(-<Version>)-(<ExpertsCount>x)<Parameters>-<EncodingScheme>(-<Shard>).gguf`
  
  The components are:
  1. **Model**: A descriptive name for the model type or architecture.
+    - This can be derived from gguf metadata `general.name` substituting spaces for dashes.
  2. **Version**: (Optional) Denotes the model version number, formatted as `v<Major>.<Minor>`
      - If model is missing a version number then assume `v0.0` (Prerelease)
-3. **ExpertsCount**: Indicates the number of experts found in a Mixture of Experts based model.
+    - This can be derived from gguf metadata `general.version`
+3. **ExpertsCount**: (Optional) Indicates the number of experts found in a Mixture of Experts based model.
+    - This can be derived from gguf metadata `llama.expert_count`
  4. **Parameters**: Indicates the number of parameters and their scale, represented as `<count><scale-prefix>`:
      - `Q`: Quadrillion parameters.
      - `T`: Trillion parameters.
@@ -34,6 +37,10 @@ The components are:
      - `M`: Million parameters.
      - `K`: Thousand parameters.
  5. **EncodingScheme**: Indicates the weights encoding scheme that was applied to the model. Content, type mixture and arrangement however are determined by user code and can vary depending on project needs.
+6. **Shard**: (Optional) Indicates and denotes that the model has been split into multiple shards, formatted as `<ShardNum>-of-<ShardTotal>`.
+    - *ShardNum* : Shard position in this model. Must be 5 digits padded by zeros.
+      - Shard number always starts from `00001` onwards (e.g. First shard always starts at `00001-of-XXXXX` rather than `00000-of-XXXXX`).
+    - *ShardTotal* : Total number of shards in this model. Must be 5 digits padded by zeros.
  
  #### Parsing Above Naming Convention
  
@@ -41,19 +48,63 @@ To correctly parse a well formed naming convention based gguf filename, it is re
  
  For example:
  
-  * `mixtral-v0.1-8x7B-KQ2.gguf`:
+  * `Mixtral-v0.1-8x7B-KQ2.gguf`:
      - Model Name: Mixtral
      - Version Number: v0.1
      - Expert Count: 8
      - Parameter Count: 7B
      - Weight Encoding Scheme: KQ2
+    - Shard: N/A
  
    * `Hermes-2-Pro-Llama-3-8B-F16.gguf`:
      - Model Name: Hermes 2 Pro Llama 3
-    - Version Number: v0.0 (`<Version>-` missing)
-    - Expert Count: 0 (`<ExpertsCount>x` missing)
+    - Version Number: v0.0
+    - Expert Count: 0
      - Parameter Count: 8B
      - Weight Encoding Scheme: F16
+    - Shard: N/A
+
+  * `Grok-v1.0-100B-Q4_0-00003-of-00009.gguf"`
+    - Model Name: Grok
+    - Version Number: v1.0
+    - Expert Count: 0
+    - Parameter Count: 100B
+    - Weight Encoding Scheme: Q4_0
+    - Shard: 3 out of 9 total shards
+
+You can also try using `/^(?<model_name>[A-Za-z0-9\s-]+)(?:-v(?<major>\d+)\.(?<minor>\d+))?-(?:(?<experts_count>\d+)x)?(?<parameters>\d+[A-Za-z]?)-(?<encoding_scheme>[\w_]+)(?:-(?<shard>\d{5})-of-(?<shardTotal>\d{5}))?\.gguf$/` regular expression to extract all the values above as well. Just don't forget to convert `-` to ` ` for the model name.
+
+<details><summary>Example Node.js Regex Function</summary>
+
+```js
+#!/usr/bin/env node
+const ggufRegex = /^(?<model_name>[A-Za-z0-9\s-]+)(?:-v(?<major>\d+)\.(?<minor>\d+))?-(?:(?<experts_count>\d+)x)?(?<parameters>\d+[A-Za-z]?)-(?<encoding_scheme>[\w_]+)(?:-(?<shard>\d{5})-of-(?<shardTotal>\d{5}))?\.gguf$/;
+
+function parseGGUFFilename(filename) {
+  const match = ggufRegex.exec(filename);
+  if (!match)
+    return null;
+  const {model_name, major = '0', minor = '0', experts_count = null, parameters, encoding_scheme, shard = null, shardTotal = null} = match.groups;
+  return {modelName: model_name.trim().replace(/-/g, ' '), version: `v${major}.${minor}`, expertsCount: experts_count ? +experts_count : null, parameters, encodingScheme: encoding_scheme, shard: shard ? +shard : null, shardTotal: shardTotal ? +shardTotal : null};
+}
+
+const testCases = [
+  {filename: 'Mixtral-v0.1-8x7B-KQ2.gguf',              expected: { modelName: 'Mixtral',              version: 'v0.1',   expertsCount: 8,    parameters: '7B',   encodingScheme: 'KQ2',  shard: null,    shardTotal: null }},
+  {filename: 'Grok-v1.0-100B-Q4_0-00003-of-00009.gguf', expected: { modelName: 'Grok',                 version: 'v1.0',   expertsCount: null, parameters: '100B', encodingScheme: 'Q4_0', shard: 3,       shardTotal: 9    }},
+  {filename: 'Hermes-2-Pro-Llama-3-8B-F16.gguf',        expected: { modelName: 'Hermes 2 Pro Llama 3', version: 'v0.0',   expertsCount: null, parameters: '8B',   encodingScheme: 'F16',  shard: null,    shardTotal: null }},
+  {filename: 'Hermes-2-Pro-Llama-3-v32.33-8Q-F16.gguf', expected: { modelName: 'Hermes 2 Pro Llama 3', version: 'v32.33', expertsCount: null, parameters: '8Q',   encodingScheme: 'F16',  shard: null,    shardTotal: null }},
+  {filename: 'not-a-known-arrangement.gguf', expected: null},
+];
+
+testCases.forEach(({ filename, expected }) => {
+  const result = parseGGUFFilename(filename);
+  const passed = JSON.stringify(result) === JSON.stringify(expected);
+  console.log(`${filename}: ${passed ? "PASS" : "FAIL"}`);
+});
+```
+
+</details>
+
  
  ### File Structure
author	Brian <redacted>
	Sun, 19 May 2024 08:05:26 +0000 (18:05 +1000)
committer	GitHub <redacted>
	Sun, 19 May 2024 08:05:26 +0000 (11:05 +0300)