gguf : add keys for kv sizes to spec (#676)

author postmasters <redacted>

Fri, 5 Jan 2024 15:25:38 +0000 (07:25 -0800)

committer GitHub <redacted>

Fri, 5 Jan 2024 15:25:38 +0000 (17:25 +0200)
diff --git a/docs/gguf.md b/docs/gguf.md

index 1537170fc25009263d33987f04999a58e189ed29..bb63f4f0e6e24671c8db9a47356e4c7bf5434f88 100644 (file)
--- a/docs/gguf.md
+++ b/docs/gguf.md
@@ -296,6 +296,8 @@ In the following, `[llm]` is used to fill in for the name of a specific LLM arch
  - `[llm].attention.clamp_kqv: float32`: Value (`C`) to clamp the values of the `Q`, `K`, and `V` tensors between (`[-C, C]`).
  - `[llm].attention.layer_norm_epsilon: float32`: Layer normalization epsilon.
  - `[llm].attention.layer_norm_rms_epsilon: float32`: Layer RMS normalization epsilon.
+- `[llm].attention.key_length: uint32`: The optional size of a key head, $d_k$. If not specified, it will be `n_embd / n_head`.
+- `[llm].attention.value_length: uint32`: The optional size of a value head, $d_v$. If not specified, it will be `n_embd / n_head`.
  
  #### RoPE
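
Reviewer note: the fallback described by the two added keys can be sketched as below. This is a minimal illustration, not the loader's actual code. It assumes `n_embd` and `n_head` correspond to the spec's `[llm].embedding_length` and `[llm].attention.head_count` keys, and that the metadata has already been read into a plain dict; `resolve_head_sizes` and the dict layout are hypothetical names introduced only for this example.

```python
def resolve_head_sizes(metadata: dict, arch: str) -> tuple[int, int]:
    """Return (key_length, value_length) for one architecture.

    `metadata` is a hypothetical dict of already-parsed GGUF key/value pairs;
    it is not the API of any real GGUF reader.
    """
    # Assumption: n_embd and n_head map to these two spec keys.
    n_embd = metadata[f"{arch}.embedding_length"]
    n_head = metadata[f"{arch}.attention.head_count"]

    # Default per-head size when the optional keys are absent.
    default = n_embd // n_head

    key_length = metadata.get(f"{arch}.attention.key_length", default)
    value_length = metadata.get(f"{arch}.attention.value_length", default)
    return key_length, value_length


# Neither key present: both sizes fall back to n_embd / n_head.
base = {"llama.embedding_length": 4096, "llama.attention.head_count": 32}
print(resolve_head_sizes(base, "llama"))

# Only key_length overridden: value_length still uses the default.
custom = dict(base, **{"llama.attention.key_length": 256})
print(resolve_head_sizes(custom, "llama"))
```

The two keys are resolved independently, so a model can set $d_k$ without setting $d_v$, or the reverse.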