From: Georgi Gerganov <redacted>
Date: Thu, 4 May 2023 15:45:39 +0000 (+0300)
Subject: stablelm : update README.md
X-Git-Tag: upstream/0.0.1642~1494
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=c5d97fc25e635665611239b5e8c5b35d67f944d1;p=pkg%2Fggml%2Fsources%2Fggml

stablelm : update README.md
---

diff --git a/examples/stablelm/README.md b/examples/stablelm/README.md
index fa708ae0..b375340d 100644
--- a/examples/stablelm/README.md
+++ b/examples/stablelm/README.md
@@ -4,43 +4,6 @@ Transformer architecture: GPT-NeoX
 
 Ref: https://github.com/stability-AI/stableLM/#stablelm-alpha
 
-## Warning
-
-**There seems to be a bug in the implementation.
-The embeddings magnitude increases after each layer which is unexpected.
-To observe this, uncomment the following line:**
-
-https://github.com/ggerganov/ggml/blob/abea4b7609c14b837015ab625e3ac36c4708dd03/src/ggml.c#L9208
-
-```
-...
-p[  0] =  65.5842
-p[  1] =  61.6951
-p[  2] =  59.3500
-p[  3] =  61.2421
-p[  4] =  65.9653
-p[  5] =  59.4936
-p[  6] =  58.4164
-p[  0] = -209.6351
-p[  1] = -214.0987
-p[  2] = -217.0928
-p[  3] = -215.0267
-p[  4] = -208.2430
-p[  5] = -215.3692
-p[  6] = -214.1981
-p[  0] = -301.0286
-p[  1] = -308.6521
-p[  2] = -310.7513
-p[  3] = -307.0832
-p[  4] = -299.9238
-p[  5] = -306.0667
-p[  6] = -302.1777
-...
-```
-
-**Instead, the magnitude should remain around `1`.
-Not sure where is the bug yet - need to compare results with the reference python implementation.**
-
 ## Usage
 
 ```bash
@@ -142,3 +105,40 @@ main:    total time =  4177.68 ms
 - The tokenizer is currently hacked - probably works only for English
 - Non-parallel residual is not supported
 - Contributions and improvements are welcome
+
+## Note about possible bug
+
+**There might be an issue with this implementation, though this is not confirmed.
+The magnitude of the embeddings increases after each layer, which is unexpected.
+To observe this, uncomment the following line:**
+
+https://github.com/ggerganov/ggml/blob/abea4b7609c14b837015ab625e3ac36c4708dd03/src/ggml.c#L9208
+
+```
+...
+p[  0] =  65.5842
+p[  1] =  61.6951
+p[  2] =  59.3500
+p[  3] =  61.2421
+p[  4] =  65.9653
+p[  5] =  59.4936
+p[  6] =  58.4164
+p[  0] = -209.6351
+p[  1] = -214.0987
+p[  2] = -217.0928
+p[  3] = -215.0267
+p[  4] = -208.2430
+p[  5] = -215.3692
+p[  6] = -214.1981
+p[  0] = -301.0286
+p[  1] = -308.6521
+p[  2] = -310.7513
+p[  3] = -307.0832
+p[  4] = -299.9238
+p[  5] = -306.0667
+p[  6] = -302.1777
+...
+```
+
+**Instead, I think the magnitude should remain around `1`.
+See https://github.com/ggerganov/llama.cpp/issues/1063#issuecomment-1527730562 for more analysis.**