From: Georgi Gerganov
Date: Thu, 4 May 2023 15:45:39 +0000 (+0300)
Subject: stablelm : update README.md
X-Git-Tag: upstream/0.0.1642~1494
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=c5d97fc25e635665611239b5e8c5b35d67f944d1;p=pkg%2Fggml%2Fsources%2Fggml

stablelm : update README.md
---

diff --git a/examples/stablelm/README.md b/examples/stablelm/README.md
index fa708ae0..b375340d 100644
--- a/examples/stablelm/README.md
+++ b/examples/stablelm/README.md
@@ -4,43 +4,6 @@ Transformer architecture: GPT-NeoX
 
 Ref: https://github.com/stability-AI/stableLM/#stablelm-alpha
 
-## Warning
-
-**There seems to be a bug in the implementation.
-The embeddings magnitude increases after each layer which is unexpected.
-To observe this, uncomment the following line:**
-
-https://github.com/ggerganov/ggml/blob/abea4b7609c14b837015ab625e3ac36c4708dd03/src/ggml.c#L9208
-
-```
-...
-p[ 0] = 65.5842
-p[ 1] = 61.6951
-p[ 2] = 59.3500
-p[ 3] = 61.2421
-p[ 4] = 65.9653
-p[ 5] = 59.4936
-p[ 6] = 58.4164
-p[ 0] = -209.6351
-p[ 1] = -214.0987
-p[ 2] = -217.0928
-p[ 3] = -215.0267
-p[ 4] = -208.2430
-p[ 5] = -215.3692
-p[ 6] = -214.1981
-p[ 0] = -301.0286
-p[ 1] = -308.6521
-p[ 2] = -310.7513
-p[ 3] = -307.0832
-p[ 4] = -299.9238
-p[ 5] = -306.0667
-p[ 6] = -302.1777
-...
-```
-
-**Instead, the magnitude should remain around `1`.
-Not sure where is the bug yet - need to compare results with the reference python implementation.**
-
 ## Usage
 
 ```bash
@@ -142,3 +105,40 @@ main: total time = 4177.68 ms
 - The tokenizer is currently hacked - probably works only for English
 - Non-parallel residual is not supported
 - Contributions and improvements are welcome
+
+## Note about possible bug
+
+**There might be some issue with this implementation - not 100% sure.
+The embeddings magnitude increases after each layer which is unexpected.
+To observe this, uncomment the following line:**
+
+https://github.com/ggerganov/ggml/blob/abea4b7609c14b837015ab625e3ac36c4708dd03/src/ggml.c#L9208
+
+```
+...
+p[ 0] = 65.5842
+p[ 1] = 61.6951
+p[ 2] = 59.3500
+p[ 3] = 61.2421
+p[ 4] = 65.9653
+p[ 5] = 59.4936
+p[ 6] = 58.4164
+p[ 0] = -209.6351
+p[ 1] = -214.0987
+p[ 2] = -217.0928
+p[ 3] = -215.0267
+p[ 4] = -208.2430
+p[ 5] = -215.3692
+p[ 6] = -214.1981
+p[ 0] = -301.0286
+p[ 1] = -308.6521
+p[ 2] = -310.7513
+p[ 3] = -307.0832
+p[ 4] = -299.9238
+p[ 5] = -306.0667
+p[ 6] = -302.1777
+...
+```
+
+**Instead, I think the magnitude should remain around `1`.
+See https://github.com/ggerganov/llama.cpp/issues/1063#issuecomment-1527730562 for more analysis**
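The magnitude check described in the note about a possible bug can also be done without uncommenting the debug line in `src/ggml.c`, by summarizing each layer's output directly. Below is a minimal C sketch, assuming one layer's hidden state is available as a contiguous, row-major `float` buffer with `n_tokens` rows of `n_embd` values. This is an assumption: ggml tensors are not always contiguous, and their data is only filled in after the graph has been computed. `row_mean_square` and `print_layer_magnitudes` are hypothetical helpers, not part of ggml or of the stablelm example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of ggml.
 * x holds one layer's hidden state as a contiguous row-major buffer:
 * n_tokens rows, each with n_embd floats. For every row i, out[i] receives
 * the mean of the squared elements, i.e. the squared RMS magnitude. */
static void row_mean_square(const float *x, size_t n_tokens, size_t n_embd,
                            float *out) {
    assert(x != NULL && out != NULL && n_embd > 0);
    for (size_t i = 0; i < n_tokens; ++i) {
        const float *row = x + i * n_embd; /* first element of token i */
        double acc = 0.0;                  /* accumulate in double to limit rounding error */
        for (size_t j = 0; j < n_embd; ++j) {
            acc += (double) row[j] * (double) row[j];
        }
        out[i] = (float) (acc / (double) n_embd);
    }
}

/* Hypothetical helper: prints one line per token for the given layer, so the
 * values for successive layers can be compared. scratch must hold at least
 * n_tokens floats; on return it contains the per-token mean squares. */
static void print_layer_magnitudes(int layer, const float *x, size_t n_tokens,
                                   size_t n_embd, float *scratch) {
    row_mean_square(x, n_tokens, n_embd, scratch);
    for (size_t i = 0; i < n_tokens; ++i) {
        printf("layer %2d token %2zu mean(x^2) = %.4f\n",
               layer, i, (double) scratch[i]);
    }
}
```

A caller would invoke `print_layer_magnitudes(il, buf, n_tokens, n_embd, scratch)` once per layer `il` on that layer's output. If the magnitude behaved as the note expects (around `1`), `mean(x^2)` would also stay around `1`, since the RMS is its square root; the square root is left out only so the sketch does not need to link against libm.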