Update README.md

author Georgi Gerganov <redacted>

Sun, 18 Sep 2022 17:12:43 +0000 (20:12 +0300)

committer GitHub <redacted>

Sun, 18 Sep 2022 17:12:43 +0000 (20:12 +0300)
diff --git a/examples/gpt-j/README.md b/examples/gpt-j/README.md

index 608eac16662b0591997ba726f838e28940f41905..c5e0007cf0f4d86ae148252cbf84842a547b9e36 100644
--- a/examples/gpt-j/README.md
+++ b/examples/gpt-j/README.md
@@ -86,7 +86,7 @@ The most performance critical part of the implementation is of course the matrix
  
  On Arm64, I utilize the 128-bit NEON intrinsics for 16-bit floating point operations:
  
-https://github.com/ggerganov/ggml/blob/1548ac6743c594cc920ccb3503444b0e2bdf4d56/src/ggml.c#L187-L243
+https://github.com/ggerganov/ggml/blob/fb558f78d905f85c54813602649ddd628ffe0f3a/src/ggml.c#L187-L243
  
  These instructions allow each core to operate simultaneously on 64 floating point numbers. I'm no expert
  in SIMD, but after quite some trials this was the most efficient code for dot product that I could come up
@@ -98,7 +98,7 @@ One interesting property of the GPT-J transformer architecture is that it allows
  of the inference in parallel - i.e. the Feed-forward layer can be computed in parallel to the Self-Attention
  layer:
  
-https://github.com/ggerganov/ggml/blob/1548ac6743c594cc920ccb3503444b0e2bdf4d56/examples/gpt-j/main.cpp#L507-L531
+https://github.com/ggerganov/ggml/blob/fb558f78d905f85c54813602649ddd628ffe0f3a/examples/gpt-j/main.cpp#L507-L531
  
  So I thought why not bring in the M1 GPU to compute half of the neural network in parallel to the CPU.
  Thanks to the shared memory model, it was relatively easy to offload half of the computation to the GPU