https://user-images.githubusercontent.com/1991296/197385372-962a6dea-bca1-4d50-bf96-1d8c27b98c81.mp4
+## Implementation details
+
+- The core tensor operations are implemented in C ([ggml.h](ggml.h) / [ggml.c](ggml.c))
+- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
+- Sample usage is demonstrated in [main.cpp](examples/main) (a minimal sketch of the API follows this list)
+- Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
+- Various other examples are available in the [examples](examples) folder
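+
+For orientation, here is a minimal sketch of driving the C-style API (function names follow recent
+versions of [whisper.h](whisper.h) and may differ across releases; the model path and the `pcm` buffer
+of 16 kHz mono float samples are assumptions of this sketch, and [main.cpp](examples/main) remains the
+complete reference program):
+
+```
+#include "whisper.h"
+
+#include <cstdio>
+#include <vector>
+
+int main() {
+    // load a ggml model from disk (the path here is an assumption for this sketch)
+    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
+    if (ctx == nullptr) {
+        return 1;
+    }
+
+    std::vector<float> pcm; // 16 kHz mono f32 samples, filled by the caller (e.g. from a WAV file)
+
+    // run the full encoder/decoder pipeline with the default greedy sampling parameters
+    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
+    if (whisper_full(ctx, params, pcm.data(), (int) pcm.size()) != 0) {
+        whisper_free(ctx);
+        return 1;
+    }
+
+    // print the transcribed text segment by segment
+    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
+        printf("%s\n", whisper_full_get_segment_text(ctx, i));
+    }
+
+    whisper_free(ctx);
+    return 0;
+}
+```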
+
+The tensor operators are heavily optimized for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD
+intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for larger sizes, since
+the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
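+
+As an illustration of that size-based dispatch (a simplified sketch of the technique; the actual kernels
+and thresholds in [ggml.c](ggml.c) differ), a dot product might be routed like this:
+
+```
+#include <arm_neon.h>
+#include <Accelerate/Accelerate.h> // provides the CBLAS interface on macOS
+
+static float dot_f32(const float * x, const float * y, int n) {
+    if (n >= 1024) { // illustrative threshold, not the one used by ggml
+        // big sizes: hand off to Accelerate, which can use the AMX coprocessor
+        return cblas_sdot(n, x, 1, y, 1);
+    }
+
+    // small sizes: Arm Neon SIMD, 4 floats per vector
+    float32x4_t acc = vdupq_n_f32(0.0f);
+    int i = 0;
+    for (; i + 4 <= n; i += 4) {
+        acc = vfmaq_f32(acc, vld1q_f32(x + i), vld1q_f32(y + i)); // fused multiply-add
+    }
+    float sum = vaddvq_f32(acc); // horizontal add of the 4 lanes
+    for (; i < n; ++i) {
+        sum += x[i]*y[i]; // scalar tail
+    }
+    return sum;
+}
+```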
+
+## Limitations
+
+- Inference only
+- No GPU support
+- Very basic greedy sampling scheme - always picks the token with the highest probability (see the sketch after this list).
+ This should be similar to the [GreedyDecoder](https://github.com/openai/whisper/blob/main/whisper/decoding.py#L249-L274)
+ from the original Python implementation, so to make a fair comparison between the two implementations, make sure
+ to run the Python code with the following parameters:
+
+ ```
+ whisper --best_of None --beam_size None ...
+ ```
+
+ In the future, `whisper.cpp` will support more sampling strategies.
+
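+To make the sampling rule concrete, greedy decoding is just an argmax over the vocabulary logits at each
+decoding step (a sketch of the idea, not the code in `whisper.cpp`):
+
+```
+#include <algorithm>
+#include <vector>
+
+// greedy sampling: always take the token with the highest logit
+static int sample_greedy(const std::vector<float> & logits) {
+    return (int) std::distance(logits.begin(),
+                               std::max_element(logits.begin(), logits.end()));
+}
+```
+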
## Quick start
First, download one of the Whisper models converted in [ggml format](models). For example:
---
-## Implementation details
-
-- The core tensor operations are implemented in C ([ggml.h](ggml.h) / [ggml.c](ggml.c))
-- The transformer model and the high-level C-style API are implemented in C++ ([whisper.h](whisper.h) / [whisper.cpp](whisper.cpp))
-- Sample usage is demonstrated in [main.cpp](examples/main)
-- Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
-- Various other examples are available in the [examples](examples) folder
-
-The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD
-instrisics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since
-the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
-
-## Limitations
-
-- Inference only
-- No GPU support
-- Very basic greedy sampling scheme - always pick up the token with highest probability.
- This should be similar to the [GreedyDecoder](https://github.com/openai/whisper/blob/main/whisper/decoding.py#L249-L274)
- from the original python implementation, so in order to make a fair comparison between the 2 implementations, make sure
- to run the python code with the following parameters:
-
- ```
- whisper --best_of None --beam_size None ...
- ```
-
- In the future, `whisper.cpp` will support more sampling strategies.
-
## Benchmarks
In order to have an objective comparison of the performance of the inference across different system configurations,