- VSX intrinsics support for POWER architectures
- Mixed F16 / F32 precision
- [4-bit and 5-bit integer quantization support](https://github.com/ggerganov/whisper.cpp#quantization)
- Zero memory allocations at runtime
- Support for CPU-only inference
- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support)
- [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
- [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
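
The C-style API in [whisper.h](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h) is small enough to call directly. Below is a minimal sketch, not the repo's full example program: the model path is a placeholder, the input is one second of silence so the snippet stays self-contained, and exact signatures may differ slightly between releases, so check the `whisper.h` you build against.

```c
#include <stdio.h>
#include <stdlib.h>
#include "whisper.h"

int main(void) {
    // Load a ggml model (placeholder path).
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (!ctx) {
        return 1;
    }

    // whisper.cpp expects 16 kHz mono f32 PCM. We pass one second of silence
    // here so the sketch is runnable; real code would load audio instead.
    const int n_samples = WHISPER_SAMPLE_RATE;
    float * pcm = calloc(n_samples, sizeof(float));

    // Run the full encoder + decoder pipeline with default greedy sampling.
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, params, pcm, n_samples) == 0) {
        // Print each transcribed segment.
        for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }

    free(pcm);
    whisper_free(ctx);
    return 0;
}
```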
The first run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This blob gets cached for subsequent runs.
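
For example, with an OpenVINO-enabled build of the bundled `main` example (the model and sample paths below are the repo's stock placeholders), the caching behavior looks like this:

```bash
# First run: OpenVINO compiles the encoder IR into a device-specific blob (slow)
./main -m models/ggml-base.en.bin -f samples/jfk.wav

# Later runs load the cached blob, so encoder start-up is much faster
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```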
For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
## NVIDIA GPU support
With NVIDIA cards the processing of the models is done efficiently on the GPU via cuBLAS and custom CUDA kernels.
First, make sure you have installed `cuda`: https://developer.nvidia.com/cuda-downloads
Now build `whisper.cpp` with cuBLAS support:
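
A typical build looks like the following; the `WHISPER_CUBLAS` Makefile option is the one the project documents for cuBLAS builds, but check the Makefile of your checkout in case the flag has changed:

```bash
# Rebuild from scratch with the cuBLAS backend enabled
make clean
WHISPER_CUBLAS=1 make -j
```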