git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log
2 years agoTry to improve the token sampling strategy (#193)
Georgi Gerganov [Fri, 2 Dec 2022 19:51:50 +0000 (21:51 +0200)]
Try to improve the token sampling strategy (#193)

* whisper : try to improve the token sampling strategy

- Add the "max_initial_timestamp" token logic from OpenAI
- Disallow sampling timestamps that are in the past

* whisper : fix the max initial timestamp logic + fallback decoding
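
The two sampling rules listed above can be pictured as a mask applied to the logits before a token is picked. The following is only a minimal sketch under stated assumptions: it assumes every token id at or above `token_beg` is a timestamp token, and the names `mask_timestamp_logits`, `token_beg`, `last_ts_token` and `max_initial_steps` are hypothetical, not taken from whisper.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: suppress timestamp tokens that should not be sampled.
// - token_beg:         id of the first timestamp token (ids >= token_beg are timestamps)
// - last_ts_token:     id of the most recent timestamp token already emitted, or -1 if none
// - max_initial_steps: how many timestamp steps past token_beg are allowed at the very start
void mask_timestamp_logits(std::vector<float> & logits,
                           int token_beg,
                           int last_ts_token,
                           int max_initial_steps) {
    assert(token_beg >= 0 && (size_t) token_beg <= logits.size());
    assert(max_initial_steps >= 0);

    const float neg_inf = -std::numeric_limits<float>::infinity();
    const int   n       = (int) logits.size();

    if (last_ts_token < 0) {
        // No timestamp emitted yet: cap the first timestamp ("max initial timestamp").
        for (int id = token_beg + max_initial_steps + 1; id < n; ++id) {
            logits[id] = neg_inf;
        }
    } else {
        // Timestamps must not go backwards: mask every timestamp before the last one.
        for (int id = token_beg; id < last_ts_token && id < n; ++id) {
            logits[id] = neg_inf;
        }
    }
}
```

Setting a logit to negative infinity gives that token zero probability after softmax, so the sampler can stay unchanged.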

2 years agotests : adding transcription tests
Georgi Gerganov [Mon, 28 Nov 2022 20:44:01 +0000 (22:44 +0200)]
tests : adding transcription tests

2 years agoggml : remove inline specifier from fp16 <-> fp32 converters
Georgi Gerganov [Thu, 1 Dec 2022 20:15:12 +0000 (22:15 +0200)]
ggml : remove inline specifier from fp16 <-> fp32 converters

2 years agolivestream : handle ffmpeg errors gracefully and stabilize transcript
Georgi Gerganov [Thu, 1 Dec 2022 18:49:09 +0000 (20:49 +0200)]
livestream : handle ffmpeg errors gracefully and stabilize transcript

2 years agolivestream : minor changes
Georgi Gerganov [Thu, 1 Dec 2022 17:47:58 +0000 (19:47 +0200)]
livestream : minor changes

2 years agolivestream : fix losing words across audio chunk (#195)
semiformal-net [Thu, 1 Dec 2022 17:18:22 +0000 (12:18 -0500)]
livestream : fix losing words across audio chunk (#195)

* improve livestream script

* Update examples/livestream.sh

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Paul Edwards <redacted>
2 years agoFix Darwin flags - was incorrectly always using the Linux else clause
Tienshiao Ma [Tue, 29 Nov 2022 07:29:34 +0000 (23:29 -0800)]
Fix Darwin flags - was incorrectly always using the Linux else clause

2 years agowhisper : add mechanism for aborting the whisper_full() computation
Georgi Gerganov [Sun, 27 Nov 2022 18:28:36 +0000 (20:28 +0200)]
whisper : add mechanism for aborting the whisper_full() computation
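
An abort mechanism for a long-running computation is commonly built as a cooperative check: the caller supplies a callback, and the computation consults it before each unit of work. The sketch below illustrates that general pattern only. `process_chunks` and `should_continue_fn` are hypothetical names, and this is not claimed to be how `whisper_full()` implements it.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch of a cooperative abort: the caller supplies a callback
// that is consulted before each unit of work; returning false stops the run.
using should_continue_fn = std::function<bool()>;

// Processes up to n_chunks units of work. Returns the number of chunks that
// were actually processed before finishing or being aborted.
int process_chunks(int n_chunks, const should_continue_fn & should_continue) {
    assert(n_chunks >= 0);
    int done = 0;
    for (int i = 0; i < n_chunks; ++i) {
        if (should_continue && !should_continue()) {
            break; // the caller asked to abort
        }
        // ... the expensive work for chunk i would go here ...
        ++done;
    }
    return done;
}
```

Passing an empty callback means "never abort", so existing callers would not need to change.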

2 years agoUpdate README.md
Georgi Gerganov [Sun, 27 Nov 2022 09:30:32 +0000 (11:30 +0200)]
Update README.md

2 years agowhisper.objc : fix context + broken readme links
Georgi Gerganov [Sun, 27 Nov 2022 08:48:59 +0000 (10:48 +0200)]
whisper.objc : fix context + broken readme links

2 years agowhisper.objc : add real-time processing (#97)
Georgi Gerganov [Sat, 26 Nov 2022 15:28:28 +0000 (17:28 +0200)]
whisper.objc : add real-time processing (#97)

Similar to the "stream" app

2 years agowhisper.objc : fix build warnings
Georgi Gerganov [Sat, 26 Nov 2022 14:27:04 +0000 (16:27 +0200)]
whisper.objc : fix build warnings

2 years agominor : remove "examples/" prefix from the README
Georgi Gerganov [Sat, 26 Nov 2022 11:07:54 +0000 (13:07 +0200)]
minor : remove "examples/" prefix from the README

2 years agoyt-wsp.sh : script to easily transcribe VODs
Georgi Gerganov [Sat, 26 Nov 2022 10:53:23 +0000 (12:53 +0200)]
yt-wsp.sh : script to easily transcribe VODs

Thanks to @DaniruKun
ref: https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818

Usage:

  cd whisper.cpp
  make

  ./examples/yt-wsp.sh <video-url>

2 years agoUpdate README.md
Georgi Gerganov [Sat, 26 Nov 2022 09:56:55 +0000 (11:56 +0200)]
Update README.md

2 years agocommand.wasm : add voice assistant example for the Web (#171)
Georgi Gerganov [Sat, 26 Nov 2022 09:40:06 +0000 (11:40 +0200)]
command.wasm : add voice assistant example for the Web (#171)

Same as the command-line tool "command", but runs in the browser

Also, added helper script "extra/deploy-wasm.sh" and fixed some timing
constants for the WASM examples.

2 years agominor : add comment for using "generate_karaoke.sh"
Georgi Gerganov [Sat, 26 Nov 2022 08:22:42 +0000 (10:22 +0200)]
minor : add comment for using "generate_karaoke.sh"

2 years agolivestream.sh : simple tool to transcribe audio livestreams (#185)
Georgi Gerganov [Sat, 26 Nov 2022 08:05:37 +0000 (10:05 +0200)]
livestream.sh : simple tool to transcribe audio livestreams (#185)

2 years agostream.wasm : add web-based real-time transcription (#112)
Georgi Gerganov [Fri, 25 Nov 2022 21:57:46 +0000 (23:57 +0200)]
stream.wasm : add web-based real-time transcription (#112)

2 years agowhisper.wasm : do not block page while processing (close #86)
Georgi Gerganov [Fri, 25 Nov 2022 21:07:42 +0000 (23:07 +0200)]
whisper.wasm : do not block page while processing (close #86)

2 years agomain : add stereo-channel-based diarization (#64)
Georgi Gerganov [Fri, 25 Nov 2022 20:08:58 +0000 (22:08 +0200)]
main : add stereo-channel-based diarization (#64)

Not tested - I don't have stereo dialog audio
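
One simple way to do channel-based diarization is to compare how much signal each stereo channel carries during a segment and attribute the segment to the louder channel. The sketch below assumes each speaker was recorded on a separate channel. The function name `guess_speaker` and the 1.1 margin are illustrative assumptions, not values taken from this commit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: pick a speaker label for the samples in [i0, i1) by
// comparing the summed absolute amplitude of the left and right channels.
// The 1.1 margin is an illustrative threshold, not a value from the source.
std::string guess_speaker(const std::vector<float> & left,
                          const std::vector<float> & right,
                          size_t i0, size_t i1) {
    assert(left.size() == right.size());
    assert(i0 <= i1 && i1 <= left.size());

    double e_left  = 0.0;
    double e_right = 0.0;
    for (size_t i = i0; i < i1; ++i) {
        e_left  += std::fabs(left[i]);
        e_right += std::fabs(right[i]);
    }

    if (e_left  > 1.1 * e_right) return "0"; // left channel dominates
    if (e_right > 1.1 * e_left)  return "1"; // right channel dominates
    return "?";                              // too close to call
}
```

The sample range `[i0, i1)` would come from the segment's start and end timestamps converted to sample indices.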

2 years agocommand : add demonstration video
Georgi Gerganov [Fri, 25 Nov 2022 18:23:58 +0000 (20:23 +0200)]
command : add demonstration video

2 years agocommand : fix build + fix README + add bold printing
Georgi Gerganov [Fri, 25 Nov 2022 17:53:50 +0000 (19:53 +0200)]
command : fix build + fix README + add bold printing

2 years agoexamples : add "command" tool (#171)
Georgi Gerganov [Fri, 25 Nov 2022 17:06:56 +0000 (19:06 +0200)]
examples : add "command" tool (#171)

2 years agorefactoring : more readable code
Georgi Gerganov [Fri, 25 Nov 2022 17:08:51 +0000 (19:08 +0200)]
refactoring : more readable code

2 years agocorrect model name display on running samples
vicalloy [Fri, 25 Nov 2022 03:24:08 +0000 (11:24 +0800)]
correct model name display on running samples

2 years agowasm : refactor wasm example + reuse fetch mechanism
Georgi Gerganov [Thu, 24 Nov 2022 21:13:26 +0000 (23:13 +0200)]
wasm : refactor wasm example + reuse fetch mechanism

2 years agotalk.wasm : update video link + some minor fixes
Georgi Gerganov [Thu, 24 Nov 2022 18:15:07 +0000 (20:15 +0200)]
talk.wasm : update video link + some minor fixes

2 years agoUpdate README.md
Georgi Gerganov [Thu, 24 Nov 2022 18:09:45 +0000 (20:09 +0200)]
Update README.md

Use a less cringy video to demo talk.wasm lol

2 years agoUpdate README.md
Georgi Gerganov [Thu, 24 Nov 2022 18:06:51 +0000 (20:06 +0200)]
Update README.md

2 years agotalk.wasm : move to https://whisper.ggerganov.com/talk
Georgi Gerganov [Thu, 24 Nov 2022 16:24:06 +0000 (18:24 +0200)]
talk.wasm : move to https://whisper.ggerganov.com/talk

This way, we can share the same models across different WASM examples
and not have to download them for each page

2 years agomodels : add instructions for using HF fine-tuned models
Georgi Gerganov [Thu, 24 Nov 2022 15:54:41 +0000 (17:54 +0200)]
models : add instructions for using HF fine-tuned models

2 years agowhisper : improve printfs
Georgi Gerganov [Thu, 24 Nov 2022 15:54:16 +0000 (17:54 +0200)]
whisper : improve printfs

2 years agomain : fix dangling pointer when using stdin for input (#65)
Georgi Gerganov [Thu, 24 Nov 2022 15:53:51 +0000 (17:53 +0200)]
main : fix dangling pointer when using stdin for input (#65)

2 years agomain, stream : remove --verbose flag (#178)
Georgi Gerganov [Thu, 24 Nov 2022 15:52:04 +0000 (17:52 +0200)]
main, stream : remove --verbose flag (#178)

2 years agotalk.wasm : add audio pre-processing + bump memory
Georgi Gerganov [Wed, 23 Nov 2022 22:34:00 +0000 (00:34 +0200)]
talk.wasm : add audio pre-processing + bump memory

2 years agotalk.wasm : refactoring + update README.md
Georgi Gerganov [Wed, 23 Nov 2022 22:08:57 +0000 (00:08 +0200)]
talk.wasm : refactoring + update README.md

2 years agomodels : add usage comments to the HF convert script (#157)
Georgi Gerganov [Wed, 23 Nov 2022 21:22:40 +0000 (23:22 +0200)]
models : add usage comments to the HF convert script (#157)

2 years agomodels : fix HF fine-tuned model conversion script (#157)
Georgi Gerganov [Wed, 23 Nov 2022 21:14:11 +0000 (23:14 +0200)]
models : fix HF fine-tuned model conversion script (#157)

It works now

2 years agoggml : fix the fix
Georgi Gerganov [Wed, 23 Nov 2022 20:40:06 +0000 (22:40 +0200)]
ggml : fix the fix

2 years agoggml : fix cross-compile Linux -> Windows with mingw (#168)
Georgi Gerganov [Wed, 23 Nov 2022 20:27:49 +0000 (22:27 +0200)]
ggml : fix cross-compile Linux -> Windows with mingw (#168)

2 years agoRevert "update README.md"
Georgi Gerganov [Wed, 23 Nov 2022 20:16:50 +0000 (22:16 +0200)]
Revert "update README.md"

This reverts commit 6a84147113669bed68bbc4d31e3c14f914092bf8.

2 years agoupdate README.md
katsu560 [Wed, 23 Nov 2022 13:59:54 +0000 (22:59 +0900)]
update README.md

2 years agoggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16
katsu560 [Wed, 23 Nov 2022 13:54:21 +0000 (22:54 +0900)]
ggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16

2 years agoadd gprof option
katsu560 [Wed, 23 Nov 2022 12:31:05 +0000 (21:31 +0900)]
add gprof option

2 years agofix AVX,AVX2,FMA,F16C detection on Linux and add flags for OpenBLAS
katsu560 [Wed, 23 Nov 2022 11:23:35 +0000 (20:23 +0900)]
fix AVX,AVX2,FMA,F16C detection on Linux and add flags for OpenBLAS

2 years agoadd AVX support
katsu560 [Wed, 23 Nov 2022 11:23:24 +0000 (20:23 +0900)]
add AVX support

2 years agoBuild with OpenBLAS and SDL2 on windows
Tamotsu Takahashi [Wed, 23 Nov 2022 06:17:13 +0000 (15:17 +0900)]
Build with OpenBLAS and SDL2 on windows

2 years agomodels : minor changes to the HF convert script (#157)
Georgi Gerganov [Wed, 23 Nov 2022 20:07:20 +0000 (22:07 +0200)]
models : minor changes to the HF convert script (#157)

2 years agomodels : add "convert-h5-to-ggml.py" script (#157)
Georgi Gerganov [Wed, 23 Nov 2022 15:17:31 +0000 (17:17 +0200)]
models : add "convert-h5-to-ggml.py" script (#157)

Converts transformers models to ggml.
Although the conversion is successful, it does not work for some reason.
Not sure why

2 years agominor : updates few prints + fix buttons in whisper.wasm
Georgi Gerganov [Wed, 23 Nov 2022 15:17:01 +0000 (17:17 +0200)]
minor : updates few prints + fix buttons in whisper.wasm

2 years agoUpdate README.md
Georgi Gerganov [Wed, 23 Nov 2022 07:53:55 +0000 (09:53 +0200)]
Update README.md

2 years agoUpdate README.md
Georgi Gerganov [Wed, 23 Nov 2022 07:52:36 +0000 (09:52 +0200)]
Update README.md

2 years agoFind libopenblas.dll.a on windows
Tamotsu Takahashi [Wed, 23 Nov 2022 00:46:56 +0000 (09:46 +0900)]
Find libopenblas.dll.a on windows

"lib" is needed for windows.

With this change, you can build whisper.cpp with OpenBLAS's prebuilt DLL.
1. extract a zip from https://github.com/xianyi/OpenBLAS/releases
2. copy the headers in (openblas)/include to the root directory of whisper.cpp
3. invoke cmake with -DCMAKE_LIBRARY_PATH=(openblas)\lib -DWHISPER_SUPPORT_OPENBLAS=ON
4. copy (openblas)/bin/libopenblas.dll to the same directory as whisper.dll after msbuild

https://github.com/ggerganov/whisper.cpp/issues/89#issuecomment-1324391258

2 years agounicode : fix character replacement (thanks to @tamo)
Georgi Gerganov [Wed, 23 Nov 2022 06:24:29 +0000 (08:24 +0200)]
unicode : fix character replacement (thanks to @tamo)

2 years agoclose #109 : add fetching of the model over HTTP (whisper.wasm)
Georgi Gerganov [Tue, 22 Nov 2022 20:48:56 +0000 (22:48 +0200)]
close #109 : add fetching of the model over HTTP (whisper.wasm)

2 years agotalk.wasm : final touches
Georgi Gerganov [Tue, 22 Nov 2022 20:22:17 +0000 (22:22 +0200)]
talk.wasm : final touches

2 years agotalk.wasm : polishing + adding many AI personalities
Georgi Gerganov [Tue, 22 Nov 2022 18:10:20 +0000 (20:10 +0200)]
talk.wasm : polishing + adding many AI personalities

2 years agostream : "-kc" now enables context keeping from previous segment (#90)
Georgi Gerganov [Tue, 22 Nov 2022 16:20:05 +0000 (18:20 +0200)]
stream : "-kc" now enables context keeping from previous segment (#90)

By default, the context keeping is disabled
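
Context keeping can be thought of as feeding the tail of the previous segment's tokens back in as the prompt for the next one. The sketch below shows that idea only. `next_prompt`, `keep_context` and `max_prompt_tokens` are hypothetical names, and the size limit is an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: build the prompt for the next segment from the tokens
// decoded in the previous one. When keep_context is false (the default
// described above), the prompt is empty.
std::vector<int> next_prompt(const std::vector<int> & prev_tokens,
                             bool keep_context,
                             size_t max_prompt_tokens) {
    if (!keep_context) {
        return {};
    }
    // Keep only the most recent tokens so the prompt stays within budget.
    const size_t n = prev_tokens.size() < max_prompt_tokens
                         ? prev_tokens.size()
                         : max_prompt_tokens;
    assert(n <= prev_tokens.size());
    return std::vector<int>(prev_tokens.end() - static_cast<std::ptrdiff_t>(n),
                            prev_tokens.end());
}
```

Trimming from the front keeps the words closest to the new audio, which are the ones most likely to matter for continuity.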

2 years agoPrompt previous tokens for streaming (#163)
M. Eren Akbiyik [Tue, 22 Nov 2022 16:10:35 +0000 (17:10 +0100)]
Prompt previous tokens for streaming (#163)

* feat: prompt previous tokens for streaming

I used a vector pointer instead of vector itself because it gave weird errors, and why not

* convert vector to use with C api

* feat: remove old refs, check for prompt size

* feat: use better way of getting the pointer

2 years agotalk.wasm : update README.md
Georgi Gerganov [Mon, 21 Nov 2022 20:42:29 +0000 (22:42 +0200)]
talk.wasm : update README.md

2 years agotalk.wasm : GPT-2 meets Whisper in WebAssembly (#155)
Georgi Gerganov [Mon, 21 Nov 2022 20:20:42 +0000 (22:20 +0200)]
talk.wasm : GPT-2 meets Whisper in WebAssembly (#155)

* talk : initial real-time transcription in the browser

* talk : polishing the UI

* talk : ready for beta testing

* talk.wasm : rename example

2 years agoUpdate README.md
Georgi Gerganov [Mon, 21 Nov 2022 16:52:20 +0000 (18:52 +0200)]
Update README.md

2 years agoggml : fix Windows build
Georgi Gerganov [Sun, 20 Nov 2022 20:43:32 +0000 (22:43 +0200)]
ggml : fix Windows build

2 years agoci : add Windows build
Georgi Gerganov [Sun, 20 Nov 2022 20:39:39 +0000 (22:39 +0200)]
ci : add Windows build

2 years agostream : add "max_tokens" cli arg
Georgi Gerganov [Sun, 20 Nov 2022 19:22:02 +0000 (21:22 +0200)]
stream : add "max_tokens" cli arg

Controls the max tokens per segment for the stream example

2 years agostream : add "audio_ctx" parameter
Georgi Gerganov [Sun, 20 Nov 2022 19:12:01 +0000 (21:12 +0200)]
stream : add "audio_ctx" parameter

Used to override the audio context size of the Encoder.
For example, setting "audio_ctx = 512" will make it run about 3 times
faster, processing about 10s of audio, instead of 30s.

The transcription quality drops, but this can be used for real-time
streaming purposes where performance is important.
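
The "about 10s instead of 30s" figure follows from simple proportionality. The sketch below assumes the full encoder context is 1500 positions covering 30 seconds (20 ms per position). That constant is not stated in the commit message and is labeled as an assumption in the code. `audio_ctx_to_seconds` is a hypothetical helper.

```cpp
#include <cassert>

// Assumption (not stated in the commit message): the full encoder context has
// 1500 positions and covers 30 seconds of audio, i.e. 20 ms per position.
constexpr int   kFullAudioCtx = 1500;
constexpr float kFullSeconds  = 30.0f;

// Approximate seconds of audio covered by a reduced audio context.
float audio_ctx_to_seconds(int audio_ctx) {
    assert(audio_ctx > 0 && audio_ctx <= kFullAudioCtx);
    return kFullSeconds * (float) audio_ctx / (float) kFullAudioCtx;
}
```

Under that assumption, `audio_ctx_to_seconds(512)` comes out at roughly 10.24 seconds, which is consistent with the "about 10s" quoted above.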

2 years agostream : add "max_tokens" parameter
Georgi Gerganov [Sun, 20 Nov 2022 18:52:24 +0000 (20:52 +0200)]
stream : add "max_tokens" parameter

Used to limit the number of tokens in a segment.
Useful for combating word repetition when using a partial encoder context

2 years agostream : add "single_segment" option
Georgi Gerganov [Sun, 20 Nov 2022 18:45:10 +0000 (20:45 +0200)]
stream : add "single_segment" option

Force the entire audio chunk to be transcribed into a single segment

2 years agostream : partial encoder experiments
Georgi Gerganov [Fri, 11 Nov 2022 20:33:10 +0000 (22:33 +0200)]
stream : partial encoder experiments

2 years agofix: free ggml_context (close #149) (#150)
greeshmay [Thu, 17 Nov 2022 20:12:51 +0000 (12:12 -0800)]
fix: free ggml_context (close #149) (#150)

* fix: free ggml_context

* ggml : free the model's contexts in whisper_free()

Co-authored-by: Georgi Gerganov <redacted>
2 years agomodels : simplify the conversion script
Georgi Gerganov [Wed, 16 Nov 2022 17:21:43 +0000 (19:21 +0200)]
models : simplify the conversion script

"transformers" dependency is not actually needed

2 years agoUpdate download-ggml-model.sh
Dody Suria Wijaya [Wed, 16 Nov 2022 16:53:01 +0000 (23:53 +0700)]
Update download-ggml-model.sh

follow curl redirect to new hosting site

2 years agomodels : change default hosting to Hugging Face
Georgi Gerganov [Tue, 15 Nov 2022 17:47:06 +0000 (19:47 +0200)]
models : change default hosting to Hugging Face

My Linode is running out of monthly bandwidth due to the big interest in
the project

2 years agowhisper : add option to speed up the audio tempo by x2
Georgi Gerganov [Sat, 12 Nov 2022 16:03:49 +0000 (18:03 +0200)]
whisper : add option to speed up the audio tempo by x2

Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech
it still seems to be very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.

2 years agomake : add libwhisper.so target (#144)
Georgi Gerganov [Sun, 13 Nov 2022 07:08:33 +0000 (09:08 +0200)]
make : add libwhisper.so target (#144)

2 years agoAdd WHISPER_NO_AVX and WHISPER_NO_AVX2 to CMakeLists (#136)
Chidi Williams [Fri, 11 Nov 2022 16:10:01 +0000 (16:10 +0000)]
Add WHISPER_NO_AVX and WHISPER_NO_AVX2 to CMakeLists (#136)

* Check for AVX and AVX2 on Darwin

* Add AVX options to CMakeLists

2 years agominor : remove one more redundant line
Georgi Gerganov [Fri, 11 Nov 2022 16:02:58 +0000 (18:02 +0200)]
minor : remove one more redundant line

2 years agominor : fix double float32 conversion in python script
Georgi Gerganov [Fri, 11 Nov 2022 15:58:51 +0000 (17:58 +0200)]
minor : fix double float32 conversion in python script

2 years agoref #40 : start working on the documentation
Georgi Gerganov [Wed, 9 Nov 2022 19:41:21 +0000 (21:41 +0200)]
ref #40 : start working on the documentation

2 years agoAdds support for stdin wav input
Alan [Wed, 9 Nov 2022 18:24:06 +0000 (15:24 -0300)]
Adds support for stdin wav input

2 years agojs : update whisper.js to latest
Georgi Gerganov [Wed, 9 Nov 2022 17:32:58 +0000 (19:32 +0200)]
js : update whisper.js to latest

2 years agoCheck for AVX and AVX2 on Darwin
Chidi Williams [Wed, 9 Nov 2022 00:28:36 +0000 (00:28 +0000)]
Check for AVX and AVX2 on Darwin

2 years agoFix the Windows pthread_create shim
boolemancer [Tue, 8 Nov 2022 11:04:23 +0000 (03:04 -0800)]
Fix the Windows pthread_create shim

The current implementation doesn't actually set the out parameter,
and it returns 0 on failure instead of on success.

2 years agosync : submodule whisper.spm
Georgi Gerganov [Mon, 7 Nov 2022 19:48:13 +0000 (21:48 +0200)]
sync : submodule whisper.spm

2 years agocmake : add submodule whisper.spm
Georgi Gerganov [Mon, 7 Nov 2022 18:50:24 +0000 (20:50 +0200)]
cmake : add submodule whisper.spm

2 years agoref #22 : add "duration" option
Georgi Gerganov [Mon, 7 Nov 2022 18:14:52 +0000 (20:14 +0200)]
ref #22 : add "duration" option

Can be used to partially process a recording

2 years agoUpdate README.md
Georgi Gerganov [Sun, 6 Nov 2022 19:04:21 +0000 (21:04 +0200)]
Update README.md

2 years agoexamples : add simple script for generating Karaoke video
Georgi Gerganov [Sun, 6 Nov 2022 07:22:50 +0000 (09:22 +0200)]
examples : add simple script for generating Karaoke video

2 years agoUpdate README.md
Georgi Gerganov [Sat, 5 Nov 2022 06:44:41 +0000 (08:44 +0200)]
Update README.md

2 years agoUpdate README.md
Georgi Gerganov [Fri, 4 Nov 2022 20:26:08 +0000 (22:26 +0200)]
Update README.md

2 years agomain : fix generated bash script
Georgi Gerganov [Fri, 4 Nov 2022 16:30:38 +0000 (18:30 +0200)]
main : fix generated bash script

2 years agoggml : multi-thread the ggml_add operator
Georgi Gerganov [Thu, 3 Nov 2022 18:53:44 +0000 (20:53 +0200)]
ggml : multi-thread the ggml_add operator

2 years agocmake : fix passing GGML_PERF compile option
Georgi Gerganov [Thu, 3 Nov 2022 18:18:57 +0000 (20:18 +0200)]
cmake : fix passing GGML_PERF compile option

2 years agoUpdate README.md
Georgi Gerganov [Wed, 2 Nov 2022 20:03:27 +0000 (22:03 +0200)]
Update README.md

2 years agowhisper : token-level timestamp refactoring (#49, #120)
Georgi Gerganov [Wed, 2 Nov 2022 19:18:20 +0000 (21:18 +0200)]
whisper : token-level timestamp refactoring (#49, #120)

This turned out pretty good overall. The algorithm has been moved from
main.cpp to whisper.cpp and can be reused for all subtitles types. This
means that now you can specify the maximum length of the generated
lines. Simply provide the "-ml" argument specifying the max length in
number of characters
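
A maximum line length in characters can be enforced with a greedy packing pass over the words. The sketch below works on plain words and ignores timestamps, so it is a simplification of whatever the real implementation does with token-level timing. `wrap_words` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: greedily pack words into lines of at most max_len
// characters. A single word longer than max_len is kept on its own line.
std::vector<std::string> wrap_words(const std::vector<std::string> & words,
                                    size_t max_len) {
    assert(max_len > 0);
    std::vector<std::string> lines;
    std::string cur;
    for (const std::string & w : words) {
        const size_t needed = cur.empty() ? w.size() : cur.size() + 1 + w.size();
        if (!cur.empty() && needed > max_len) {
            lines.push_back(cur); // current line is full, start a new one
            cur = w;
        } else {
            if (!cur.empty()) cur += ' ';
            cur += w;
        }
    }
    if (!cur.empty()) lines.push_back(cur);
    return lines;
}
```

In a subtitle setting, each resulting line would also need start and end times derived from the first and last token it contains.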

2 years agoUpdate README.md
Georgi Gerganov [Wed, 2 Nov 2022 16:33:29 +0000 (18:33 +0200)]
Update README.md

2 years agoextra : compute SHA of all models files
Georgi Gerganov [Wed, 2 Nov 2022 16:31:55 +0000 (18:31 +0200)]
extra : compute SHA of all models files

2 years agowhisper : fix extra memory usage after recent processor changes
Georgi Gerganov [Wed, 2 Nov 2022 16:31:18 +0000 (18:31 +0200)]
whisper : fix extra memory usage after recent processor changes

Had increased the memory buffer to the size of the model and forgot to
bring it down.

2 years agoAllow building with Accelerate for x86_64 Macs (#123)
Syed Jafri [Wed, 2 Nov 2022 16:00:19 +0000 (10:00 -0600)]
Allow building with Accelerate for x86_64 Macs (#123)

* Cross compile windows

* set env properly

* rm log

* fix review

* Add back space

* Don't force architecture

* Allow building x86_64 with accelerate