Ewan Crawford [Thu, 22 May 2025 08:24:09 +0000 (09:24 +0100)]
SYCL: Avoid using with SYCL-Graph for unsupported nodes (llama/13587)
Currently on a CUDA backend to SYCL when running
`GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` there
are two operations that throw an exception from the blocking
waits during queue recording.
* `-o CONCAT` : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
* `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074
We've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458)
method for checking if a graph can be used, even if enabled. I've taken a
similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking
if a graph can be used for the operations even if a user has asked for it to be
enabled.
Henry Linjamäki [Wed, 21 May 2025 20:21:17 +0000 (23:21 +0300)]
opencl: fix couple crashes (llama/12795)
* opencl: fix couple crashes
* fix kernel launches failed on devices which do not support
non-uniform work-groups. When non-uniform work-groups are not
supported, set `local_work_size` to NULL (= let driver choose the
work-group sizes). This patch does not cover everything - just the
cases tested by test-backend-ops.
* fix sub-buffer creation failed due to `cl_buffer_region::origin` not
being aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.
* OpenCL: query non-uniform WG sizes only on OpenCL 3.0+
Nicolò Scipione [Tue, 20 May 2025 00:54:43 +0000 (02:54 +0200)]
sycl : Overcoming workaround for mmap() allocation on Windows (llama/13482)
* Remove mmap workaround on windows
After some testing I found that mmap is supported on windows and for
many GPUs on Linux. Therefore I remove the workaround for windows since
it is not necessary.
* Update llama-bench README
SYCL backend introduced a workaround that allows execution of
llama-bench also without specifying `--mmp 0` flag
Daniel Bevenius [Tue, 27 May 2025 08:53:50 +0000 (10:53 +0200)]
docs : convert README_sycl.md to utf8 format [no ci] (#3191)
This commit updates the README_sycl.md file to use UTF-8 encoding.
The motivation for this is that while this file displays correctly in
github it will fail to render with tools that expect UTF-8 encoding.
For example this is the case when using `grip` to view the file locally.
Daniel Bevenius [Tue, 27 May 2025 03:51:47 +0000 (05:51 +0200)]
node : enable no_prints to suppress all output (#3189)
This commit enable the node addon to suppress all output, even the
result of the transcription if the no_prints parameter is set to true.
The motivation for this is that for the node addon there is a
fullfilment handler/success callback to process the transcription
result. And it might be useful to be able to disable the printing of
the transcription result to the console, so that the user can handle
the result in their own way.
Daniel Bevenius [Fri, 23 May 2025 03:48:08 +0000 (05:48 +0200)]
ci : use dynamic libopenblas.dll for window-blas (#3177)
* ci : use dynamic libopenblas.dll for window-blas
This commit updates the windows-blas job to use the dynamic (can load
different kernels depending of the CPU arch) libopenblas.dll instead of
the "static" openblas.dll that get installed by vcpgk.
The motivation for this change is that there have been reports of
performance drops in later version specifically related to blas. Please
see the links below for more details.
Daniel Bevenius [Mon, 19 May 2025 15:59:43 +0000 (17:59 +0200)]
ruby : add GGML_SYCL_DNN option to ruby bindings (#3172)
This commit adds the `GGML_SYCL_DNN` option to the Ruby bindings for
the GGML library. This option as added to ggml in
Commit (5e7e07758a5f3172380500e173ca71f679bbef1e "sycl: use oneDNN for
matrices multiplication")
The motivation for this change to enable the CI build to pass.
Jeff Bolz [Wed, 14 May 2025 09:55:26 +0000 (18:55 +0900)]
vulkan: KHR_coopmat flash attention (llama/13506)
This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may be from other optimizations like staging
through shared memory, or splitting by rows.
Daniel Bevenius [Fri, 16 May 2025 06:50:53 +0000 (08:50 +0200)]
vad : return early if no vad segments are detected (#3158)
This commit adds a check to `whisper_full_with_state` and if no VAD
segments are detected, the function will return early.
The motivation for this is that if no VAD segments are detected, the
function will not have any samples to process which can happen if an
audio sample does not contain any speech. I did not test this previously
and only discovered this when updating the stream example.
Daniel Bevenius [Fri, 16 May 2025 05:53:26 +0000 (07:53 +0200)]
vad : store VAD context in whisper_state (#3156)
* vad : store VAD context in whisper_state
This commit stores the VAD context in the whisper_state structure,
allowing for better management and reuse of the VAD context across
multiple calls to the whisper_vad function.
The motivation for this change is that when updating the stream example
I noticed that the VAD context was being re-initialized every time the
whisper_vad function was called. This involved loading the VAD model
which is expensive and unnecessary if the context can be reused.
Storing this in the whisper_state seems follow the pattern simliar to
how whisper_coreml_context and whisper_openvion_context are stored.
Daniel Bevenius [Thu, 15 May 2025 12:28:10 +0000 (14:28 +0200)]
whisper : add build_*/ to .gitignore [no ci] (#3157)
This commit add `build_*/` to `.gitignore` to ignore all build
directories that start with `build_`.
The motivation for this is that the Go bindings creates a directory
named build_go, which is not ignored by the current .gitignore. I was
not sure if changing this to build-go could effect exising users so I
opted to update .gitignore instead.
Daniel Bevenius [Wed, 14 May 2025 17:21:48 +0000 (19:21 +0200)]
examples : add --print-confidence option to cli (#3150)
* examples : add --print-confidence option to cli
This commit adds a new command-line option `--print-confidence` to the
whisper-cli. When enabled, this option prints the confidence level of each
token in the transcribed text using ANSI formatting codes.
The confidence levels are represented using different styles:
```console
main: confidence: highlighted (low confidence), underlined (medium), dim (high confidence)
```
Daniel Bevenius [Mon, 12 May 2025 14:10:11 +0000 (16:10 +0200)]
vad : add initial Voice Activity Detection (VAD) support (#3065)
* vad : add initial Voice Activity Detection (VAD) support
This commit add support for Voice Activity Detection (VAD). When enabled
this feature will process the audio input and detect speech segments.
This information is then used to reduce the number of samples that need
to be processed by whisper_full.
Daniel Bevenius [Mon, 12 May 2025 08:43:04 +0000 (10:43 +0200)]
cli : print color scheme info for --print-colors (#3141)
This commit adds a description of the color scheme used in the CLI
when the --print-colors option is enabled.
The motivation for this is that it is not immediately clear what the
color scheme is when using the CLI with the --print-colors option.
Example output:
```console
$ ./build/bin/whisper-cli -f samples/jfk.wav --print-colors
...
main: color scheme: red (low confidence), yellow (medium), green (high confidence)
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
```
The description will not be dispayed if the `--no-prints` options is
set.
Daniel Bevenius [Mon, 12 May 2025 07:02:06 +0000 (09:02 +0200)]
examples : update link to Paul Tol's color scheme [no ci] (#3140)
This commit updates the link to Paul Tol's color scheme in the
`examples/common.h` file. The previous link was outdated and
pointed to a non-existent page.
Daniel Bevenius [Sat, 10 May 2025 06:18:08 +0000 (08:18 +0200)]
ruby : omit test_build_options locally (#3132)
This commit omits the test for `test_build_options` when run locally as
it currently fails on Linux and MacOS platforms.
`
The motivation for this change is that currently when running the tests
locally on a non-macOS platform the test fails with the following error:
```console
.F
========================================================================
Failure: test_build_options(TestPackage):
<["ACCELERATE_FRAMEWORK",
"CMAKE_OSX_ARCHITECTURES",
"CMAKE_OSX_SYSROOT",
"FOUNDATION_LIBRARY",
"METALKIT_FRAMEWORK",
"METAL_FRAMEWORK"]> was expected to be empty.
/home/danbev/work/ai/whisper.cpp/bindings/ruby/tests/test_package.rb:43:in `test_build_options'
40: options = BuildOptions::Options.new
41: assert_empty options.missing_options
42: unless ENV["CI"]
=> 43: assert_empty options.extra_options
44: end
45: end
46: end
========================================================================
```
Enes Grahovac [Sat, 10 May 2025 04:44:13 +0000 (00:44 -0400)]
examples : add HEAPU8 to all of the exported runtime methods (#3134)
This commit adds HEAPU8 to the list of exported methods.
The motivation for this commit is that currently this is causing an error on Window systems where HEAPU8 in undefined, which results in the following error message in the web console:
main.js:1 Uncaught TypeError:
Cannot read properties of undefined (reading 'buffer') at __emval_get_property
(main.js:1:1363125) at 003a453a:0xc4a47 at 003a453a:0xc51cd at
Object.full_default (eval at craftInvokerFunction (main.js:1:1347011),
<anonymous>:9:10) at whisper.cpp/:647:42
danbev originally fixed this for whisper.wasm, stream.wasm, and command.stream, but the issue still exists on the other examples which I patch in this code.
This commit updates the documentation for the WASM examples to include a
note about the generation of the `worker.js` file. As of Emscripten
3.1.58 (April 2024), separate worker.js files are no longer generated
and the worker is embedded in the main JS file.
The motivation for this change is to inform users about the new behavior
of Emscripten and why the `worker.js` file may not be present.