ggml-cpu : "align corners" for bilinear upscale/downscale (ggml/1285)
* add "align corners" mode for bilinear upscale, and allow downscaling
* add ggml_interpolate, deprecate ggml_upscale_ext, pass in align-corners as bit-flag
* test-backend-ops: replace ggml_upscale_ext with ggml_interpolate, add test cases for downscale and align-corners
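The difference between the two coordinate conventions can be sketched in one dimension. This is only an illustration under stated assumptions, not the ggml implementation: `src_coord` and `resize_linear` are hypothetical helpers, and the non-aligned branch assumes the common half-pixel-center convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Map an output index to a fractional input coordinate.
// Hypothetical helper for illustration, not part of the ggml API.
static float src_coord(int dst, int n_in, int n_out, bool align_corners) {
    if (align_corners) {
        // First and last samples of input and output coincide.
        return n_out > 1 ? dst * float(n_in - 1) / float(n_out - 1) : 0.0f;
    }
    // Assumed half-pixel convention: samples sit at the center of their cells.
    return (dst + 0.5f) * float(n_in) / float(n_out) - 0.5f;
}

// Linear resize of a 1D signal; the same code handles up- and downscaling.
static std::vector<float> resize_linear(const std::vector<float> & in, int n_out, bool align_corners) {
    assert(!in.empty() && n_out > 0);
    const int n_in = (int) in.size();
    std::vector<float> out(n_out);
    for (int i = 0; i < n_out; ++i) {
        float x = src_coord(i, n_in, n_out, align_corners);
        x = std::min(std::max(x, 0.0f), float(n_in - 1)); // clamp to the valid range
        const int   x0 = (int) std::floor(x);
        const int   x1 = std::min(x0 + 1, n_in - 1);
        const float t  = x - float(x0);
        out[i] = in[x0] * (1.0f - t) + in[x1] * t;
    }
    return out;
}
```

With the input `{0, 10}` resized to 4 samples, the second output sample is about 3.33 with align corners and 2.5 without, which shows how the flag shifts the sampling grid.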
Daniel Bevenius [Fri, 27 Jun 2025 13:43:56 +0000 (15:43 +0200)]
ci : use selective copy for musa image (#3296)
This commit modified the musa docker file to selectively copy
directories needed for the container image.
This commit also added a step to the docker workflow to free up disk
space in an attempt to make enough room for the large musa build
containers.
The motivation for this change is to reduce the size of the container
image and try to avoid disk usage issues in CI.
Daniel Bevenius [Fri, 27 Jun 2025 07:55:56 +0000 (09:55 +0200)]
ci: set fail-fast to false in docker.yml (#3294)
* ci: set fail-fast to false in docker.yml
This commit modifies the GitHub Actions workflow for Docker builds to
disable the fail-fast behavior.
The motivation for this is that currently if one of the strategy jobs
fails any other job that is in progress will be cancelled. There is no
need for this as the jobs are independent.
* ci : update docker.yml to use a single build
This commit updates the docker job to only build the image once instead
of twice (only happens when pushing to the master branch). Instead this
will tag the image with the commit SHA when pushing to master.
The motivation for this change is to reduce the time it takes to run
this job and also it might help with the disk space issues we are
experiencing for this job when it runs on pushes to master.
Daniel Bevenius [Thu, 26 Jun 2025 14:29:29 +0000 (16:29 +0200)]
ci : add should_release variable (#3288)
* ci : add should_release variable
This commit adds a `should_release` variable to the GitHub Actions
workflow to determine if a release should be created based on the tag or
branch conditions.
The motivation for this is that it simplifies the logic for deciding
whether to upload artifacts or not, making it easier to maintain if we
need to change the conditions in the future.
Daniel Bevenius [Wed, 25 Jun 2025 19:43:58 +0000 (21:43 +0200)]
ci : add support for tag-based releases (#3287)
This commit modifies the GitHub Actions workflow to support
tag-based releases. When a tag is pushed that starts with 'v', the
workflow will use that tag name for the release process.
I think this was once the behavior, but it was lost in updates that
I've made to the workflow. This commit restores that functionality.
Daniel Bevenius [Wed, 25 Jun 2025 12:16:31 +0000 (14:16 +0200)]
stream : add nullptr check of whisper_context (#3283)
* stream : add nullptr check of whisper_context
This commit adds a check to ensure that the `whisper_context` is not
null after initialization.
The motivation for this is that currently, if the initialization fails,
the program continues to run leading to a segmentation fault. This sort
of check is performed by other examples like whisper-cli.
Daniel Bevenius [Wed, 25 Jun 2025 10:12:36 +0000 (12:12 +0200)]
ci : enable main-cuda build (#3282)
This commit re-enables the main-cuda Docker build in the CI workflow.
The main-cuda Dockerfile has been updated to remove build artifacts
and also print the size of the /app directory after the build. A similar
change was recently made to the musa Dockerfile, and perhaps this job
was also having similar disk space issues.
The motivation for this change is that this configuration has been
disabled for a while due to persistent build failures. However, the
actual logs are no longer available.
Daniel Bevenius [Tue, 24 Jun 2025 06:20:28 +0000 (08:20 +0200)]
ci : reduce musa image size (#3277)
* ci : reduce musa image size
This commit contains an attempt to reduce the size of the musa Docker
image by copying only the necessary files from the build stage.
The motivation for this is that the CI runs sometimes fail with out of
memory errors. These seem to pass for PRs, at least sometimes, but fail
upon push to the master branch.
* ci : remove build time files instead of selective copying
Daniel Bevenius [Mon, 23 Jun 2025 10:34:44 +0000 (12:34 +0200)]
ci : add apt-get clean to musa Dockerfile (#3275)
* ci : add apt-get clean to musa Dockerfile
This commit adds `apt-get clean` to the musa Dockerfile to reduce the
image size by removing cached package files after installation.
The motivation for this is to try to reduce the size of the Docker image
and see if this can avoid the "no space left on device" error during
the CI build process.
Nicolò Scipione [Fri, 20 Jun 2025 13:07:21 +0000 (15:07 +0200)]
sycl: add usage of enqueue_functions extension (llama/14244)
* Add header and namespace to use enqueue_functions extension
* Convert submit and parallel_for to use new extension in convert.cpp
* Convert submit and parallel_for to use extension in ggml-sycl.cpp
* Convert submit and parallel_for to use extension in gla.cpp
* Convert submit and parallel_for in mmq.cpp
* Convert submit and parallel_for in mmvq.cpp
* Convert submit and parallel_for in remaining files
* Convert all simple parallel_for to nd_launch from enqueue_functions
extension
* Wrapping extension in general function
Create a general function that enables the enqueue_functions extension if
it is enabled in the compiler, otherwise call the general SYCL function
to launch kernels.
* android.java : update cmake to use FetchContent for ggml
This commit updates the CMake configuration for the Android Java example
to use `FetchContent` for including the `ggml` library. To be able to
use FetchContent we also update the `compileSdkVersion` and
`targetSdkVersion` to 31, and the `buildToolsVersion` to '30.0.3'.
This also required an update to the Gradle plugin version to 7.4.0.
The motivation for this change is to avoid having to make manual changes to
the CMakeLists.txt file after syncing the ggml library.
Daniel Bevenius [Wed, 18 Jun 2025 15:41:43 +0000 (17:41 +0200)]
examples : add stereo to mono conversion in read_audio_data (#3266)
This commit adds a conversion from stereo to mono in the
`read_audio_data` function of `common-whisper.cpp`.
The motivation for this change is that prior to Commit
7d3da68f792018e81a758881e081154d1cbe6b6f ("examples : use miniaudio for
direct decoding flac, mp3, ogg and wav (#2759)"), there was a step that
read stereo int16 data -> pcm16 (448512 samples), then converted it to
mono (224256 samples), and then also converted to stereo in `pcmf32s`.
The middle step here seems to have been missed when rewriting the code to
use Miniaudio, which caused issues when transcribing stereo audio files.
For example, currently using the audio sample in the linked issue the
output is:
```console
[00:00:00.000 --> 00:00:03.000] (speaker 1) Sous-titres réalisés para la communauté d'Amara.org
```
And with the change in this commit the output is:
```
[00:00:00.000 --> 00:00:01.500] (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000] (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500] (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500] (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500] (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500] (speaker 1) de préparer un courrier
```
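The missing downmix step can be sketched as follows. This is a minimal illustration that assumes interleaved float samples and a plain average of the two channels; `stereo_to_mono` is a hypothetical helper, and the actual code in `common-whisper.cpp` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Downmix interleaved stereo samples (L0 R0 L1 R1 ...) to mono by
// averaging each left/right pair. Hypothetical helper for illustration.
static std::vector<float> stereo_to_mono(const std::vector<float> & interleaved) {
    assert(interleaved.size() % 2 == 0); // one left and one right sample per frame
    std::vector<float> mono(interleaved.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i) {
        mono[i] = 0.5f * (interleaved[2*i] + interleaved[2*i + 1]);
    }
    return mono;
}
```

Applied to 448512 interleaved samples, this yields 224256 mono samples, matching the counts quoted above.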
```console
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: operator()
Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154
Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```
Christian Kastner [Wed, 11 Jun 2025 19:07:44 +0000 (19:07 +0000)]
Implement GGML_CPU_ALL_VARIANTS for ARM (llama/14080)
* ggml-cpu: Factor out feature detection build from x86
* ggml-cpu: Add ARM feature detection and scoring
This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT> which need to be set
in cmake, instead of GGML_<FEAT> that users would set for x86.
This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM
Like x86, however to pass around arch flags within cmake, we use
GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>.
Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.
* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now
The other platforms will need their own specific variants.
This also fixes the bug that the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch which restores the previous behavior.
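The idea of building several variants and letting a scoring function choose one can be sketched as below. This is a simplified illustration, not the ggml code: the feature bits, `variant_score`, and `pick_variant` are all hypothetical names, and the weighting scheme is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical feature bits; the real feature set and names differ.
enum : uint32_t {
    FEAT_DOTPROD = 1u << 0,
    FEAT_FP16    = 1u << 1,
    FEAT_SVE     = 1u << 2,
};

// Score a backend variant: 0 if it was built with a feature the CPU lacks
// (it must not be loaded), otherwise higher for more (and newer) features.
static int variant_score(uint32_t built_with, uint32_t cpu_has) {
    if ((built_with & ~cpu_has) != 0) {
        return 0;
    }
    int score = 1; // a usable baseline variant still beats an unusable one
    for (int bit = 0; bit < 8; ++bit) {
        if (built_with & (1u << bit)) {
            score += 1 << bit;
        }
    }
    return score;
}

// Return the index of the best usable variant, or -1 if none is usable.
static int pick_variant(const std::vector<uint32_t> & variants, uint32_t cpu_has) {
    int best = -1;
    int best_score = 0;
    for (std::size_t i = 0; i < variants.size(); ++i) {
        const int s = variant_score(variants[i], cpu_has);
        if (s > best_score) {
            best_score = s;
            best = (int) i;
        }
    }
    return best;
}
```

Because an unusable variant scores 0, variants with optional features can be built unconditionally and the most capable one that the running CPU supports is selected at load time.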
Jeff Bolz [Wed, 11 Jun 2025 14:48:52 +0000 (09:48 -0500)]
vulkan: Better thread-safety for command pools/buffers (llama/14116)
This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. It has a single vector
of pools and vector of sets, and a single counter to track requests and a single
counter to track use.
Daniel Bevenius [Fri, 13 Jun 2025 13:06:42 +0000 (15:06 +0200)]
ggml : disable warnings for tests when using MSVC (ggml/1273)
* ggml : disable warnings for tests when using MSVC
This commit disables warnings for tests on Windows when using MSVC.
The motivation for this is that this brings the build output more
in line with what Linux/macOS systems produce.
There is still one warning generated for the tests which is:
```console
Building Custom Rule C:/ggml/tests/CMakeLists.txt
cl : command line warning D9025: overriding '/DNDEBUG' with '/UNDEBUG'
[C:\ggml\build\tests\test-arange.vcxproj]
test-arange.cpp
test-arange.vcxproj -> C:\ggml\build\bin\Release\test-arange.exe
```
This commit removes the unused `ggml_context_container` structure from
the ggml library. It looks like the usage of this struct was removed in
Commit 4757fe18d56ec11bf9c07feaca6e9d5b5357e7f4 ("ggml : alloc
ggml_contexts on the heap (whisper/2525)").
The motivation for this change is to improve code clarity/readability.
Daniel Bevenius [Thu, 12 Jun 2025 10:27:09 +0000 (12:27 +0200)]
examples : include examples in msvc disable warn (ggml/1270)
This commit adds the examples in the "list" of targets to ignore MSVC
warnings.
The motivation for this is that currently the examples generate a number
of warnings that are ignored/disabled for the core ggml project. This
makes for a cleaner output when building.
Daniel Bevenius [Wed, 18 Jun 2025 09:30:29 +0000 (11:30 +0200)]
whisper : clear result_all if vad_samples is empty (#3262)
This commit clears the `result_all` vector when no VAD segments are found.
The motivation for this is that this would normally be done by
`whisper_full_with_state` but when no VAD segments are detected this
current implementation does not call that function and hence the vector
does not get reset. This can lead to issues in applications like the
server example where it will incorrectly process the old results.
Daniel Bevenius [Tue, 17 Jun 2025 09:29:48 +0000 (11:29 +0200)]
examples : set the C++ standard to C++17 for server (#3261)
This commit updates the server example to use C++17 as the standard.
The motivation for this change is that currently the ci-run
`ggml-100-mac-m4` is failing when compiling the server example on
macOS. The `talk-llama` example also has this setting so it looks like
an alright change to make.
Daniel Bevenius [Fri, 13 Jun 2025 15:35:52 +0000 (17:35 +0200)]
whisper : fix VAD processing for skipped audio segments (#3230)
This commit addresses an issue with token timestamps when audio segments
are skipped, in `whisper_exp_compute_token_level_timestamps` related to
the VAD processing and the energy levels.
The motivation for this is that the token timestamps exceed the energy
array bounds due to segment timing misalignment:
```console
(skipped introduction)
↓
Audio segment: [2600ms → 5600ms] (3 seconds of actual audio)
Energy array: [0 → 480652] (samples for 3 seconds)
Token timestamps: [3266ms → 3408ms] (absolute timestamps)
```
So both `s0` and `t1` get clamped to the maximum sample index (480652)
which causes the start/end timestamps to be the same for all the tokens
after a certain point.
This is addressed by using segment-relative timestamps in the
`timestamp_to_sample` and `sample_to_timestamp`.
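The effect of switching to segment-relative timestamps can be sketched as below. This is an illustration under stated assumptions, not the actual whisper.cpp code: timestamps are assumed to be in 10 ms units, the sample rate is assumed to be 16 kHz, and the `segment_t0` parameter is a hypothetical name for the segment start.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed sample rate (Hz); timestamps are assumed to be in 10 ms units.
static const int64_t SAMPLE_RATE = 16000;

// Convert a timestamp to an index into the energy array of one segment.
// segment_t0 is the segment start in the same units (hypothetical parameter).
static int timestamp_to_sample(int64_t t, int64_t segment_t0, int n_samples) {
    assert(n_samples > 0);
    const int64_t rel = t - segment_t0;              // make the timestamp segment-relative
    const int64_t s   = rel * SAMPLE_RATE / 100;     // 10 ms units -> samples
    return (int) std::max<int64_t>(0, std::min<int64_t>(n_samples - 1, s));
}
```

For a 3 second segment (48000 samples at the assumed rate) starting at 2600 ms, `timestamp_to_sample(326, 260, 48000)` gives 10560, whereas passing 0 as the segment start reproduces the old behavior and clamps to 47999 for every token in the segment.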
Daniel Bevenius [Fri, 13 Jun 2025 08:25:25 +0000 (10:25 +0200)]
cli : fix short name conflict for vad options [no ci] (#3247)
This commit fixes a short name conflict in whisper-cli for
`--vad-min-speech-duration-ms` and `--vad-min-silence-duration-ms` which
currently have the same short name `-vsd`.
Daniel Bevenius [Fri, 13 Jun 2025 08:04:20 +0000 (10:04 +0200)]
ruby : add .gitignore entries for ext directory (#3245)
This commit adds entries to `.gitignore` for directories in the
`ext` directory.
The motivation for this is that currently after building locally the
following directories are reported by git as untracked:
```console
Untracked files:
(use "git add <file>..." to include in what will be committed)
ext/examples/
ext/ggml/
ext/include/
ext/scripts/
ext/src/
```
Daniel Bevenius [Wed, 11 Jun 2025 11:53:16 +0000 (13:53 +0200)]
ci : update windows runner to windows-2022 (#3242)
* ci : update windows runner to windows-2022
This commit changes the windows-2019 runner to windows-2022.
The motivation for this is that the windows-2019 runner is scheduled for
deprecation and will be removed 2025-06-30. There are currently "brownout"
periods that started 2025-06-01, and during these times jobs with
windows-2019 will fail, which has happened lately on our CI.
Daniel Bevenius [Tue, 10 Jun 2025 13:06:40 +0000 (15:06 +0200)]
ruby : add cleaning of library names in dependencies (#3241)
* ruby : add cleaning of library names in dependencies
This commit adds a cleaning step to the library names in the
`Dependencies` class of the Ruby bindings.
The motivation for this is that the introduction of a library name
alias for ggml in Commit b933d17c306e800b6d919e3ee895219c3f64d5cd
("Add in-build ggml::ggml ALIAS library (ggml/1260)") causes the Makefile
generation to break:
```console
$ sed -n '165,170p' ext/Makefile
CLEANOBJS = $(OBJS) *.bak
TARGET_SO_DIR_TIMESTAMP = $(TIMESTAMP_DIR)/.sitearchdir.time
$(TARGET_SO): libcommon.a libwhisper.a libggml\n(ggml::ggml).a libggml-cpu.a libggml-base.a
libcommon.a libwhisper.a libggml\n(ggml::ggml).a libggml-cpu.a libggml-base.a: cmake-targets
cmake-targets:
/usr/bin/cmake -S sources -B build -D BUILD_SHARED_LIBS=OFF -D CMAKE_ARCHIVE_OUTPUT_DIRECTORY=/home/danbev/work/ai/whisper.cpp/bindings/ruby/ext -D CMAKE_POSITION_INDEPENDENT_CODE=ON
```
* squash! ruby : add cleaning of library names in dependencies