git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log

Jay [Sat, 29 Mar 2025 10:04:58 +0000 (18:04 +0800)]

cmake : fix ccache conflict (llama/12522)

If users already set CMAKE_C_COMPILER_LAUNCHER globally, setting it again in
CMake leads to a conflict and the compile fails.

Signed-off-by: Jay <redacted>

Xuan-Son Nguyen [Sat, 29 Mar 2025 10:59:56 +0000 (11:59 +0100)]

cpu : rm unused variable (ggml/1166)

cmdr2 [Sat, 29 Mar 2025 06:07:13 +0000 (11:37 +0530)]

cpu: de-duplicate some of the operators and refactor (ggml/1144)

* cpu: de-duplicate some of the operators and refactor

* Fix PR comments

* Fix PR comments

Sandro Hanea [Mon, 31 Mar 2025 10:44:36 +0000 (12:44 +0200)]

cmake: improve Vulkan cooperative matrix support checks (#2966)

Co-authored-by: Sandro Hanea <redacted>

Daniel Bevenius [Mon, 31 Mar 2025 10:32:27 +0000 (12:32 +0200)]

examples : update README links to point to pages deployment (#2971)

This commit updates the README links to point to the pages deployment
instead of whisper.ggerganov.com.

Daniel Bevenius [Mon, 31 Mar 2025 09:34:40 +0000 (11:34 +0200)]

ci : add github pages workflow for wasm examples (#2969)

* ci : add github pages workflow for wasm examples

This commit adds a GitHub workflow to build and deploy the wasm examples
to GitHub Pages. The whisper.wasm example is deployed as the main page.

This workflow is triggered by a push to master and will deploy the
examples to: https://ggerganov.github.io/whisper.cpp/.

This requires GitHub Actions to be enabled as the Pages source for the
repository, i.e. `Settings` -> `Pages` -> `Build and deployment` -> `Source`
must be set to `GitHub Actions`.

One thing to note is that this commit removes the `talk` example as I'm
not sure how this example is built yet.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2784

Sacha Arbonel [Mon, 31 Mar 2025 08:03:41 +0000 (10:03 +0200)]

feat: add health check endpoint to server (#2968)

Daniel Bevenius [Sun, 30 Mar 2025 03:56:10 +0000 (05:56 +0200)]

whisper : remove unnecessary GGML_UNUSED macro (#2960)

Georgi Gerganov [Fri, 28 Mar 2025 18:58:21 +0000 (20:58 +0200)]

sync : ggml

ggml-ci

Georgi Gerganov [Fri, 28 Mar 2025 18:21:59 +0000 (20:21 +0200)]

metal : improve FA + improve MoE (llama/12612)

* ggml : FA with different K, V head sizes (CPU)

ggml-ci

* metal : add FA with HS=192

* metal : extend FA to support different K and V head sizes

ggml-ci

* metal : add FA vector kernels for heads K 192 and V 128

ggml-ci

* ggml : restrict op on other backends to equal head sizes

ggml-ci

* metal : optimize FA-vec kernel

ggml-ci

* metal : FA remove mq registers

* metal : improve MoE mul_mat_id condition

ggml-ci

* metal : fix comments + remove unnecessary addition

ggml-ci

* metal : avoid too much shared memory usage with mul_mat_id

ggml-ci
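
The FA changes above allow the K and V head sizes to differ (for example, K = 192 and V = 128). As a rough illustration, here is a minimal, unoptimized C++ reference for single-query attention under that assumption; it is not the Metal kernel and does none of the tiling or online softmax that flash attention uses. What it shows is that scores are computed and scaled with the K head size, while the output inherits the V head size.

```c++
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference scaled dot-product attention for one query vector.
// K rows have head size dk, V rows have head size dv; dk and dv may differ.
std::vector<float> attend(const std::vector<float>& q,   // [dk]
                          const std::vector<float>& K,   // [n_kv * dk], row-major
                          const std::vector<float>& V,   // [n_kv * dv], row-major
                          std::size_t dk, std::size_t dv) {
    assert(dk > 0 && dv > 0 && q.size() == dk);
    const std::size_t n_kv = K.size() / dk;
    assert(n_kv > 0 && K.size() == n_kv * dk && V.size() == n_kv * dv);

    // Scores use the K head size for both the dot product and the 1/sqrt(dk) scale.
    const float scale = 1.0f / std::sqrt(static_cast<float>(dk));
    std::vector<float> s(n_kv);
    for (std::size_t j = 0; j < n_kv; ++j) {
        float dot = 0.0f;
        for (std::size_t i = 0; i < dk; ++i) dot += q[i] * K[j * dk + i];
        s[j] = dot * scale;
    }

    // Numerically stable softmax over the n_kv scores.
    const float m = *std::max_element(s.begin(), s.end());
    float sum = 0.0f;
    for (float& x : s) { x = std::exp(x - m); sum += x; }

    // The weighted sum runs over V rows, so the result has the V head size.
    std::vector<float> out(dv, 0.0f);
    for (std::size_t j = 0; j < n_kv; ++j)
        for (std::size_t i = 0; i < dv; ++i) out[i] += (s[j] / sum) * V[j * dv + i];
    return out;
}
```

With `dk = 192` and `dv = 128`, the returned vector has 128 elements, which is why a kernel supporting this case has to size its output by the V head size rather than assume the two match.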

Icenowy Zheng [Fri, 28 Mar 2025 17:51:06 +0000 (01:51 +0800)]

vulkan: fix coopmat shader generation when cross-compiling (llama/12272)

* vulkan: fix coopmat shader generation when cross-compiling

Previously, the coopmat{,2} support status wasn't passed to the
vulkan-shaders-gen project built on the host. This led to a build
failure, because the cross-compiled code expected coopmat{,2}
shaders that were never generated.

Fix this by passing the coopmat{,2} support status to vulkan-shaders
subproject.

Signed-off-by: Icenowy Zheng <redacted>

* Only call coop-mat shaders once

* Fix whitespace

---------

Signed-off-by: Icenowy Zheng <redacted>
Co-authored-by: bandoti <redacted>

amritahs-ibm [Fri, 28 Mar 2025 07:43:22 +0000 (13:13 +0530)]

llamafile : ppc64le GEMV forwarding for FP32. (llama/12594)

This patch enables the use of MMA when one of the
dimensions of the matrix (i.e. either M or N) is 1. This
is useful for token generation, where N < 2.

It uses the concept of 'GEMV forwarding': when one of
the matrices has a single row/column, its elements are
broadcast instead of being prepacked by the packing
routine.

This change results in a 5% - 15% improvement in total
speed (i.e. all tokens/total time) across various batch
sizes, compared with the corresponding dot product
implementation.

The patch was tested with FP32 models of Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf on an IBM POWER10 machine.

Signed-off-by: Amrita H S <redacted>
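
A simplified C++ sketch of the GEMV-forwarding idea, assuming row-major float matrices. `gemm_packed` is a hypothetical stand-in for a kernel that packs an operand first, and `matmul` bypasses it when N == 1. This is not the llamafile code, which uses POWER10 MMA builtins; only the N == 1 case is shown, although the commit also covers M == 1.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// C[M x N] = A[M x K] * B[K x N], all row-major.
// Hypothetical packed path: copy B into a transposed buffer so each column is contiguous.
static void gemm_packed(const float* A, const float* B, float* C,
                        std::size_t M, std::size_t N, std::size_t K) {
    std::vector<float> bt(K * N);  // the "packing" step
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t n = 0; n < N; ++n) bt[n * K + k] = B[k * N + n];
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) acc += A[m * K + k] * bt[n * K + k];
            C[m * N + n] = acc;
        }
}

// With N == 1 the single column of B is already contiguous, so packing is pure
// overhead: the vector x is reused (broadcast) against every row of A.
static void gemv(const float* A, const float* x, float* y, std::size_t M, std::size_t K) {
    for (std::size_t m = 0; m < M; ++m) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < K; ++k) acc += A[m * K + k] * x[k];
        y[m] = acc;
    }
}

void matmul(const float* A, const float* B, float* C,
            std::size_t M, std::size_t N, std::size_t K) {
    assert(A && B && C);
    if (N == 1) { gemv(A, B, C, M, K); return; }  // forward to the GEMV path
    gemm_packed(A, B, C, M, N, K);
}
```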

Radoslav Gerganov [Fri, 28 Mar 2025 06:18:04 +0000 (08:18 +0200)]

rpc : send hash when tensor data is above some fixed threshold (llama/12496)

* rpc : send hash when tensor data is above some fixed threshold

ref #10095

* rpc : put cache under $HOME/.cache/llama.cpp

* try to fix win32 build

* another try to fix win32 build

* remove llama as dependency
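
A minimal C++ sketch of the idea behind sending a hash instead of tensor data, under stated assumptions: the hash function (64-bit FNV-1a), the threshold value, and the `Message`/`make_set_tensor` names are all hypothetical, and the server's cache is modelled as a set of known hashes the client can consult directly. In an actual client/server exchange the server would presumably report whether it found the hash.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// 64-bit FNV-1a (the choice of hash is an assumption for this sketch).
std::uint64_t fnv1a64(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;                // FNV prime
    }
    return h;
}

// Hypothetical threshold; the real value is not given in the commit message.
constexpr std::size_t kHashThreshold = 1 << 20;

struct Message {
    bool by_hash;                       // true: only the hash is sent
    std::uint64_t hash;
    std::vector<std::uint8_t> payload;  // full data when not sent by hash
};

// Small tensors are always sent in full; large ones are sent as a hash
// when the server already holds that content.
Message make_set_tensor(const std::vector<std::uint8_t>& data,
                        const std::unordered_set<std::uint64_t>& server_cache) {
    if (data.size() < kHashThreshold) return {false, 0, data};
    const std::uint64_t h = fnv1a64(data.data(), data.size());
    if (server_cache.count(h)) return {true, h, {}};
    return {false, h, data};
}
```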

lhez [Thu, 27 Mar 2025 15:08:08 +0000 (08:08 -0700)]

opencl: add multi and vision rope, `gelu_quick` and `im2col` (llama/12600)

* opencl: add `im2col`

* opencl: add `gelu_quick`

* opencl: add mrope

* opencl: add vision rope

Amanda Der Bedrosian [Fri, 28 Mar 2025 11:26:22 +0000 (04:26 -0700)]

bindings.go : add DetectedLanguage to go bindings (#2947)

Adds DetectedLanguage(), a function to retrieve the language detected
while processing audio, along with a unit test that checks that it
succeeds.

Co-authored-by: Amanda Der Bedrosian <redacted>

Daniel Bevenius [Fri, 28 Mar 2025 08:29:56 +0000 (09:29 +0100)]

ruby : fix test failures in test_whisper (#2955)

* bindings.ruby : fix test failures in test_whisper

This commit updates the parallel tests to use 2 processors instead of
the number of processors on the system. It also comments out the setting
of the log callback to an empty lambda as this causes a segfault when
enabled.

The motivation for changing the number of processors is that with a
large number of processors (for example, the 16 on the machine I used to
test this), the following warning is printed:
```console
whisper_full_with_state: input is too short - 680 ms < 1000 ms. consider padding the input audio with silence
```

This is logged from:
```c++
int whisper_full_with_state(
        struct whisper_context * ctx,
          struct whisper_state * state,
    struct whisper_full_params   params,
                   const float * samples,
                           int   n_samples) {
    // ...
    // seek positions are in 10 ms units, so 100 units = 1000 ms
    if (seek_end < seek_start + 100) {
        WHISPER_LOG_WARN("%s: input is too short - %d ms < 1000 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10);
        return 0;
    }
    // ...
}
```
This returns early, so no segment callbacks are invoked, which in turn
causes the tests to fail.

* bindings.ruby : fix warnings in tests

This commit fixes the following warnings in the Ruby tests:
```console
/whisper/bindings/ruby/tests/test_segment.rb:52:
warning: ambiguity between regexp and two divisions:
wrap regexp in parentheses or add a space after `/' operator
```
It also adds a '_' prefix to some unused variables to avoid warnings.

* bindings.ruby : enable Whisper.log_set in tests

This commit re-enables the Whisper.log_set call in the test_whisper.rb
tests, which had been commented out.

I'm no longer getting the segfaults I saw earlier when running the tests
with this enabled. One theory is that I rebased this branch to include the
latest ggml sync to master to make sure things still worked; with the
latest changes in ggml, I can't reproduce the segfaults.
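
To see why many processors trigger the "input is too short" warning quoted above, here is the arithmetic as a small C++ sketch. It assumes the input is split roughly evenly across the processors and uses whisper.cpp's 16 kHz sample rate; the 11-second clip is hypothetical, and the real check works on seek positions derived from mel frames, so its reported value can differ slightly from this estimate.

```c++
#include <cassert>
#include <cstdint>

// whisper.cpp processes audio at 16 kHz (WHISPER_SAMPLE_RATE).
constexpr std::int64_t kSampleRate = 16000;

// Approximate duration in ms of one chunk when n_samples are split evenly
// across n_processors (a simplification of the real parallel split).
std::int64_t chunk_ms(std::int64_t n_samples, int n_processors) {
    assert(n_samples >= 0 && n_processors > 0);
    return n_samples / n_processors * 1000 / kSampleRate;
}

// Mirrors the early-return condition: chunks shorter than 1000 ms are rejected.
bool too_short(std::int64_t ms) { return ms < 1000; }
```

For an 11 s clip (176000 samples), 16 processors give chunks of about 687 ms, below the 1000 ms minimum, while 2 processors give 5500 ms chunks.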

Lin Xiaodong [Fri, 28 Mar 2025 05:34:26 +0000 (13:34 +0800)]

examples : support progress_callback API for addon.node (#2941)

* feat: progress supported

* fix: missing params

* style: Format the code to improve readability

Unified code indentation ensures consistent coding style, enhancing code readability and maintainability.

* feat: support prompt api

---------

Co-authored-by: linxiaodong <redacted>

Georgi Gerganov [Thu, 27 Mar 2025 08:30:09 +0000 (10:30 +0200)]

xcf : fix visionOS build

ref: https://github.com/ggml-org/llama.cpp/pull/12415

ggml-ci

Georgi Gerganov [Thu, 27 Mar 2025 08:15:02 +0000 (10:15 +0200)]

files : remove old wkv6 (#0)

ggml-ci

Georgi Gerganov [Thu, 27 Mar 2025 08:13:47 +0000 (10:13 +0200)]

sync : ggml

ggml-ci

Georgi Gerganov [Thu, 27 Mar 2025 07:12:54 +0000 (09:12 +0200)]

ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)

amritahs-ibm [Thu, 27 Mar 2025 06:51:47 +0000 (12:21 +0530)]

llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)

This change upstreams llamafile's CPU matrix
multiplication kernels for the ppc64le ISA using MMA
builtins. This patch handles matrix multiplication
between the quantized datatypes block_q4_0 and
block_q8_0.

This change results in a 5% - 50% improvement
in total speed (i.e. all tokens/total time) across
various batch sizes.

The patch was tested with Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <redacted>

Akarshan Biswas [Thu, 27 Mar 2025 01:46:00 +0000 (07:16 +0530)]

SYCL: implement memset ggml backend buffer interface (llama/12580)

* SYCL: implement memset ggml backend buffer interface

* use GGML_ABORT macro

* Do not wait for all queues to finish for memset operation

Slobodan Josic [Wed, 26 Mar 2025 22:46:30 +0000 (23:46 +0100)]

HIP: Add support for RDNA4 targets (llama/12372)

Georgi Gerganov [Wed, 26 Mar 2025 19:38:38 +0000 (21:38 +0200)]

metal : refactor mat-vec code (llama/12569)

* metal : refactor mat-vec code

ggml-ci

* metal : rename all_sum -> sum_all

ggml-ci

* metal : fix comments [no ci]

* metal : fix nr constant [no ci]

* metal : mv q6_K support nr0 > 1

ggml-ci

* metal : reduce register pressure

ggml-ci

* metal : fix typo [no ci]

* metal : reduce register pressure

ggml-ci

Georgi Gerganov [Wed, 26 Mar 2025 11:02:00 +0000 (13:02 +0200)]

ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544)

* ggml : fix MUL_MAT_ID repack with Q8_K

ggml-ci

* ggml : improve repack templates

ggml-ci

Dan Johansson [Tue, 25 Mar 2025 11:10:18 +0000 (12:10 +0100)]

ggml-cpu : update KleidiAI to v1.5.0 (llama/12568)

ggml-cpu : bug fix related to KleidiAI LHS packing

Signed-off-by: Dan Johansson <redacted>

Akarshan Biswas [Tue, 25 Mar 2025 10:40:18 +0000 (16:10 +0530)]

SYCL: disable Q4_0 reorder optimization (llama/12560)

ggml-ci

lhez [Mon, 24 Mar 2025 16:20:47 +0000 (09:20 -0700)]

opencl: simplify kernel embedding logic in cmakefile (llama/12503)

Co-authored-by: Max Krasnyansky <redacted>

R0CKSTAR [Mon, 24 Mar 2025 10:28:34 +0000 (18:28 +0800)]

CUDA: Fix clang warnings (llama/12540)

Signed-off-by: Xiaodong Ye <redacted>

Jeff Bolz [Mon, 24 Mar 2025 06:56:17 +0000 (01:56 -0500)]

vulkan: fix mul_mat_vec failure in backend tests (llama/12529)

The out-of-bounds (OOB) calculation could be wrong if the last iteration
fell within one of the unrolled loops. Adjust the unrolling counts to avoid
this. Add a couple of new backend tests that hit this failure on NVIDIA GPUs.
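
The general pattern for keeping an unrolled loop in bounds can be sketched in C++ as follows. This is a generic CPU illustration, not the Vulkan shader or the unroll counts chosen in the fix: the unrolled body only runs while a full group of elements remains, and a scalar tail loop handles the remainder.

```c++
#include <cassert>
#include <cstddef>

// Dot product unrolled by 4. The unrolled body runs only while four elements
// remain, so no iteration reads past n; the tail loop handles the last 0-3.
float dot_unrolled4(const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a && b));
    float acc = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc += a[i] * b[i] + a[i + 1] * b[i + 1] + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) acc += a[i] * b[i];  // tail: at most 3 iterations
    return acc;
}
```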

Georgi Gerganov [Sat, 22 Mar 2025 14:23:26 +0000 (16:23 +0200)]

ggml : fix quantized cpy op (llama/12310)

* ggml : fix quantized cpy op

ggml-ci

* tests : add cpy tests for all types

ggml-ci

* tests : add BF16 copy tests

ggml-ci

* tests : fix loop for same-type copy

ggml-ci

* tests : add option to permute the dst tensor

ggml-ci

R0CKSTAR [Sat, 22 Mar 2025 09:11:37 +0000 (17:11 +0800)]

musa: refine compute capability (llama/12493)

* musa: refine compute capability

Signed-off-by: Xiaodong Ye <redacted>

* Address review comments

Signed-off-by: Xiaodong Ye <redacted>

---------

Signed-off-by: Xiaodong Ye <redacted>

Jeff Bolz [Sat, 22 Mar 2025 08:40:11 +0000 (03:40 -0500)]

vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)

* tests: add mul_mat perf/functional tests for p021/nc vulkan shaders

* vulkan: Optimize mul_mat_vec p021 and nc shaders.

These shaders are used in attention calculations, and when the KV cache grows
large they start to dominate the run time. For the nc shader (which is called
with large 'k' dimension), use unrolling and vector loads. For the p021 shader
(which is called with large 'm' and small 'k' dimensions), take advantage of
grouped query attention to reuse loads from the A matrix for the whole group,
and reduce the number of workgroups (too much overhead from tiny dispatches).

Using subgroupAdd in the p021 shader also helps, so use it conditionally.
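
With grouped query attention, several query heads share one K/V head, which is what lets the p021 shader reuse loads across a group. A minimal C++ sketch of the head mapping, assuming the common layout in which consecutive query heads form a group (the function name is hypothetical):

```c++
#include <cassert>

// n_head query heads share n_head_kv K/V heads. Query heads in the same group
// map to the same K/V head, so data loaded for it can serve the whole group.
int kv_head_for(int q_head, int n_head, int n_head_kv) {
    assert(n_head_kv > 0 && n_head % n_head_kv == 0);
    assert(q_head >= 0 && q_head < n_head);
    const int group_size = n_head / n_head_kv;  // query heads per K/V head
    return q_head / group_size;
}
```

With 32 query heads and 8 K/V heads, query heads 0 to 3 all read K/V head 0, so one load of that head can serve all four.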

stduhpf [Fri, 21 Mar 2025 19:34:50 +0000 (20:34 +0100)]

Vulkan: RTE rounding for cpy to quant (llama/12480)

* Vulkan: RTE rounding for cpy to quant

Co-Authored-By: Jeff Bolz <redacted>

* remove trailing whitespace

* avoid duplicating pipeline_cpy_f32_quant

* fix copypasting issue

* remove duplicated code

---------

Co-authored-by: Jeff Bolz <redacted>
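
RTE stands for round-to-nearest, ties-to-even. The following C++ sketch contrasts it with truncation when converting a float to a hypothetical int8 value with a single scale; ggml's actual quant formats use per-block scales and are more involved. It relies on `std::nearbyint`, which follows the current floating-point rounding mode (round-to-nearest-even by default).

```c++
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize one value to int8 using round-to-nearest-even (default rounding mode).
std::int8_t quantize_rte(float x, float scale) {
    assert(scale > 0.0f);
    float q = std::nearbyint(x / scale);
    q = std::fmin(std::fmax(q, -128.0f), 127.0f);  // clamp to the int8 range
    return static_cast<std::int8_t>(q);
}

// For comparison: truncation toward zero, which biases values toward 0.
std::int8_t quantize_trunc(float x, float scale) {
    assert(scale > 0.0f);
    float q = std::trunc(x / scale);
    q = std::fmin(std::fmax(q, -128.0f), 127.0f);
    return static_cast<std::int8_t>(q);
}
```

Under RTE, 2.5 rounds to 2 and 3.5 to 4, whereas truncation maps 2.7 to 2, a bias that accumulates over many values.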

Eve [Fri, 21 Mar 2025 19:27:47 +0000 (19:27 +0000)]

vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/12472)

蕭澧邦 [Fri, 21 Mar 2025 06:58:47 +0000 (14:58 +0800)]

Fix build on Windows when ccache enabled (ggml/9954) (llama/9976)

* [SYCL] Fix build on Windows when ccache enabled (llama/9954)

* take effect only on windows and force it to icl

---------

Co-authored-by: Romain Biessy <redacted>

Svetlozar Georgiev [Fri, 21 Mar 2025 02:15:56 +0000 (02:15 +0000)]

sycl: cleanup oneDNN related code (llama/12097)

Srihari-mcw [Thu, 20 Mar 2025 11:35:34 +0000 (17:05 +0530)]

ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (llama/12332)

* Add block interleaving support for Q4_K quantization

* Remove whitespaces and fix CI/CD issues

* Update pointer of bsums from int16_t to const int16_t

* Add vector version of quantize_q8_K_4x8 function

* Update code formatting based on review comments

Gaurav Garg [Wed, 19 Mar 2025 19:52:06 +0000 (01:22 +0530)]

CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)

- Find out active blocks per SM using cudaOccupancyMaxActiveBlocksPerMultiprocessor API. Use this value to determine the optimal parallel_blocks value.
- Prefer vector flash attention kernels over MMA kernel for BS=1

Fixes Issue: #12182

---------

Co-authored-by: Johannes Gäßler <redacted>

Jeff Bolz [Wed, 19 Mar 2025 18:56:23 +0000 (13:56 -0500)]

vulkan: optimize iq1 coopmat2 dequant functions (llama/12427)

Guus Waals [Wed, 19 Mar 2025 10:15:23 +0000 (10:15 +0000)]

Fix visionOS build and add CI (llama/12415)

* ci: add visionOS build workflow

Add a new GitHub Actions workflow for building on visionOS with CMake and Xcode.

* ggml: Define _DARWIN_C_SOURCE for visionOS to fix missing u_xxx typedefs

* ci: remove define hacks for u_xxx system types

---------

Co-authored-by: Giovanni Petrantoni <redacted>

Jeff Bolz [Wed, 19 Mar 2025 07:26:26 +0000 (02:26 -0500)]

vulkan: Submit once enough matmul work has been recorded (llama/12406)

I've been seeing significantly worse token generation (tg) performance with
flash attention enabled vs disabled, and it seems to be related to the submit
heuristic.
Change the heuristic to check how many bytes worth of weight matrix are
used and flush every 100MB, and ramp up after the first few submits.
This seems to resolve the issue, and also increases perf for non-FA a bit.
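
A hypothetical C++ sketch of a byte-based submit heuristic like the one described above. The 100 MB target is taken as 10^8 bytes, and the ramp schedule (1/8, 1/4, then 1/2 of the target for the first three submits) is an assumption for illustration, not the values used in the Vulkan backend.

```c++
#include <cassert>
#include <cstddef>

// Accumulates the weight-matrix bytes used by recorded matmuls and signals a
// submit once the current threshold is reached. The threshold starts small so
// the GPU gets work early, then ramps up to the full target.
class SubmitTracker {
public:
    // Returns true when the caller should submit the recorded command buffer.
    bool record_matmul(std::size_t weight_bytes) {
        pending_ += weight_bytes;
        if (pending_ < threshold()) return false;
        pending_ = 0;
        ++submits_;
        return true;
    }

    std::size_t threshold() const {
        constexpr std::size_t kTarget = 100u * 1000u * 1000u;  // ~100 MB
        if (submits_ >= 3) return kTarget;
        return kTarget >> (3 - submits_);  // 1/8, 1/4, 1/2 of the target (assumed)
    }

private:
    std::size_t pending_ = 0;
    int submits_ = 0;
};
```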

lhez [Tue, 18 Mar 2025 19:54:55 +0000 (12:54 -0700)]

opencl: improve profiling (llama/12442)

* opencl: more profiling timing

* opencl: generate trace for profiling

* opencl: reduce profiling overhead

* Populate profiling timing info at the end rather than after each
kernel run

* opencl: fix for chrome tracing

R0CKSTAR [Tue, 18 Mar 2025 18:28:26 +0000 (02:28 +0800)]

musa: override warp_size of musa device to 32 (llama/12445)

Signed-off-by: Xiaodong Ye <redacted>

Łukasz Ślusarczyk [Tue, 18 Mar 2025 10:16:31 +0000 (11:16 +0100)]

SYCL: using graphs is configurable by environment variable and compile option (llama/12371)

* alberto changes

* enable sycl graphs by env variable

* fixed compilation warnings in ggml-sycl.cpp

* renamed graph variables

* fix markdown in docs/backend/SYCL.md

Co-authored-by: Romain Biessy <redacted>

* fix markdown in docs/backend/SYCL.md again

* compiling graphs by default, renamed graph_enable to graph_disable

---------

Co-authored-by: Romain Biessy <redacted>

fj-y-saito [Tue, 18 Mar 2025 08:14:39 +0000 (17:14 +0900)]

ggml : add SVE support for q6_K_q8_K (llama/12361)

0cc4m [Tue, 18 Mar 2025 06:21:40 +0000 (07:21 +0100)]

Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (llama/12434)

Łukasz Ślusarczyk [Tue, 18 Mar 2025 00:51:25 +0000 (01:51 +0100)]

fixed compilation warnings in ggml-sycl (llama/12424)

Molly Sophia [Mon, 17 Mar 2025 23:27:50 +0000 (07:27 +0800)]

llama: Add support for RWKV v7 architecture (llama/12412)

* ggml: Add op l2_norm

Signed-off-by: Molly Sophia <redacted>

* ggml: Add op rwkv_wkv7

Signed-off-by: Molly Sophia <redacted>

* llama: Add support for RWKV7 and ARWKV7 models

Signed-off-by: Molly Sophia <redacted>

* llama: fix inference with RWKV6Qwen2

Signed-off-by: Molly Sophia <redacted>

* llama: add more (a)rwkv7 variants in size

Signed-off-by: Molly Sophia <redacted>

* Apply code-format changes

Signed-off-by: Molly Sophia <redacted>

* fix MUSA build

Signed-off-by: Molly Sophia <redacted>

* llama: fix shape error with rwkv using llama-parallel

Signed-off-by: Molly Sophia <redacted>

---------

Signed-off-by: Molly Sophia <redacted>
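
The new `l2_norm` op presumably normalizes each row to unit L2 norm. A minimal C++ sketch of that operation, assuming contiguous rows and an `eps` lower bound on the norm to guard against division by zero (exactly how ggml applies `eps` is not stated here):

```c++
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-wise L2 normalization in place: row = row / max(||row||_2, eps).
void l2_norm_rows(std::vector<float>& x, std::size_t row_len, float eps) {
    assert(row_len > 0 && x.size() % row_len == 0);
    for (std::size_t r = 0; r < x.size() / row_len; ++r) {
        float* row = x.data() + r * row_len;
        float sum = 0.0f;
        for (std::size_t i = 0; i < row_len; ++i) sum += row[i] * row[i];
        const float scale = 1.0f / std::max(std::sqrt(sum), eps);
        for (std::size_t i = 0; i < row_len; ++i) row[i] *= scale;
    }
}
```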

Gaurav Garg [Mon, 17 Mar 2025 18:25:13 +0000 (23:55 +0530)]

cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)

* Enable CUDA Graph on CTK < 12.x

The `cudaGraphExecUpdate` API changed in 12.x, so CUDA Graph support was disabled on older CUDA toolkits. This change enables CUDA Graph support for CTK versions < 12.x by using the older API there.

* Fix compilation errors with MUSA

* Disable CUDA Graph for MUSA

Guus Waals [Mon, 17 Mar 2025 16:35:43 +0000 (00:35 +0800)]

ggml-vulkan: remove unused find_program(glslc) (llama/12416)

It's already found by FindVulkan.cmake in the parent CMakeLists

Jeff Bolz [Mon, 17 Mar 2025 14:26:18 +0000 (09:26 -0500)]

vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312)

Daniele [Mon, 17 Mar 2025 11:42:33 +0000 (12:42 +0100)]

vulkan: subgroup size tuning (llama/12087)

* vulkan: subgroup size test

* Vulkan: Add device architecture enum and logic to recognize AMD generations

* vulkan: use new architecture logic to specify subgroup size

* Initial vulkan subgroup size tuning for RDNA3

* vulkan: commonize RDNA subgroup tuning

* vulkan: override subgroup size if required_subgroup_size = 0

* vulkan: disable warp 32 for RDNA3

* vulkan: fine tuned RDNA1 subgroup sizes

* vulkan: adjusted subgroup size map

* vulkan: fixed RDNA2 subgroup map

---------

Co-authored-by: 0cc4m <redacted>

Jeff Bolz [Mon, 17 Mar 2025 09:43:35 +0000 (04:43 -0500)]

vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309)

Jeff Bolz [Mon, 17 Mar 2025 09:41:59 +0000 (04:41 -0500)]

vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (llama/12273)

* vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking
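
A hypothetical C++ sketch of padding B's N dimension: the column count is rounded up to a multiple of an assumed tile width and the extra columns are zero-filled, so a kernel that always processes whole tiles never reads past the buffer. The tile width and the zero fill are assumptions; the commit only states that N is padded to avoid bounds checking.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of tile.
constexpr std::size_t round_up(std::size_t n, std::size_t tile) {
    return (n + tile - 1) / tile * tile;
}

// Copy a k x n row-major matrix into a buffer whose column count is padded
// to a multiple of tile; the padding columns are zero.
std::vector<float> pad_n(const std::vector<float>& b, std::size_t k, std::size_t n,
                         std::size_t tile) {
    assert(tile > 0 && b.size() == k * n);
    const std::size_t n_pad = round_up(n, tile);
    std::vector<float> out(k * n_pad, 0.0f);
    for (std::size_t r = 0; r < k; ++r)
        for (std::size_t c = 0; c < n; ++c) out[r * n_pad + c] = b[r * n + c];
    return out;
}
```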

Jeff Bolz [Mon, 17 Mar 2025 09:35:00 +0000 (04:35 -0500)]

vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258)

Christian Kastner [Mon, 17 Mar 2025 09:05:23 +0000 (10:05 +0100)]

cmake : enable building llama.cpp using system libggml (llama/12321)

* cmake: Factor out compiler flag function from ggml

llama.cpp's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).

* cmake: Enable building against system ggml

This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.

Akarshan Biswas [Mon, 17 Mar 2025 01:45:12 +0000 (07:15 +0530)]

SYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366)

* SYCL: set extras only on GGML_TYPE_Q4_0

* release tensor_extras in reset buffer interface

aubreyli [Sat, 15 Mar 2025 14:49:03 +0000 (22:49 +0800)]

SYCL: Delete redundant plus sign and space (llama/12391)

fairydreaming [Sat, 15 Mar 2025 14:19:30 +0000 (15:19 +0100)]

SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (llama/12399)

* sycl : support non-contiguous tensors in binary ops

* sycl : silence unused variable warning

---------

Co-authored-by: Stanisław Szymczyk <redacted>
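
Supporting non-contiguous tensors in binary ops comes down to addressing every operand through its own strides instead of assuming a dense layout. A minimal 2-D C++ sketch of that idea; `View2D` and `add_strided` are hypothetical, and strides are in elements for simplicity, whereas ggml stores them in bytes (`nb`).

```c++
#include <cassert>
#include <cstddef>

// A 2-D view: element (i, j) lives at data[i * row_stride + j * col_stride].
struct View2D {
    float* data;
    std::size_t rows, cols;
    std::size_t row_stride, col_stride;  // in elements
};

// Element-wise dst = a + b, walking each operand through its own strides, so
// transposed or sliced (non-contiguous) inputs work without a copy.
void add_strided(View2D dst, View2D a, View2D b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    assert(dst.rows == a.rows && dst.cols == a.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            dst.data[i * dst.row_stride + j * dst.col_stride] =
                a.data[i * a.row_stride + j * a.col_stride] +
                b.data[i * b.row_stride + j * b.col_stride];
}
```

Here `b` can be a transposed view (row stride 1, column stride 2) of a dense 3 x 2 buffer, and the add still produces the right result without first copying it into contiguous memory.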

Chenguang Li [Sat, 15 Mar 2025 01:31:08 +0000 (09:31 +0800)]

MUL_MAT optimization (llama/12382)

Alberto Cabrera Pérez [Wed, 12 Mar 2025 09:57:32 +0000 (09:57 +0000)]

sycl : variable sg_size support for mmvq kernels (llama/12336)

uvos [Wed, 12 Mar 2025 09:14:11 +0000 (10:14 +0100)]

CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)

When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to a
selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need
to avoid launching them with warp64 parameters.

Jeff Bolz [Wed, 12 Mar 2025 05:59:19 +0000 (00:59 -0500)]

vulkan: fix bug in coopmat1 mul_mat_id (llama/12316)

* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id

uvos [Tue, 11 Mar 2025 19:16:03 +0000 (20:16 +0100)]

CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177)

refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <redacted>

jklincn [Tue, 11 Mar 2025 13:25:17 +0000 (21:25 +0800)]

ggml-backend : fix backend search path (llama/12330)

* Fix backend search path

* replace .native() with '/'

* reverted .native()

BB-fat [Tue, 11 Mar 2025 11:45:02 +0000 (19:45 +0800)]

metal : Cache the Metal library at the device context level (llama/12265)

Eve [Mon, 10 Mar 2025 19:28:11 +0000 (19:28 +0000)]

mat vec double buffer (llama/12188)

R0CKSTAR [Mon, 10 Mar 2025 17:18:25 +0000 (01:18 +0800)]

musa: support new arch mp_31 and update doc (llama/12296)

Signed-off-by: Xiaodong Ye <redacted>

Henry Linjamäki [Mon, 10 Mar 2025 16:57:00 +0000 (18:57 +0200)]

opencl: use OpenCL C standard supported by the device (llama/12221)

This patch nudges llama.cpp a bit so that it can be supported on PoCL,
which doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
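
A C++ sketch of picking the highest OpenCL C version, assuming the candidate version strings have already been obtained from the device (the `clGetDeviceInfo` call is omitted). It parses the `CL_DEVICE_OPENCL_C_VERSION` string format, "OpenCL C <major>.<minor> <vendor-specific information>", and builds a `-cl-std` build option; the struct and function names are hypothetical.

```c++
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

struct ClcVersion {
    int ver_major = 0;
    int ver_minor = 0;
};

// Parse "OpenCL C <major>.<minor> <vendor-specific information>".
bool parse_clc_version(const std::string& s, ClcVersion& out) {
    int mj = 0, mn = 0;
    if (std::sscanf(s.c_str(), "OpenCL C %d.%d", &mj, &mn) != 2) return false;
    out.ver_major = mj;
    out.ver_minor = mn;
    return true;
}

// Return a -cl-std option for the highest parseable version, or "" if none.
std::string pick_cl_std(const std::vector<std::string>& reported) {
    ClcVersion best;
    for (const std::string& s : reported) {
        ClcVersion v;
        if (!parse_clc_version(s, v)) continue;
        if (v.ver_major > best.ver_major ||
            (v.ver_major == best.ver_major && v.ver_minor > best.ver_minor)) {
            best = v;
        }
    }
    if (best.ver_major == 0) return "";  // nothing usable: keep the compiler default
    return "-cl-std=CL" + std::to_string(best.ver_major) + "." + std::to_string(best.ver_minor);
}
```

For a device reporting 1.2 and 3.0, this yields `-cl-std=CL3.0`; an empty result leaves the compiler's default in place.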

Jason C.H [Sat, 8 Mar 2025 16:02:39 +0000 (00:02 +0800)]

ggml-backend : make path_str compatible with C++20 (llama/12269)
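
The C++20 incompatibility most likely comes from `std::filesystem::path::u8string()`, which returns `std::u8string` in C++20 but `std::string` in C++17. A hedged sketch of a helper that compiles under both standards, not necessarily the code in the commit:

```c++
#include <cassert>
#include <filesystem>
#include <string>

// Convert a path to a UTF-8 std::string under both C++17 and C++20.
// When char8_t is enabled (C++20), path::u8string() returns std::u8string,
// so its bytes are copied into a std::string explicitly.
std::string path_str(const std::filesystem::path& path) {
#if defined(__cpp_char8_t)
    const std::u8string u8 = path.u8string();
    return std::string(u8.begin(), u8.end());
#else
    return path.u8string();
#endif
}
```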

Daniel Bevenius [Fri, 7 Mar 2025 13:15:27 +0000 (14:15 +0100)]

ggml : skip intermediate .air file when compiling .metallib (llama/12247)

This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.

The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.

Christian Kastner [Mon, 10 Mar 2025 18:19:58 +0000 (19:19 +0100)]

cmake: Enable specifying exact PowerPC CPU architecture (ggml/1138)

In the process, guard automatic CPU detection with GGML_NATIVE.

https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#index-mcpu-10

Christian Kastner [Mon, 10 Mar 2025 12:06:21 +0000 (13:06 +0100)]

cmake: Comment out GGML_BIN_DIR for now (ggml/1139)

Nothing installs to it yet, so when attempting to use the cmake package,
set_and_check() triggers an error if the directory doesn't already exist
for other reasons.

Georgi Gerganov [Thu, 27 Mar 2025 08:13:26 +0000 (10:13 +0200)]

scripts : update sync

Daniel Bevenius [Wed, 26 Mar 2025 15:21:07 +0000 (16:21 +0100)]

bindings-go : update Makefile to use cmake (#2952)

This commit updates the Makefile to use cmake instead of make to build
whisper.cpp.

The motivation for this change is that currently the `test` make recipe
fails with the following error:
```console
$ make test
Mkdir build
Mkdir models
Build whisper
make[1]: Entering directory '/home/danbev/work/ai/whisper-work'
make[1]: *** No rule to make target 'libwhisper.a'. Stop.
make[1]: Leaving directory '/home/danbev/work/ai/whisper-work'
make: *** [Makefile:33: whisper] Error 2
```

Dan Johansson [Wed, 26 Mar 2025 14:54:02 +0000 (15:54 +0100)]

whisper : add support for backends with multiple ggml_backend_buffer_type (#2863)

* whisper : add support for ggml_backend_buffer_type

Signed-off-by: Dan Johansson <redacted>

* fix compile error when building on Ubuntu

Signed-off-by: Dan Johansson <redacted>

* remove copyright header from include file

Signed-off-by: Dan Johansson <redacted>

---------

Signed-off-by: Dan Johansson <redacted>

Daniel Bevenius [Wed, 26 Mar 2025 14:01:28 +0000 (15:01 +0100)]

bindings.java : enable copyLibs task [no ci] (#2949)

* bindings.java : enable copyLibs task [no ci]

This commit adds a dependency on the copyLibs task to the sourcesJar and
jar tasks. This ensures that the libwhisper.so file is copied to the
correct location before the jar is built.

It also sets the executable bit on the gradlew file.

* bindings.java : add copyLibs dep for processResources [no ci]

Otherwise, builds fail after an initial build has been done.

* bindings.java : pass structs by value to native code

This commit refactors the code to pass the structs by value to the
native code. This is done by creating a ByValue class for each struct
and using it in the Java code.

The motivation for this change is that without it the application crashes
due to what I believe was memory misalignment. When the structs were
passed to the native code, they would be at different memory locations.
Passing by value overcomes this issue, and since the structs hold
parameters (context and full params), it should be acceptable to do
this. These changes allow all the tests to pass.

* bindings.java : fix javadoc warnings [no ci]

* bindings.java : fix libwhisper.dylib path in build.gradle [no ci]

This commit fixes the copyLibwhisperDynlib task in the build.gradle file
to copy the correct libwhisper.dylib file from build/src.

Daniel Bevenius [Wed, 26 Mar 2025 13:49:12 +0000 (14:49 +0100)]

bindings.javascript : update test instructions [no ci] (#2951)

This commit updates the instructions for running the test in the
JavaScript bindings README file.

The motivation for this is that for Node.js versions after v16.4.0, the
`--experimental-wasm-threads` and `--experimental-wasm-simd` flags are
no longer required, and passing them produces the following errors:
```console
$ node --experimental-wasm-threads --experimental-wasm-simd ../tests/test-whisper.js
node: bad option: --experimental-wasm-threads
node: bad option: --experimental-wasm-simd
```

Page-MS [Wed, 26 Mar 2025 07:30:59 +0000 (03:30 -0400)]

readme : add note about SDL2 (#2946)

Clarify the README section about real-time audio processing, stating that SDL2 is needed.

Daniel Bevenius [Tue, 25 Mar 2025 17:01:18 +0000 (18:01 +0100)]

whisper.android : add GGML_USE_CPU compile definition (#2945)

This commit adds GGML_USE_CPU to the built target library to enable the
CPU backend.

The motivation for this is that without the compile definition, the CPU
backend is not enabled and the app will crash when trying to use it.

Daniel Bevenius [Tue, 25 Mar 2025 15:01:59 +0000 (16:01 +0100)]

whisper.android.java : update build with ggml source changes (#2942)

* whisper.android.java : update build with ggml source changes

This commit updates the whisper.android.java build to include the
new ggml source files and directories. The gradle build configuration is
also updated to include the aliyun maven repository.

Akarshan Biswas [Tue, 25 Mar 2025 09:20:37 +0000 (14:50 +0530)]

ci: fix SYCL build (#2943)

Daniel Bevenius [Mon, 24 Mar 2025 13:42:12 +0000 (14:42 +0100)]

examples : reduce initial memory to 512MB (#2939)

* examples : reduce initial memory to 512MB

This commit reduces the initial memory size to 512MB. This is done to
avoid WebAssembly memory allocation issues on some platforms. It also
adds a flag to allow the memory to grow dynamically (up to the maximum).

The motivation for this change is that currently the initial memory is
set to 2GB, which might be too large for some platforms. This leads to
an error being thrown from the JavaScript code generated by Emscripten
when trying to allocate memory. More details can be found in the
referenced issue below.

* examples : set MAXIMUM_MEMORY instead of TOTAL_MEMORY

This commit sets MAXIMUM_MEMORY instead of TOTAL_MEMORY in the
whisper.wasm example.

The motivation for this is that TOTAL_MEMORY and INITIAL_MEMORY are
actually the same thing. Instead we want to set MAXIMUM_MEMORY to
2GB.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2920
Refs: https://emscripten.org/docs/tools_reference/settings_reference.html#initial-memory
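
WebAssembly linear memory is allocated in 64 KiB pages, and Emscripten expects memory settings such as `INITIAL_MEMORY` and `MAXIMUM_MEMORY` to be multiples of that page size. Reading 512MB and 2GB as 512 MiB and 2 GiB (an assumption), the page counts work out as follows in C++:

```c++
#include <cassert>
#include <cstdint>

// WebAssembly linear memory grows in 64 KiB pages.
constexpr std::uint64_t kWasmPageBytes = 64 * 1024;

constexpr std::uint64_t mib(std::uint64_t n) { return n * 1024 * 1024; }

// Number of pages for a page-aligned byte size.
constexpr std::uint64_t wasm_pages(std::uint64_t bytes) { return bytes / kWasmPageBytes; }

static_assert(mib(512) % kWasmPageBytes == 0, "512 MiB is page-aligned");
static_assert(wasm_pages(mib(512)) == 8192, "initial memory: 8192 pages");
static_assert(wasm_pages(mib(2048)) == 32768, "maximum memory: 32768 pages");
```

So the new initial size corresponds to 8192 pages, and memory can grow up to 32768 pages.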

Daniel Bevenius [Mon, 24 Mar 2025 13:40:00 +0000 (14:40 +0100)]

examples : fix nthread parsing in whisper.wasm (#2938)

This commit fixes the nthread parsing in the whisper.wasm example when
using the `Threads` slider to change the number of threads to be used.

Currently this results in the following error:
```console
main.js:5597 Uncaught TypeError: Cannot convert "5" to int
    at checkAssertions (main.js:5597:21)
    at Object.toWireType (main.js:5611:15)
    at Object.full_default (eval at new_ (main.js:5292:27), <anonymous>:10:26)
    at whisper.wasm/:649:42
```

Daniel Bevenius [Mon, 24 Mar 2025 13:33:45 +0000 (14:33 +0100)]

examples : fix request path for local worker files (#2937)

This commit adds a fix to the server.py file to handle requests for
web worker files when running the local python server to test the wasm
examples.

The motivation for this is that the server currently serves files from
the build-em/bin directory, which is where the .worker.js files exist.
However, the examples request these resources using the application
context path, for example /whisper.wasm/libmain.worker.js, and such
paths are not found with the current implementation.
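One way to implement such a fix, sketched here under the assumption that
the server is built on Python's `http.server` module (the actual change to
server.py may differ), is to rewrite any request for a `*.worker.js` file
to the file's basename at the root of the served directory.
`rewrite_worker_path` and `WorkerAwareHandler` are hypothetical names.

```python
import posixpath
from http.server import SimpleHTTPRequestHandler
from urllib.parse import urlsplit


def rewrite_worker_path(request_path: str) -> str:
    """Map /<app>/<name>.worker.js to /<name>.worker.js; leave other paths unchanged."""
    path = urlsplit(request_path).path  # ignore any query string
    if path.endswith(".worker.js"):
        # Worker scripts live at the root of the served directory, so drop the
        # application context prefix (e.g. /whisper.wasm/).
        return "/" + posixpath.basename(path)
    return request_path


class WorkerAwareHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        # The base class resolves self.path against the served directory.
        self.path = rewrite_worker_path(self.path)
        super().do_GET()
```

To serve the build output with it, pass
`functools.partial(WorkerAwareHandler, directory="build-em/bin")` as the
handler class to `http.server.ThreadingHTTPServer`.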

commit | commitdiff | tree

Daniel Bevenius [Mon, 24 Mar 2025 08:53:38 +0000 (09:53 +0100)]

ggml : add logging for native build options/vars (#2935)

This commit adds debug level logging for the native build options and
variables to ggml/CMakeLists.txt.

The motivation for this is that it can be useful to see the effective
result of `GGML_NATIVE`, `GGML_NATIVE_DEFAULT`, and `INS_ENB` for a
cmake build. I've found myself adding similar logging a few times now,
so I thought it might be a good idea to add this.

For example, specifying `-DCMAKE_MESSAGE_LOG_LEVEL=DEBUG` when running
cmake produces the following output:
```console
-- GGML_NATIVE : OFF
-- GGML_NATIVE_DEFAULT : OFF
-- INS_ENB : OFF
```

commit | commitdiff | tree

Peter [Mon, 24 Mar 2025 08:39:50 +0000 (19:39 +1100)]

whisper : enhance model download scripts functionality and resolve compiler warning (#2925)

* whisper : improve whisper-cli executable path detection in model download shell scripts

If whisper-cli is found on the path, do not suggest invoking from build directory. This improves flexibility and usability for distribution and packaging scenarios.

* whisper : enhance Windows model download batch script to match the functionality and behaviour of the shell scripts

* Download models to the current directory if the script is executed from the \bin\ directory (for future distribution scenarios where the script is in the \bin\ subdirectory of a Windows build)
* Add model_path command line argument
* If whisper-cli is found on the path, do not suggest invoking from build directory

* whisper : resolve compiler warning by removing duplicate definition of NOMINMAX in whisper-cli code
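The PATH check described in the first item can be sketched as follows. The
download scripts themselves are shell and batch files; this Python version,
with a hypothetical `run_hint` helper and hypothetical message text, only
illustrates the decision. `shutil.which` plays the role of a PATH lookup
such as `command -v`.

```python
import shutil
from typing import Callable, Optional


def run_hint(model_file: str, which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Suggest how to run whisper-cli after a model download (illustrative only)."""
    if which("whisper-cli"):
        # Installed on PATH (e.g. from a distribution package): no build tree needed.
        return f"whisper-cli -m {model_file} -f samples/jfk.wav"
    # Otherwise assume a source checkout with a local build directory.
    return f"./build/bin/whisper-cli -m {model_file} -f samples/jfk.wav"
```

The `which` parameter exists only so the lookup can be replaced in tests.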

commit | commitdiff | tree

Daniel Bevenius [Mon, 24 Mar 2025 08:36:07 +0000 (09:36 +0100)]

whisper : initialize decoder's rng with unique seed (#2932)

This change initializes each decoder's random number generator with a
unique seed.

The motivation for this is that currently all decoders are initialized
with the same seed value, 0. The result of this is that for the same
state (logits, probs, and logprobs) they will produce the same output.
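The effect is easy to reproduce with a toy sampler. The sketch below uses
Python's `random.Random` as a stand-in for the decoder's C++ random number
generator, and seeding each decoder with `base_seed + index` is an assumed
scheme for illustration; the commit does not say how the unique seeds are
derived.

```python
import random


def make_decoder_rngs(n_decoders: int, base_seed: int = 0, unique: bool = True) -> list[random.Random]:
    """Create one RNG per decoder; seed = base_seed + index when unique is True."""
    return [random.Random(base_seed + i if unique else base_seed) for i in range(n_decoders)]


def sample_token(probs: list[float], rng: random.Random) -> int:
    """Draw a token index from a categorical distribution by inverse-CDF sampling."""
    r = rng.random()
    cumulative = 0.0
    for token, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return token
    return len(probs) - 1  # guard against floating-point round-off


probs = [0.5, 0.5]  # identical decoder state for every decoder
shared = [sample_token(probs, rng) for rng in make_decoder_rngs(4, unique=False)]
distinct = [sample_token(probs, rng) for rng in make_decoder_rngs(4, unique=True)]
print(shared)    # all decoders pick the same token
print(distinct)  # tokens can now differ between decoders
```

With a shared seed, every decoder draws the same random number, so
identical probabilities always yield identical tokens.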

commit | commitdiff | tree

Daniel Bevenius [Sat, 22 Mar 2025 14:40:28 +0000 (15:40 +0100)]

ci : remove CMAKE_CUDA_ARCHITECTURES in windows-cublas (#2923)

This commit removes the -DCMAKE_CUDA_ARCHITECTURES=all flag from the
windows-cublas job in the build.yml file.

The motivation for this is that building for all architectures is
unnecessary and takes a long time. Without this flag the architectures
will instead be set by ggml-cuda.

Refs: https://github.com/ggerganov/whisper.cpp/pull/2915#issuecomment-2743160743

commit | commitdiff | tree

Peter [Sat, 22 Mar 2025 14:27:57 +0000 (01:27 +1100)]

whisper : update default model download directory behavior to use current working directory when script is in /bin/ directory (#2924)

This change ensures that when the script is packaged and distributed, models are downloaded to the current directory instead of the script's location, preventing conflicts with system directories. This improves flexibility and usability for distribution and packaging scenarios.
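The rule reads roughly as: if the script lives in a directory named `bin`,
download into the current working directory; otherwise keep the previous
default of the script's own directory. A hedged Python sketch of that
decision (the real scripts are shell and batch; `default_models_dir` is a
hypothetical name, and an explicit path argument is assumed to take
precedence):

```python
from pathlib import Path
from typing import Optional


def default_models_dir(script_path: Path, cwd: Path, explicit: Optional[Path] = None) -> Path:
    """Choose where downloaded models go (illustrative; expects an absolute script path)."""
    if explicit is not None:
        return explicit          # a user-supplied models path always wins
    script_dir = script_path.parent
    if script_dir.name == "bin":
        return cwd               # packaged install: avoid writing into e.g. /usr/bin
    return script_dir            # source checkout: keep models next to the script
```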

commit | commitdiff | tree

Daniel Bevenius [Fri, 21 Mar 2025 10:38:32 +0000 (11:38 +0100)]

whisper.swiftui : Add Core ML support to README [no ci] (#2921)

This commit updates the README to include instructions on how to use
a Core ML model with the example.

commit | commitdiff | tree

Daniel Bevenius [Fri, 21 Mar 2025 09:31:55 +0000 (10:31 +0100)]

readme : update Python version to 3.11 for Core ML support [no -ci] (#2919)

This commit updates the recommended version of Python to 3.11 for Core
ML conversion support. It also adds the `-e` flag to the
`generate-coreml-model.sh` script to ensure that the script exits on the
first error.

The motivation for this is that when following the installation
instructions using Python 3.10, I get the following error:
```console
(venv) $ ./models/generate-coreml-model.sh base.en

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/whisper-work/models/convert-whisper-to-coreml.py", line 2, in <module>
    import torch
  File "/whisper-work/venv/lib/python3.10/site-packages/torch/__init__.py", line 870, in <module>
    from . import _masked
  File "/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py", line 420, in <module>
    def sum(input: Tensor,
  File "/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py", line 223, in _apply_docstring_templates
    example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
/whisper-work/venv/lib/python3.10/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at  /Users/distiller/project/pytorch/torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
Minimum required torch version for importing coremltools.optimize.torch is 2.1.0. Got torch version 1.11.0.
Traceback (most recent call last):
  File "/whisper-work/models/convert-whisper-to-coreml.py", line 4, in <module>
    import coremltools as ct
  File "/whisper-work/venv/lib/python3.10/site-packages/coremltools/__init__.py", line 120, in <module>
    from . import converters, models, optimize, proto
  File "/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/__init__.py", line 7, in <module>
    from . import libsvm, sklearn, xgboost
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/__init__.py", line 6, in <module>
    from ._tree import convert
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/_tree.py", line 9, in <module>
    from ._tree_ensemble import convert_tree_ensemble as _convert_tree_ensemble
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/xgboost/_tree_ensemble.py", line 11, in <module>
    from ...models.tree_ensemble import TreeEnsembleClassifier
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/__init__.py", line 6, in <module>
    from . import (
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/ml_program/__init__.py", line 6, in <module>
    from . import compression_utils
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/models/ml_program/compression_utils.py", line 8, in <module>
    from coremltools.converters.mil.mil import Operation as _Operation
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/__init__.py", line 7, in <module>
    from .frontend.tensorflow.tf_op_registry import register_tf_op
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/__init__.py", line 6, in <module>
    from . import tensorflow, tensorflow2, torch
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/__init__.py", line 11, in <module>
    from . import ops, quantization_ops
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 36, in <module>
    from .internal_graph import InternalTorchIRGraph, InternalTorchIRNode
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/internal_graph.py", line 15, in <module>
    from .exir_utils import extract_io_from_exir_program
  File "/Users/danbev/work/ai/whisper-work/venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/exir_utils.py", line 99, in <module>
    ) -> Dict[str, torch.fx.Node]:
AttributeError: module 'torch' has no attribute 'fx'
```
With Python 3.11, the conversion script runs without any errors.

commit | commitdiff | tree

Daniel Bevenius [Fri, 21 Mar 2025 08:53:26 +0000 (09:53 +0100)]

whisper : add check for CPU backend initialization (#2918)

This commit adds a check for the CPU backend initialization in the
whisper library. If the initialization fails, an exception is thrown.

The motivation for this change is to make the library more robust and
handle the case when the CPU backend initialization fails.

Resolves: https://github.com/ggerganov/whisper.cpp/issues/2917
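The pattern is a null check on the backend handle followed by an
exception. In the library this is C++ code (ggml's `ggml_backend_cpu_init()`
returns a null handle on failure); the Python sketch below only mirrors the
control flow, with `init_cpu_backend` as a hypothetical stand-in that
returns `None` on failure and `BackendInitError` as a hypothetical
exception type.

```python
from typing import Callable, Optional


class BackendInitError(RuntimeError):
    """Raised when a compute backend cannot be created."""


def init_backends(init_cpu_backend: Callable[[], Optional[object]]) -> list[object]:
    """Initialize backends, failing loudly instead of continuing with a null CPU backend."""
    backends = []
    cpu = init_cpu_backend()
    if cpu is None:
        raise BackendInitError("failed to initialize CPU backend")
    backends.append(cpu)
    return backends
```

Failing at initialization surfaces the problem immediately, rather than as
a later crash when the missing backend is first used.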

commit | commitdiff | tree

Daniel Bevenius [Fri, 21 Mar 2025 08:52:53 +0000 (09:52 +0100)]

examples : update whisper.objc README.md (#2916)

This commit updates the whisper.objc README.md to reflect the changes of
using the xcframework and the new build process.

Since whisper.cpp is no longer compiled by the example project, and the
library from the xcframework is used instead, the build instructions
have been removed.

commit | commitdiff | tree

Daniel Bevenius [Fri, 21 Mar 2025 07:19:24 +0000 (08:19 +0100)]

ci : increase windows-cublas evict-old-files to 5d (#2915)

This commit updates the evict-old-files parameter for the windows-cublas
build job to 5 days.

The motivation for this change is to avoid the full rebuild, which takes
around 1.5 hours for the windows-cublas build job. Since there are
periods of low traffic on whisper.cpp (such as weekends), it might be
better to use a longer eviction period so that cache entries survive
these periods and the full rebuild is avoided.
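As an illustration of what an age-based eviction window means (this is an
assumed model of the `evict-old-files` setting, not the caching action's
implementation), a cache entry survives as long as it was last used within
the window. The object-file names below are hypothetical.

```python
from datetime import datetime, timedelta


def entries_to_evict(last_used: dict[str, datetime], now: datetime, max_age: timedelta) -> list[str]:
    """Return cache entries whose last use is older than max_age (sorted for stable output)."""
    return sorted(name for name, used in last_used.items() if now - used > max_age)


now = datetime(2025, 3, 24, 9, 0)  # a Monday morning
last_used = {
    "ggml-cuda.obj": datetime(2025, 3, 21, 18, 0),  # last built Friday evening
    "whisper.obj": datetime(2025, 3, 24, 8, 0),
}
print(entries_to_evict(last_used, now, timedelta(days=1)))  # ['ggml-cuda.obj']
print(entries_to_evict(last_used, now, timedelta(days=5)))  # []
```

With a one-day window the expensive CUDA object is gone after a quiet
weekend; with five days it is still cached on Monday.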

commit | commitdiff | tree

Daniel Bevenius [Thu, 20 Mar 2025 17:39:08 +0000 (18:39 +0100)]

xcframework : add support for CoreML to ios/macOS (#2912)

* xcframework : add support for CoreML to ios/macOS

This commit adds support for compiling whisper with CoreML support for
iOS and macOS.

The motivation for this change is that it will allow users to use a Core
ML model, or fall back to a ggml model if Core ML is not available.

With the updated xcframework, I was able to run the whisper.objc example
and successfully load a Core ML model:
```console
whisper_init_state: loading Core ML model from '/Users/danbev/Library/Developer/CoreSimulator/Devices/25E8C27D-0253-4281-AF17-C3F2A4D1D8F4/data/Containers/Bundle/Application/B81F6FF0-BF1A-40DF-AC2A-3908EC4BCC9A/whisper.objc.app/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
```

* squash! xcframework : add support for CoreML to ios/macOS

Fix grammar in output message.
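The fallback can be pictured as a path check. The log output above loads
`ggml-base.en-encoder.mlmodelc` for the `ggml-base.en` model, so the sketch
derives the encoder path by replacing the `.bin` suffix with
`-encoder.mlmodelc` and uses it only if it exists. This naming rule is
inferred from that log line; whisper.cpp's own path logic (for example for
quantized model names) may differ, and `coreml_encoder_path` and
`pick_encoder` are hypothetical helpers.

```python
from pathlib import Path


def coreml_encoder_path(model_path: Path) -> Path:
    """ggml-base.en.bin -> ggml-base.en-encoder.mlmodelc (naming inferred from the log)."""
    stem = model_path.name.removesuffix(".bin")
    return model_path.with_name(stem + "-encoder.mlmodelc")


def pick_encoder(model_path: Path) -> tuple[str, Path]:
    """Prefer a compiled Core ML encoder next to the model; otherwise fall back to ggml."""
    coreml = coreml_encoder_path(model_path)
    if coreml.exists():  # .mlmodelc is a directory bundle; exists() covers that
        return "coreml", coreml
    return "ggml", model_path
```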

commit | commitdiff | tree

Daniel Bevenius [Thu, 20 Mar 2025 17:36:02 +0000 (18:36 +0100)]

examples : add WHISPER_SDL2 check to deprecation executables (#2911)

This commit adds a check for `WHISPER_SDL2` to the deprecation warning
examples. This is to prevent the examples from being built when
WHISPER_SDL2 is not enabled.

The motivation for this is that currently these deprecation executables
are generated, and when run they refer the user to examples with other
names, for example `whisper-command`. But unless the user has built with
`WHISPER_SDL2`, those executables will not be present:
```console
$ ls build/bin/
bench command main quantize stream whisper-bench whisper-cli
whisper-server

$ ./build/bin/command

WARNING: The binary 'command' is deprecated.
Please use 'whisper-command' instead.
See https://github.com/ggerganov/whisper.cpp/tree/master/examples/deprecation-warning/README.md for more information.
```

commit | commitdiff | tree

Daniel Bevenius [Thu, 20 Mar 2025 16:01:48 +0000 (17:01 +0100)]

ci : use ninja and fix caching for windows-cublas (#2910)

This commit updates the windows-cublas job to use Ninja as the build
system instead of msbuild/msvc.

The motivation for this is that msbuild/msvc does not seem to handle
ccache/sccache well; for example, it ignores the
`CMAKE_C_COMPILER_LAUNCHER` etc. variables. With Ninja as the build
system, caching works: the initial build takes about the same time as the
current (uncached) build, and subsequent builds are much faster.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2781

Packaging of ggerganov/whisper.cpp
