Daniel Bevenius [Fri, 4 Apr 2025 08:23:53 +0000 (10:23 +0200)]
examples : update server.py to match github pages app [no ci] (#3004)
This commit updates examples/server.py which is used to serve the wasm
examples locally. The changes include:
- Added a redirect from the root URL to /whisper.cpp/.
So now accessing http://localhost:8000/ will redirect to
http://localhost:8000/whisper.cpp/ which matches the URL of the app
deployed to GitHub Pages.
- Added custom handling for coi-serviceworker.js so that it is served,
which avoids an error in the console. This file is not strictly
necessary for the local server to work, as the headers are already
provided, but it is nice to not have an error in the console.
- Fixed the shutdown of the server to ensure it exits cleanly
on Ctrl+C. Previously it would continue to hang onto the port even
after the process had exited. A sketch of these changes is shown below.
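The following is a minimal, hypothetical sketch of these changes using
Python's http.server. It is not the actual examples/server.py: the names
and the exact header set are assumptions, and the coi-serviceworker.js
handling is omitted.
```python
# Hypothetical sketch, not the actual examples/server.py.
# Assumes the wasm examples live under ./whisper.cpp/.
import http.server
import socketserver

PORT = 8000

class Handler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        # Cross-origin isolation headers; these are why coi-serviceworker.js
        # is not strictly needed when serving locally.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()

    def do_GET(self):
        # Redirect the root URL so local paths match the GitHub Pages app.
        if self.path == "/":
            self.send_response(302)
            self.send_header("Location", "/whisper.cpp/")
            self.end_headers()
            return
        super().do_GET()

class Server(socketserver.TCPServer):
    # Release the port promptly when the server is restarted.
    allow_reuse_address = True

if __name__ == "__main__":
    with Server(("", PORT), Handler) as httpd:
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            pass  # the with-block closes the socket, so Ctrl+C exits cleanly
```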
Daniel Bevenius [Thu, 3 Apr 2025 17:50:47 +0000 (19:50 +0200)]
whisper.wasm : fix unknown language issue (#3000)
* whisper.wasm : fix unknown language issue
This commit addresses an issue with whisper.wasm where the following
error was being displayed when running the application on GitHub Pages:
```
whisper_lang_id: unknown language 'д=␙c'
```
This turned out to be a memory corruption issue and further details
can be found in the reference issue below.
Daniel Bevenius [Thu, 3 Apr 2025 07:06:53 +0000 (09:06 +0200)]
docs : add xcframework section to README.md [no ci] (#2997)
This adds a section to the README.md file that describes how to use the
XCFramework.
The motivation for this is that it is not obvious how to use the
XCFramework and an example will help.
One thing to note is that the example is using the latest release
including the checksum. We are thinking about how we might automate
this in the future but for now this is a good start.
This commit removes test-whisper-cli-tiny-en from the gh label.
The motivation for this change is that until recently the tests were
disabled. But now that they are enabled, some of the tests, specifically
the CI jobs that use sanitizers (e.g. thread-sanitizer), take a long time
to run as they are instrumented.
Some of these jobs also have matrices, which means that multiple jobs
are created that all run these tests.
The suggestion here is to limit the number of tests that are run in the
CI jobs to cut down the CI build time.
Daniel Bevenius [Tue, 1 Apr 2025 16:01:23 +0000 (18:01 +0200)]
coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] (#2979)
* coreml: fix Whisper to CoreML conversion by disabling SDPA
This commit disables the use of PyTorch's
`scaled_dot_product_attention` in the Whisper model to avoid
compatibility issues during CoreML conversion.
The issue occurs because coremltools requires PyTorch 2.5.0, but the
Whisper implementation may expect behavior from newer PyTorch versions.
By setting `MultiHeadAttention.use_sdpa = False`, we force Whisper to
use its fallback manual attention implementation, which works correctly
with PyTorch 2.5.0 during the tracing process.
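As a rough sketch (assuming the openai-whisper package layout, where
`MultiHeadAttention` lives in `whisper.model`), the workaround looks
something like this before tracing:
```python
# Hedged sketch of the workaround described above; the import path is an
# assumption based on the openai-whisper package, not the exact
# conversion script.
import whisper
from whisper.model import MultiHeadAttention

# Disable PyTorch's scaled_dot_product_attention so the traced graph uses
# the fallback manual attention implementation, which works with
# coremltools on PyTorch 2.5.0.
MultiHeadAttention.use_sdpa = False

model = whisper.load_model("base.en")
model.eval()
# ... trace the encoder/decoder with torch.jit.trace and convert with
# coremltools as usual ...
```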
This commit updates the generated encoder/decoder interfaces for the
whisper model which is the result of running the
generate-coreml-interface.sh script.
Daniel Bevenius [Tue, 1 Apr 2025 15:04:32 +0000 (17:04 +0200)]
ci : add coreml job that converts base.en to coreml [no ci] (#2981)
* ci : add coreml job that converts base.en to coreml [no ci]
This commit adds a new job to the CI pipeline that downloads the base.en
model and converts it to CoreML format. The CoreML model is then packed
into a zip file and uploaded as an artifact.
This will only be done for pushes to master, releases, or pre-releases.
Daniel Bevenius [Mon, 31 Mar 2025 15:04:37 +0000 (17:04 +0200)]
tests : re-enable tests [no ci] (#2977)
This commit re-enables the tests in the build process which are
currently commented out.
It is possible to build the tests using `-DWHISPER_BUILD_TESTS=ON` and
then run a single test using:
```console
$ ctest -R test-whisper-cli-tiny.en --test-dir build
Internal ctest changing into directory: /home/danbev/work/ai/whisper-work/build
Test project /home/danbev/work/ai/whisper-work/build
Start 2: test-whisper-cli-tiny.en
1/1 Test #2: test-whisper-cli-tiny.en ......... Passed 4.44 sec
```
Some of the tests take a long time to run so it might not be a good idea
to enable them in CI, or perhaps we could only run a subset of the tests
in CI.
Daniel Bevenius [Mon, 31 Mar 2025 14:14:33 +0000 (16:14 +0200)]
android.java : re-add ggml source updates (#2975)
This commit updates the ggml source to include the new unary and binary
operations. I merged https://github.com/ggerganov/whisper.cpp/pull/2958
which seems to have overwritten the changes to the ggml source which
were added in https://github.com/ggerganov/whisper.cpp/pull/2972.
Daniel Bevenius [Mon, 31 Mar 2025 13:14:24 +0000 (15:14 +0200)]
ci : re-enable android_java job (#2958)
This commit re-enables the android_java job in the CI workflow. The job
was disabled because of a failing build.
The motivation for this is that Commit 226d344f565ea6140e7c6a583bc300a64454af58 ("whisper.android.java : update
build with ggml source changes") addressed build issues and it should
now be possible to re-enable this job.
Daniel Bevenius [Mon, 31 Mar 2025 09:34:40 +0000 (11:34 +0200)]
ci : add github pages workflow for wasm examples (#2969)
* ci : add github pages workflow for wasm examples
This commit adds a github workflow to build and deploy the wasm examples
to github pages. The whisper.wasm example is deployed as the main page.
This workflow is triggered by a push to master and will deploy the
examples to: https://ggerganov.github.io/whisper.cpp/.
This requires that GitHub Pages has been enabled for the repository,
with `Settings` -> `Pages` -> `Build and deployment` -> `Source` set to
`GitHub Actions`.
One thing to note is that this commit removes the `talk` example as I'm
not sure how this example is built yet.
Icenowy Zheng [Fri, 28 Mar 2025 17:51:06 +0000 (01:51 +0800)]
vulkan: fix coopmat shader generation when cross-compiling (llama/12272)
* vulkan: fix coopmat shader generation when cross-compiling
Previously the status of coopmat{,2} support wasn't passed to the
vulkan-shaders-gen project built on the host, which led to a build
failure because the cross-compiling code expected coopmat{,2}
shaders that were never generated.
Fix this by passing the coopmat{,2} support status to the vulkan-shaders
subproject.
Signed-off-by: Icenowy Zheng <redacted>
* Only call coop-mat shaders once
amritahs-ibm [Fri, 28 Mar 2025 07:43:22 +0000 (13:13 +0530)]
llamafile : ppc64le GEMV forwarding for FP32. (llama/12594)
This patch enables usage of MMA when one of the
dimensions of the matrix (i.e. either M or N) is 1. This
is useful in the case of token generation, where N < 2.
The concept of 'GEMV forwarding' is used: when one
of the matrices has a single row/column, the elements are
broadcast instead of using a packing routine to prepack
the matrix elements.
This change results in a 5% - 15% improvement in total
speed (i.e. all tokens/total time) across various batch
sizes, in comparison with the corresponding
dot product implementation.
The patch is tested with FP32 models of Meta-Llama-3-8B,
Mistral-7B and Llama-2-7B-chat-hf on an IBM POWER10 machine.
Amanda Der Bedrosian [Fri, 28 Mar 2025 11:26:22 +0000 (04:26 -0700)]
bindings.go : add DetectedLanguage to go bindings (#2947)
Adds DetectedLanguage(), a function to retrieve the detected
language that is populated by processing audio. Also adds a unit
test to verify it.
Daniel Bevenius [Fri, 28 Mar 2025 08:29:56 +0000 (09:29 +0100)]
ruby : fix test failures in test_whisper (#2955)
* bindings.ruby : fix test failures in test_whisper
This commit updates the parallel tests to use 2 processors instead of
the number of processors on the system. It also comments out the setting
of the log callback to an empty lambda as this causes a segfault when
enabled.
The motivation for the change to the number of processors is that with
a large number of processors (for example, I have 16 on the machine I
used to test this) the following warning would be printed:
```console
whisper_full_with_state: input is too short - 680 ms < 1000 ms. consider padding the input audio with silence
```
This is logged from:
```c++
int whisper_full_with_state(
        struct whisper_context * ctx,
        struct whisper_state * state,
        struct whisper_full_params params,
        const float * samples,
        int n_samples) {
    ...
    if (seek_end < seek_start + 100) {
        WHISPER_LOG_WARN("%s: input is too short - %d ms < 1000 ms. consider padding the input audio with silence\n", __func__, (seek_end - seek_start)*10);
        return 0;
    }
```
This will return early and no segment callbacks will be invoked, which
in turn will cause the tests to fail.
* bindings.ruby : fix warnings in tests
This commit fixes the following warnings in the Ruby tests:
```console
/whisper/bindings/ruby/tests/test_segment.rb:52:
warning: ambiguity between regexp and two divisions:
wrap regexp in parentheses or add a space after `/' operator
```
And also adds a '_' prefix to some unused variables to avoid warnings.
* bindings.ruby : enable Whisper.log_set in tests
The commit reverts the commenting out of the Whisper.log_set call in
the test_whisper.rb tests.
I'm no longer getting segfaults when running the tests with this
enabled, which was the case earlier. One theory is that it is related to
rebasing this on the latest ggml sync to master (done to make sure
things still worked); with the latest changes in ggml I can't reproduce
the segfaults.
amritahs-ibm [Thu, 27 Mar 2025 06:51:47 +0000 (12:21 +0530)]
llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)
This change upstreams llamafile's CPU matrix
multiplication kernels for the ppc64le ISA using MMA
builtins. This patch handles matrix multiplication
between the quantised datatypes block_q4_0 and
block_q8_0.
This change results in a 5% - 50% improvement
in total speed (i.e. all tokens/total time), across
various batch sizes.
The patch is tested with the Meta-Llama-3-8B,
Mistral-7B and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.
Jeff Bolz [Mon, 24 Mar 2025 06:56:17 +0000 (01:56 -0500)]
vulkan: fix mul_mat_vec failure in backend tests (llama/12529)
The OOB calculation could be wrong if the last iteration was during one of
the unrolled loops. Adjust the unrolling counts to avoid this. Add a couple
new backend tests that hit this failure on NVIDIA GPUs.
Jeff Bolz [Sat, 22 Mar 2025 08:40:11 +0000 (03:40 -0500)]
vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)
* tests: add mul_mat perf/functional tests for p021/nc vulkan shaders
* vulkan: Optimize mul_mat_vec p021 and nc shaders.
These shaders are used in attention calculations, and when the KV cache grows
large they start to dominate the run time. For the nc shader (which is called
with large 'k' dimension), use unrolling and vector loads. For the p021 shader
(which is called with large 'm' and small 'k' dimensions), take advantage of
grouped query attention to reuse loads from the A matrix for the whole group,
and reduce the number of workgroups (too much overhead from tiny dispatches).
Using subgroupAdd in the p021 shader also helps, so use that conditionally.
Gaurav Garg [Wed, 19 Mar 2025 19:52:06 +0000 (01:22 +0530)]
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
- Find out active blocks per SM using cudaOccupancyMaxActiveBlocksPerMultiprocessor API. Use this value to determine the optimal parallel_blocks value.
- Prefer vector flash attention kernels over MMA kernel for BS=1
Jeff Bolz [Wed, 19 Mar 2025 07:26:26 +0000 (02:26 -0500)]
vulkan: Submit once enough matmul work has been recorded (llama/12406)
I've been seeing significantly worse performance for token generation (tg)
with flash attention enabled vs disabled, and it seems to be related to
the submit heuristic.
Change the heuristic to check how many bytes worth of weight matrix are
used and flush every 100MB, and ramp up after the first few submits.
This seems to resolve the issue, and also increases perf for non-FA a bit.
Gaurav Garg [Mon, 17 Mar 2025 18:25:13 +0000 (23:55 +0530)]
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
* Enable CUDA Graph on CTK < 12.x
The `cudaGraphExecUpdate` API was changed in 12.x. For this reason CUDA graph support was disabled on older CUDA toolkits. This change enables CUDA graph support on CTK versions < 12.x by using the older API when CTK < 12.x.
Christian Kastner [Mon, 17 Mar 2025 09:05:23 +0000 (10:05 +0100)]
cmake : enable building llama.cpp using system libggml (llama/12321)
* cmake: Factor out compiler flag function from ggml
llama.cpp's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).
* cmake: Enable building against system ggml
This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.
uvos [Wed, 12 Mar 2025 09:14:11 +0000 (10:14 +0100)]
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need
to avoid launching them with parameters for warp64.