]> git.djapps.eu Git - pkg/ggml/sources/ggml/log
pkg/ggml/sources/ggml
2 months agosync : llama.cpp
Georgi Gerganov [Mon, 31 Mar 2025 10:45:54 +0000 (13:45 +0300)]
sync : llama.cpp

ggml-ci

2 months agoSYCL: Remove misleading ggml_sycl_op_flatten function (llama/12387)
Akarshan Biswas [Mon, 31 Mar 2025 09:25:24 +0000 (14:55 +0530)]
SYCL: Remove misleading ggml_sycl_op_flatten function (llama/12387)

* SYCL: Remove misleading ggml_sycl_op_flatten function

* remove trailing whitespace

* Fix L2 norm from rebase

* remove try catch block from element_wise.cpp

* remove comment from common.hp

* ggml-sycl.cpp: Add try catch sycl::exception block in compute_forward

* norm.cpp: remove try catch exception block

2 months agometal : use constexpr in FA kernels + fix typedef (llama/12659)
Georgi Gerganov [Sun, 30 Mar 2025 19:04:04 +0000 (22:04 +0300)]
metal : use constexpr in FA kernels + fix typedef (llama/12659)

* metal : use constexpr in FA kernels

ggml-ci

* cont

ggml-ci

* cont : fix typedef

ggml-ci

2 months agomusa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc...
R0CKSTAR [Sun, 30 Mar 2025 08:59:38 +0000 (16:59 +0800)]
musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611)

* musa: fix all warnings

Signed-off-by: Xiaodong Ye <redacted>
* musa: enable -DLLAMA_FATAL_WARNINGS=ON in run.sh

Signed-off-by: Xiaodong Ye <redacted>
* musa: update ci doc (install ccache)

Signed-off-by: Xiaodong Ye <redacted>
* fix Windows build issue

Signed-off-by: Xiaodong Ye <redacted>
* Address review comments

Signed-off-by: Xiaodong Ye <redacted>
* Address review comments

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
2 months agocmake : fix ccache conflict (llama/12522)
Jay [Sat, 29 Mar 2025 10:04:58 +0000 (18:04 +0800)]
cmake : fix ccache conflict (llama/12522)

If users already set CMAKE_C_COMPILER_LAUNCHER globally, setting it in
cmake again will lead to conflict and compile fail.

Signed-off-by: Jay <redacted>
3 months agocpu : rm unused variable (#1166)
Xuan-Son Nguyen [Sat, 29 Mar 2025 10:59:56 +0000 (11:59 +0100)]
cpu : rm unused variable (#1166)

3 months agocpu: de-duplicate some of the operators and refactor (#1144)
cmdr2 [Sat, 29 Mar 2025 06:07:13 +0000 (11:37 +0530)]
cpu: de-duplicate some of the operators and refactor (#1144)

* cpu: de-duplicate some of the operators and refactor

* Fix PR comments

* Fix PR comments

3 months agosync : whisper.cpp
Georgi Gerganov [Fri, 28 Mar 2025 19:48:21 +0000 (21:48 +0200)]
sync : whisper.cpp

ggml-ci

3 months agoggml : add logging for native build options/vars (whisper/2935)
Daniel Bevenius [Mon, 24 Mar 2025 08:53:38 +0000 (09:53 +0100)]
ggml : add logging for native build options/vars (whisper/2935)

This commit adds debug level logging for the native build options and
variables to ggml/CMakeLists.txt.

The motivation for this is that it can be useful to see the effective
result of `GGML_NATIVE`, `GGML_NATIVE_DEFAULT`, and `INS_ENB` for a
cmake build. I've found myself adding similar logging a few times now,
so I thought it might be a good idea to add this.

Example output, specifying `-DCMAKE_MESSAGE_LOG_LEVEL=DEBUG` when
running cmake produces the following output:
```console
-- GGML_NATIVE         : OFF
-- GGML_NATIVE_DEFAULT : OFF
-- INS_ENB             : OFF
```

3 months agoexamples : command.wasm updates (whisper/2904)
Daniel Bevenius [Thu, 20 Mar 2025 06:02:18 +0000 (07:02 +0100)]
examples : command.wasm updates (whisper/2904)

This commit updates the command.wasm example by adding a server.py script to make it easy to start a local http server to try out the example, updates the build instructions, and also addresses some of the compiler warnings that were being generated.

* emscripten : fix TOTAL_STACK for wasm

This commit moves the TOTAL_STACK setting from the compile flags to the
linker flags. This is because the TOTAL_STACK setting is a linker
setting.

The motivation for this change is that currently the following warnings
are generated when building:
```console
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'TOTAL_STACK' [-Wunused-command-line-argument]
```

* examples : suppress C++17 deprecation warning for std::codecvt_utf8

This commit suppresses the C++17 deprecation warning for
std::codecvt_utf8 similar to what is done in
examples/talk-llama/unicode.cpp.

The motivation for this change is to suppress these warnings:
```console
/Users/danbev/work/ai/whisper-work/examples/common.cpp:251:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
  251 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |                               ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/codecvt:193:28: note: 'codecvt_utf8<wchar_t>' has been explicitly marked deprecated here
  193 | class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 codecvt_utf8 : public __codecvt_utf8<_Elem> {
      |                            ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
  723 | #    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
      |                                         ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED'
  688 | #      define _LIBCPP_DEPRECATED __attribute__((__deprecated__))
      |                                                 ^
/Users/danbev/work/ai/whisper-work/examples/common.cpp:251:10: warning: 'wstring_convert<std::codecvt_utf8<wchar_t>>' is deprecated [-Wdeprecated-declarations]
  251 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |          ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/locale:3145:28: note: 'wstring_convert<std::codecvt_utf8<wchar_t>>' has been explicitly marked deprecated here
 3145 | class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 wstring_convert {
      |                            ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
  723 | #    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
      |                                         ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED'
  688 | #      define _LIBCPP_DEPRECATED __attribute__((__deprecated__))
      |                                                 ^
/Users/danbev/work/ai/whisper-work/examples/common.cpp:257:31: warning: 'codecvt_utf8<wchar_t>' is deprecated [-Wdeprecated-declarations]
  257 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |                               ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/codecvt:193:28: note: 'codecvt_utf8<wchar_t>' has been explicitly marked deprecated here
  193 | class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 codecvt_utf8 : public __codecvt_utf8<_Elem> {
      |                            ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
  723 | #    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
      |                                         ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED'
  688 | #      define _LIBCPP_DEPRECATED __attribute__((__deprecated__))
      |                                                 ^
/Users/danbev/work/ai/whisper-work/examples/common.cpp:257:10: warning: 'wstring_convert<std::codecvt_utf8<wchar_t>>' is deprecated [-Wdeprecated-declarations]
  257 |     std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
      |          ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/locale:3145:28: note: 'wstring_convert<std::codecvt_utf8<wchar_t>>' has been explicitly marked deprecated here
 3145 | class _LIBCPP_TEMPLATE_VIS _LIBCPP_DEPRECATED_IN_CXX17 wstring_convert {
      |                            ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:723:41: note: expanded from macro '_LIBCPP_DEPRECATED_IN_CXX17'
  723 | #    define _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_DEPRECATED
      |                                         ^
/Users/danbev/work/wasm/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/__config:688:49: note: expanded from macro '_LIBCPP_DEPRECATED'
  688 | #      define _LIBCPP_DEPRECATED __attribute__((__deprecated__))
      |                                                 ^
4 warnings generated.
```

* ggml : suppress double-promotion warning in GGML_F16x4_REDUCE

This commit adds a cast to `ggml_float` in the `GGML_F16x4_REDUCE` macro
to suppress a double-promotion warning.

Currently the following warning is generated when compiling the
command.wasm example:
```console
/whisper-work/src/ggml-cpu/ggml-cpu.c:1592:5: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
 1592 |     GGML_F16_VEC_REDUCE(sumf, sum);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/danbev/work/ai/whisper-work/src/ggml-cpu/ggml-cpu.c:932:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
  932 | #define GGML_F16_VEC_REDUCE         GGML_F16x4_REDUCE
      |                                     ^
/Users/danbev/work/ai/whisper-work/src/ggml-cpu/ggml-cpu.c:920:44: note: expanded from macro 'GGML_F16x4_REDUCE'
  918 |     res = wasm_f32x4_extract_lane(x[0], 0) +       \
      |         ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  919 |           wasm_f32x4_extract_lane(x[0], 1) +       \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  920 |           wasm_f32x4_extract_lane(x[0], 2) +       \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
  921 |           wasm_f32x4_extract_lane(x[0], 3);        \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/whisper-work/src/ggml-cpu/ggml-cpu.c:1640:9: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
 1640 |         GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/danbev/work/ai/whisper-work/src/ggml-cpu/ggml-cpu.c:932:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
  932 | #define GGML_F16_VEC_REDUCE         GGML_F16x4_REDUCE
      |                                     ^
/Users/danbev/work/ai/whisper-work/src/ggml-cpu/ggml-cpu.c:920:44: note: expanded from macro 'GGML_F16x4_REDUCE'
  918 |     res = wasm_f32x4_extract_lane(x[0], 0) +       \
      |         ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  919 |           wasm_f32x4_extract_lane(x[0], 1) +       \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  920 |           wasm_f32x4_extract_lane(x[0], 2) +       \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
  921 |           wasm_f32x4_extract_lane(x[0], 3);        \
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.
```
wasm_f32x4_extract_lane returns a 32-bit float and this is what the
addition is performed on. But there is an implicit conversion from
32-bit float to 64-bit double when the result is assigned to `res`,
which is of type `ggml_float`. My understanding here is that this is
intentional and adding a cast to `ggml_float` should suppress the
warning.

* emscripten : add -Wno-deprecated to for emscripten

This commit adds -Wno-deprecated to the CMAKE_CXX_FLAGS for emscripten
builds.

The motivation for this is that currently there a number of warnings
generated like the following:
```console
warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
warning: JS library symbol '$printErr' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
em++: warning: warnings in JS library compilation [-Wjs-compiler]
em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument]
warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
warning: JS library symbol '$printErr' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
em++: warning: warnings in JS library compilation [-Wjs-compiler]
warning: JS library symbol '$print' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
warning: JS library symbol '$printErr' is deprecated. Please open a bug if you have a continuing need for this symbol [-Wdeprecated]
em++: warning: warnings in JS library compilation [-Wjs-compiler]
em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument]
em++: warning: linker setting ignored during compilation: 'ENVIRONMENT' [-Wunused-command-line-argument]
```

The downside of this is that we might miss other deprecation warnings
in the future so I'm not sure if this is acceptable. But it make the
wasm examples cleaner without the warnings.

* examples : fix tautological-compare warning in stb_vorbis.c [no ci]

This commit applies a fix to address a tautological-compare warning
in stb_vorbis.c.

The motivation for this is that currently the following warning is
generated when compiling the commmand-wasm example:
```console
/Users/danbev/work/ai/whisper-work/examples/stb_vorbis.c:1404:75: warning: pointer comparison always evaluates to false [-Wtautological-compare]
 1404 |       if (f->stream_start + loc >= f->stream_end || f->stream_start + loc < f->stream_start) {
      |                                                                           ^
1 warning generated.
```

This fix was taken from an open pull request on the stb repository
that addreses this issue:
https://github.com/nothings/stb/pull/1746

* squash! examples : update command.wasm instructions [no ci]

This commit adds a Python script to serve the the wasm examples build
in the `build-em` directory. Initially I thought that it would be enough
to start a simple python server but I did not notice that there was an
error in the browser console when I did that:
```console
command.js:1 Uncaught (in promise) DataCloneError: Failed to execute 'postMessage' on 'Worker': SharedArrayBuffer transfer requires self.crossOriginIsolated.
    at command.js:1:1206224
    at new Promise (<anonymous>)
    at loadWasmModuleToWorker (command.js:1:1204981)
    at Array.map (<anonymous>)
    at Object.loadWasmModuleToAllWorkers (command.js:1:1206428)
    at command.js:1:1204318
    at callRuntimeCallbacks (command.js:1:1202062)
    at preRun (command.js:1:6136)
    at run (command.js:1:1294094)
    at removeRunDependency (command.js:1:7046)
```
We need a few CORS headers to be set and in order hopefully make this
easy for users a Python script is added to the examples directory.
This should be able to server all the wasm examples provided they have
been built. command.wasm's README.md is updated to reflect this change.

* examples : remove unused functions

This commit removed the unused functions convert_to_utf8 and
convert_to_wstring from examples/common.cpp.

* Revert "examples : fix tautological-compare warning in stb_vorbis.c [no ci]"

This reverts commit 8e3c47d96141c7675c985562ebdc705e839e338a.

We should not make this change here and instead when the upstream PR is
merged we can sync with it.

Refs: https://github.com/ggerganov/whisper.cpp/issues/2784

3 months agosync : llama.cpp
Georgi Gerganov [Fri, 28 Mar 2025 18:47:29 +0000 (20:47 +0200)]
sync : llama.cpp

ggml-ci

3 months agometal : improve FA + improve MoE (llama/12612)
Georgi Gerganov [Fri, 28 Mar 2025 18:21:59 +0000 (20:21 +0200)]
metal : improve FA + improve MoE (llama/12612)

* ggml : FA with different K, V head sizes (CPU)

ggml-ci

* metal : add FA with HS=192

* metal : extend FA to support different K and V head sizes

ggml-ci

* metal : add FA vector kernels for heads K 192 and V 128

ggml-ci

* ggml : restrict op on other backends to equal head sizes

ggml-ci

* metal : optimize FA-vec kernel

ggml-ci

* metal : FA remove mq registers

* metal : improve MoE mul_mat_id condition

ggml-ci

* metal : fix comments + remove unnecessary addition

ggml-ci

* metal : avoid too much shared memory usage with mul_mat_id

ggml-ci

3 months agovulkan: fix coopmat shader generation when cross-compiling (llama/12272)
Icenowy Zheng [Fri, 28 Mar 2025 17:51:06 +0000 (01:51 +0800)]
vulkan: fix coopmat shader generation when cross-compiling (llama/12272)

* vulkan: fix coopmat shader generation when cross-compiling

Previously the status of coopmat{,2} support isn't passed to the
vulkan-shaders-gen project building on the host, which leads to build
failure because of the cross-compiling code expecting coopmat{,2}
shaders that didn't get generated.

Fix this by passing the coopmat{,2} support status to vulkan-shaders
subproject.

Signed-off-by: Icenowy Zheng <redacted>
* Only call coop-mat shaders once

* Fix whitespace

---------

Signed-off-by: Icenowy Zheng <redacted>
Co-authored-by: bandoti <redacted>
3 months agollamafile : ppc64le GEMV forwarding for FP32. (llama/12594)
amritahs-ibm [Fri, 28 Mar 2025 07:43:22 +0000 (13:13 +0530)]
llamafile : ppc64le GEMV forwarding for FP32. (llama/12594)

This patch enables usage of MMA when one of the
dimensions of the matrix(ie either M or N) is 1. This
is useful in case of token generation where N < 2.

The concept of 'GEMV Forwarding' is used where when one
of the matrix has a single row/column, the elements are
broadcasted, instead of using packing routine to prepack
the matrix elements.

This change results in 5% - 15% improvement in total
speed(ie all tokens/total time), across various batch
sizes. This is in comparision with the corresponding
dot product implementation.

The patch is tested with FP32 models of Meta-Lllama-3-8B,
Mistral-7B, Llama-2-7B-chat-hf on a IBM POWER10 machine.

Signed-off-by: Amrita H S <redacted>
3 months agorpc : send hash when tensor data is above some fixed threshold (llama/12496)
Radoslav Gerganov [Fri, 28 Mar 2025 06:18:04 +0000 (08:18 +0200)]
rpc : send hash when tensor data is above some fixed threshold (llama/12496)

* rpc : send hash when tensor data is above some fixed threshold

ref #10095

* rpc : put cache under $HOME/.cache/llama.cpp

* try to fix win32 build

* another try to fix win32 build

* remove llama as dependency

3 months agoopencl: add multi and vision rope, `gelu_quick` and `im2col` (llama/12600)
lhez [Thu, 27 Mar 2025 15:08:08 +0000 (08:08 -0700)]
opencl: add multi and vision rope, `gelu_quick` and `im2col` (llama/12600)

* opencl: add `im2col`

* opencl: add `gelu_quick`

* opencl: add mrope

* opencl: add vision rope

3 months agosync : llama.cpp
Georgi Gerganov [Thu, 27 Mar 2025 08:10:36 +0000 (10:10 +0200)]
sync : llama.cpp

3 months agoscripts : update sync (#1161)
Georgi Gerganov [Thu, 27 Mar 2025 07:41:24 +0000 (09:41 +0200)]
scripts : update sync (#1161)

3 months agofiles : remove old wkv6 sources (#0)
Georgi Gerganov [Thu, 27 Mar 2025 07:15:44 +0000 (09:15 +0200)]
files : remove old wkv6 sources (#0)

ggml-ci

3 months agosync : llama.cpp
Georgi Gerganov [Thu, 27 Mar 2025 07:13:12 +0000 (09:13 +0200)]
sync : llama.cpp

ggml-ci

3 months agoggml : sync/merge cmake,riscv,powerpc, add common.cmake (#0)
Georgi Gerganov [Thu, 27 Mar 2025 07:12:54 +0000 (09:12 +0200)]
ggml : sync/merge cmake,riscv,powerpc, add common.cmake (#0)

3 months agollamafile : ppc64le MMA implementation for Q4_0. (llama/12489)
amritahs-ibm [Thu, 27 Mar 2025 06:51:47 +0000 (12:21 +0530)]
llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)

This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le ISA using MMA
builtins. This patch handles matrix multiplication
between quantised datatypes, block_q4_0 and
block_q8_0.

This change results in 5% - 50% improvement
in total speed(ie all tokens/total time), across
various batch sizes.

The patch is tested with Meta-Lllama-3-8B,
Mistral-7B, Llama-2-7B-chat-hf models on a
IBM POWER10 machine.

Signed-off-by: Amrita H S <redacted>
3 months agoSYCL: implement memset ggml backend buffer interface (llama/12580)
Akarshan Biswas [Thu, 27 Mar 2025 01:46:00 +0000 (07:16 +0530)]
SYCL: implement memset ggml backend buffer interface (llama/12580)

* SYCL: implement memset ggml backend buffer interface

* use GGML_ABORT macro

* Do not wait for all queues to finish for memset operation

3 months agoHIP: Add support for RDNA4 targets (llama/12372)
Slobodan Josic [Wed, 26 Mar 2025 22:46:30 +0000 (23:46 +0100)]
HIP: Add support for RDNA4 targets (llama/12372)

3 months agometal : refactor mat-vec code (llama/12569)
Georgi Gerganov [Wed, 26 Mar 2025 19:38:38 +0000 (21:38 +0200)]
metal : refactor mat-vec code (llama/12569)

* metal : refactor mat-vec code

ggml-ci

* metal : rename all_sum -> sum_all

ggml-ci

* metal : fix comments [no ci]

* metal : fix nr constant [no ci]

* metal : mv q6_K support nr0 > 1

ggml-ci

* metal : reduce register pressure

ggml-ci

* metal : fix typo [no ci]

* metal : reduce register pressure

ggml-ci

3 months agoggml : fix MUL_MAT_ID repack with Q8_K (llama/12544)
Georgi Gerganov [Wed, 26 Mar 2025 11:02:00 +0000 (13:02 +0200)]
ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544)

* ggml : fix MUL_MAT_ID repack with Q8_K

ggml-ci

* ggml : improve repack templates

ggml-ci

3 months agoggml-cpu : update KleidiAI to v1.5.0 (llama/12568)
Dan Johansson [Tue, 25 Mar 2025 11:10:18 +0000 (12:10 +0100)]
ggml-cpu : update KleidiAI to v1.5.0 (llama/12568)

ggml-cpu : bug fix related to KleidiAI LHS packing

Signed-off-by: Dan Johansson <redacted>
3 months agoSYCL: disable Q4_0 reorder optimization (llama/12560)
Akarshan Biswas [Tue, 25 Mar 2025 10:40:18 +0000 (16:10 +0530)]
SYCL: disable Q4_0 reorder optimization (llama/12560)

ggml-ci

3 months agoopencl: simplify kernel embedding logic in cmakefile (llama/12503)
lhez [Mon, 24 Mar 2025 16:20:47 +0000 (09:20 -0700)]
opencl: simplify kernel embedding logic in cmakefile (llama/12503)

Co-authored-by: Max Krasnyansky <redacted>
3 months agoCUDA: Fix clang warnings (llama/12540)
R0CKSTAR [Mon, 24 Mar 2025 10:28:34 +0000 (18:28 +0800)]
CUDA: Fix clang warnings (llama/12540)

Signed-off-by: Xiaodong Ye <redacted>
3 months agovulkan: fix mul_mat_vec failure in backend tests (llama/12529)
Jeff Bolz [Mon, 24 Mar 2025 06:56:17 +0000 (01:56 -0500)]
vulkan: fix mul_mat_vec failure in backend tests (llama/12529)

The OOB calculation could be wrong if the last iteration was during one of
the unrolled loops. Adjust the unrolling counts to avoid this. Add a couple
new backend tests that hit this failure on NVIDIA GPUs.

3 months agoggml : fix quantized cpy op (llama/12310)
Georgi Gerganov [Sat, 22 Mar 2025 14:23:26 +0000 (16:23 +0200)]
ggml : fix quantized cpy op (llama/12310)

* ggml : fix quantized cpy op

ggml-ci

* tests : add cpy tests for all types

ggml-ci

* tests : add BF16 copy tests

ggml-ci

* tests : fix loop for same-type copy

ggml-ci

* tests : add option to permute the dst tensor

ggml-ci

3 months agomusa: refine compute capability (llama/12493)
R0CKSTAR [Sat, 22 Mar 2025 09:11:37 +0000 (17:11 +0800)]
musa: refine compute capability (llama/12493)

* musa: refine compute capability

Signed-off-by: Xiaodong Ye <redacted>
* Address review comments

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
3 months agovulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)
Jeff Bolz [Sat, 22 Mar 2025 08:40:11 +0000 (03:40 -0500)]
vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505)

* tests: add mul_mat perf/functional tests for p021/nc vulkan shaders

* vulkan: Optimize mul_mat_vec p021 and nc shaders.

These shaders are used in attention calculations, and when the KV cache grows
large they start to dominate the run time. For the nc shader (which is called
with large 'k' dimension), use unrolling and vector loads. For the p021 shader
(which is called with large 'm' and small 'k' dimensions), take advantage of
grouped query attention to reuse loads from the A matrix for the whole group,
and reduce the number of workgroups (too much overhead from tiny dispatches).

Using subgroupAdd in the p021 shader also helps, use that conditionally.

3 months agoVulkan: RTE rounding for cpy to quant (llama/12480)
stduhpf [Fri, 21 Mar 2025 19:34:50 +0000 (20:34 +0100)]
Vulkan: RTE rounding for cpy to quant (llama/12480)

* Vulkan: RTE rounding for cpy to quant

Co-Authored-By: Jeff Bolz <redacted>
* remove trailing whitespace

* avoid duplicating pipeline_cpy_f32_quant

* fix copypasting issue

* remove duplicated code

---------

Co-authored-by: Jeff Bolz <redacted>
3 months agovulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/12472)
Eve [Fri, 21 Mar 2025 19:27:47 +0000 (19:27 +0000)]
vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/12472)

3 months agoFix build on Windows when ccache enabled (#9954) (llama/9976)
蕭澧邦 [Fri, 21 Mar 2025 06:58:47 +0000 (14:58 +0800)]
Fix build on Windows when ccache enabled (#9954) (llama/9976)

* [SYCL] Fix build on Windows when ccache enabled (llama/9954)

* take effect only on windows and force it to icl

---------

Co-authored-by: Romain Biessy <redacted>
3 months agosycl: cleanup oneDNN related code (llama/12097)
Svetlozar Georgiev [Fri, 21 Mar 2025 02:15:56 +0000 (02:15 +0000)]
sycl: cleanup oneDNN related code (llama/12097)

3 months agoggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture...
Srihari-mcw [Thu, 20 Mar 2025 11:35:34 +0000 (17:05 +0530)]
ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (llama/12332)

* Add block interleaving support for Q4_K quantization

* Remove whitespaces and fix CI/CD issues

* Update pointer of bsums from int16_t to const int16_t

* Add vector version of quantize_q8_K_4x8 function

* Update code formatting based on review comments

3 months agoCUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)
Gaurav Garg [Wed, 19 Mar 2025 19:52:06 +0000 (01:22 +0530)]
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)

- Find out active blocks per SM using cudaOccupancyMaxActiveBlocksPerMultiprocessor API. Use this value to determine the optimal parallel_blocks value.
- Prefer vector flash attention kernels over MMA kernel for BS=1

Fixes Issue: #12182
---------

Co-authored-by: Johannes Gäßler <redacted>
3 months agovulkan: optimize iq1 coopmat2 dequant functions (llama/12427)
Jeff Bolz [Wed, 19 Mar 2025 18:56:23 +0000 (13:56 -0500)]
vulkan: optimize iq1 coopmat2 dequant functions (llama/12427)

3 months agoFix visionOS build and add CI (llama/12415)
Guus Waals [Wed, 19 Mar 2025 10:15:23 +0000 (10:15 +0000)]
Fix visionOS build and add CI (llama/12415)

* ci: add visionOS build workflow

Add a new GitHub Actions workflow for building on visionOS with CMake and Xcode.

* ggml: Define _DARWIN_C_SOURCE for visionOS to fix missing u_xxx typedefs

* ci: remove define hacks for u_xxx system types

---------

Co-authored-by: Giovanni Petrantoni <redacted>
3 months agovulkan: Submit once enough matmul work has been recorded (llama/12406)
Jeff Bolz [Wed, 19 Mar 2025 07:26:26 +0000 (02:26 -0500)]
vulkan: Submit once enough matmul work has been recorded (llama/12406)

I've been seeing significantly worse performance for tg with flash attention
enabled vs disabled, and it seems to be related to the submit heuristic.
Change the heuristic to check how many bytes worth of weight matrix are
used and flush every 100MB, and ramp up after the first few submits.
This seems to resolve the issue, and also increases perf for non-FA a bit.

3 months agoopencl: improve profiling (llama/12442)
lhez [Tue, 18 Mar 2025 19:54:55 +0000 (12:54 -0700)]
opencl: improve profiling (llama/12442)

* opencl: more profiling timing

* opencl: generate trace for profiling

* opencl: reduce profiling overhead

* Populate profiling timing info at the end rather than after each
  kernel run

* opencl: fix for chrome tracing

3 months agomusa: override warp_size of musa device to 32 (llama/12445)
R0CKSTAR [Tue, 18 Mar 2025 18:28:26 +0000 (02:28 +0800)]
musa: override warp_size of musa device to 32 (llama/12445)

Signed-off-by: Xiaodong Ye <redacted>
3 months agoSYCL: using graphs is configurable by environment variable and compile option (llama...
Łukasz Ślusarczyk [Tue, 18 Mar 2025 10:16:31 +0000 (11:16 +0100)]
SYCL: using graphs is configurable by environment variable and compile option (llama/12371)

* alberto changes

* enable sycl graphs by env variable

* fixed compilation warnings in ggml-sycl.cpp

* renamed graph variables

* fix markdown in docs/backend/SYCL.md

Co-authored-by: Romain Biessy <redacted>
* fix markdown in docs/backend/SYCL.md again

* compiling graphs by default, renamed graph_enable to graph_disable

---------

Co-authored-by: Romain Biessy <redacted>
3 months agoggml : add SVE support for q6_K_q8_K (llama/12361)
fj-y-saito [Tue, 18 Mar 2025 08:14:39 +0000 (17:14 +0900)]
ggml : add SVE support for q6_K_q8_K (llama/12361)

3 months agoVulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver...
0cc4m [Tue, 18 Mar 2025 06:21:40 +0000 (07:21 +0100)]
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (llama/12434)

3 months agofixed compilation warnings in ggml-sycl (llama/12424)
Łukasz Ślusarczyk [Tue, 18 Mar 2025 00:51:25 +0000 (01:51 +0100)]
fixed compilation warnings in ggml-sycl (llama/12424)

3 months agollama: Add support for RWKV v7 architecture (llama/12412)
Molly Sophia [Mon, 17 Mar 2025 23:27:50 +0000 (07:27 +0800)]
llama: Add support for RWKV v7 architecture (llama/12412)

* ggml: Add op l2_norm

Signed-off-by: Molly Sophia <redacted>
* ggml: Add op rwkv_wkv7

Signed-off-by: Molly Sophia <redacted>
* llama: Add support for RWKV7 and ARWKV7 models

Signed-off-by: Molly Sophia <redacted>
* llama: fix inference with RWKV6Qwen2

Signed-off-by: Molly Sophia <redacted>
* llama: add more (a)rwkv7 variants in size

Signed-off-by: Molly Sophia <redacted>
* Apply code-format changes

Signed-off-by: Molly Sophia <redacted>
* fix MUSA build

Signed-off-by: Molly Sophia <redacted>
* llama: fix shape error with rwkv using llama-parallel

Signed-off-by: Molly Sophia <redacted>
---------

Signed-off-by: Molly Sophia <redacted>
3 months agocuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)
Gaurav Garg [Mon, 17 Mar 2025 18:25:13 +0000 (23:55 +0530)]
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)

* Enable CUDA Graph on CTK < 12.x

`cudaGraphExecUpdate` API was changed on 12.x. For this reason CUDA graph support was disabled on older CUDA toolkit. This change enables CUDA support in CTK version < 12.x by using older API if CTK < 12.x.

* Fix compilation errors with MUSA

* Disable CUDA Graph for MUSA

3 months agoggml-vulkan: remove unused find_program(glslc) (llama/12416)
Guus Waals [Mon, 17 Mar 2025 16:35:43 +0000 (00:35 +0800)]
ggml-vulkan: remove unused find_program(glslc) (llama/12416)

It's already found by FindVulkan.cmake in the parent CMakeLists

3 months agovulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312)
Jeff Bolz [Mon, 17 Mar 2025 14:26:18 +0000 (09:26 -0500)]
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312)

3 months agovulkan: subgroup size tuning (llama/12087)
Daniele [Mon, 17 Mar 2025 11:42:33 +0000 (12:42 +0100)]
vulkan: subgroup size tuning (llama/12087)

* vulkan: subgroup size test

* Vulkan: Add device architecture enum and logic to recognize AMD generations

* vulkan: use new architecture logic to specify subgroup size

* Initial vulkan subgroup size tuning for RDNA3

* vulkan: commonize RDNA subgroup tuning

* vulkan: override subgroup size if required_subgroup_size = 0

* vulkan: disable warp 32 for RDNA3

* vulkan: fine tuned RDNA1 subgroup sizes

* vulkan: adjusted subgroup size map

* vulkan: fixed RDNA2 subgroup map

---------

Co-authored-by: 0cc4m <redacted>
3 months agovulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309)
Jeff Bolz [Mon, 17 Mar 2025 09:43:35 +0000 (04:43 -0500)]
vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309)

3 months agovulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking ...
Jeff Bolz [Mon, 17 Mar 2025 09:41:59 +0000 (04:41 -0500)]
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (llama/12273)

* vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking

3 months agovulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258)
Jeff Bolz [Mon, 17 Mar 2025 09:35:00 +0000 (04:35 -0500)]
vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258)

3 months agocmake : enable building llama.cpp using system libggml (llama/12321)
Christian Kastner [Mon, 17 Mar 2025 09:05:23 +0000 (10:05 +0100)]
cmake : enable building llama.cpp using system libggml (llama/12321)

* cmake: Factor out compiler flag function from ggml

llama.cpps's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).

* cmake: Enable building against system ggml

This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.

3 months agoSYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366)
Akarshan Biswas [Mon, 17 Mar 2025 01:45:12 +0000 (07:15 +0530)]
SYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366)

* SYCL: set extras only on GGML_TYPE_Q4_0

* release tensor_extras in reset buffer interface

3 months agoSYCL: Delete redundant plus sign and space (llama/12391)
aubreyli [Sat, 15 Mar 2025 14:49:03 +0000 (22:49 +0800)]
SYCL: Delete redundant plus sign and space (llama/12391)

3 months agoSYCL : support non-contiguous tensors in binary ops (add, sub, etc) (llama/12399)
fairydreaming [Sat, 15 Mar 2025 14:19:30 +0000 (15:19 +0100)]
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (llama/12399)

* sycl : support non-contiguous tensors in binary ops

* sycl : silence unused variable warning

---------

Co-authored-by: Stanisław Szymczyk <redacted>
3 months agoMUL_MAT optimization (llama/12382)
Chenguang Li [Sat, 15 Mar 2025 01:31:08 +0000 (09:31 +0800)]
MUL_MAT optimization (llama/12382)

3 months agosycl : variable sg_size support for mmvq kernels (llama/12336)
Alberto Cabrera Pérez [Wed, 12 Mar 2025 09:57:32 +0000 (09:57 +0000)]
sycl : variable sg_size support for mmvq kernels (llama/12336)

3 months agoCUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)
uvos [Wed, 12 Mar 2025 09:14:11 +0000 (10:14 +0100)]
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)

When fattn-wmma was ported over to warp64 various bits that also touch fattn-vec where converted to
selectable warp size, however the fattn-vec kernels dont work with 64 wide warps for now, so we need
to avoid launching them with parameters for warp64

3 months agovulkan: fix bug in coopmat1 mul_mat_id (llama/12316)
Jeff Bolz [Wed, 12 Mar 2025 05:59:19 +0000 (00:59 -0500)]
vulkan: fix bug in coopmat1 mul_mat_id (llama/12316)

* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id

3 months agoCUDA/HIP: refractor mmqv to unify the calculation of nwarps and rows per block betwee...
uvos [Tue, 11 Mar 2025 19:16:03 +0000 (20:16 +0100)]
CUDA/HIP: refractor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177)

refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <redacted>
3 months agoggml-backend : fix backend search path (llama/12330)
jklincn [Tue, 11 Mar 2025 13:25:17 +0000 (21:25 +0800)]
ggml-backend : fix backend search path (llama/12330)

* Fix backend search path

* replace .native() with '/'

* reverted .native()

3 months agometal : Cache the Metal library at the device context level (llama/12265)
BB-fat [Tue, 11 Mar 2025 11:45:02 +0000 (19:45 +0800)]
metal : Cache the Metal library at the device context level (llama/12265)

3 months agomat vec double buffer (llama/12188)
Eve [Mon, 10 Mar 2025 19:28:11 +0000 (19:28 +0000)]
mat vec double buffer (llama/12188)

3 months agomusa: support new arch mp_31 and update doc (llama/12296)
R0CKSTAR [Mon, 10 Mar 2025 17:18:25 +0000 (01:18 +0800)]
musa: support new arch mp_31 and update doc (llama/12296)

Signed-off-by: Xiaodong Ye <redacted>
3 months agoopencl: use OpenCL C standard supported by the device (llama/12221)
Henry Linjamäki [Mon, 10 Mar 2025 16:57:00 +0000 (18:57 +0200)]
opencl: use OpenCL C standard supported by the device (llama/12221)

This patch nudges the llama.cpp a bit to be supported on PoCL which
doesn't support OpenCL C CL2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.

3 months agotests : fix test-quantize-fns to init the CPU backend (llama/12306)
Georgi Gerganov [Mon, 10 Mar 2025 12:07:15 +0000 (14:07 +0200)]
tests : fix test-quantize-fns to init the CPU backend (llama/12306)

ggml-ci

3 months agoggml-backend : make path_str compatible with C++20 (llama/12269)
Jason C.H [Sat, 8 Mar 2025 16:02:39 +0000 (00:02 +0800)]
ggml-backend : make path_str compatible with C++20 (llama/12269)

3 months agoggml : skip intermediate .air file when compiling .metallib (llama/12247)
Daniel Bevenius [Fri, 7 Mar 2025 13:15:27 +0000 (14:15 +0100)]
ggml : skip intermediate .air file when compiling .metallib (llama/12247)

This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.

The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.

3 months agoci: disable test-opt for now (#1158)
Akarshan Biswas [Wed, 26 Mar 2025 07:51:18 +0000 (13:21 +0530)]
ci: disable test-opt for now (#1158)

* ci: disable test-opt for now

* Use CTEXT_EXTRA to disable tests

3 months agoci: Initial SYCL setup (#1157)
Akarshan Biswas [Tue, 25 Mar 2025 09:38:14 +0000 (15:08 +0530)]
ci: Initial SYCL setup (#1157)

3 months agoCreate CONTRIBUTING.md (#1146)
cmdr2 [Thu, 13 Mar 2025 18:29:48 +0000 (23:59 +0530)]
Create CONTRIBUTING.md (#1146)

* Create CONTRIBUTING.md

* Update CONTRIBUTING.md

3 months agogpt-2 : add comment about KV cache type (#1142)
bssrdf [Thu, 13 Mar 2025 18:29:19 +0000 (14:29 -0400)]
gpt-2 : add comment about KV cache type (#1142)

* change KV cache to fp16 to take advantage of tensor cores

* added a note/comment to indicate kv can be FP16

3 months agocmake: Enable specifying exact PowerPC CPU architecture (#1138)
Christian Kastner [Mon, 10 Mar 2025 18:19:58 +0000 (19:19 +0100)]
cmake: Enable specifying exact PowerPC CPU architecture (#1138)

In the process, guard automatic CPU detection with GGML_NATIVE.

https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#index-mcpu-10

3 months agocmake: Comment out GGML_BIN_DIR for now (#1139)
Christian Kastner [Mon, 10 Mar 2025 12:06:21 +0000 (13:06 +0100)]
cmake: Comment out GGML_BIN_DIR for now (#1139)

Nothing installs to it yet, so when attempting to use the cmake package,
set_and_check() triggers an error if the directory doesn't already exist
for other reasons.

3 months agospm : remove (#1135)
Georgi Gerganov [Sat, 8 Mar 2025 13:18:24 +0000 (15:18 +0200)]
spm : remove (#1135)

ggml-ci

3 months agosync : whisper.cpp upstream/0.0.1802
Georgi Gerganov [Sat, 8 Mar 2025 13:14:03 +0000 (15:14 +0200)]
sync : whisper.cpp

ggml-ci

3 months agocommon : fix audio loading by miniaudio (whisper/2862)
Dmitry Atamanov [Tue, 4 Mar 2025 17:05:21 +0000 (22:05 +0500)]
common : fix audio loading by miniaudio (whisper/2862)

3 months agosync : llama.cpp
Georgi Gerganov [Fri, 7 Mar 2025 12:50:30 +0000 (14:50 +0200)]
sync : llama.cpp

ggml-ci

3 months agoggml-cpu: faster AVX2 variant for IQ1_M (llama/12216)
Rémy O [Fri, 7 Mar 2025 11:54:22 +0000 (12:54 +0100)]
ggml-cpu: faster AVX2 variant for IQ1_M (llama/12216)

3 months agometal : simplify kernel arguments using a struct (#3229) (llama/12194)
BB-fat [Fri, 7 Mar 2025 07:35:57 +0000 (15:35 +0800)]
metal : simplify kernel arguments using a struct (#3229) (llama/12194)

* metal : refactor im2col parameters into a struct

* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets

* metal : refactor sum_rows parameters into a struct

* metal : refactor soft_max parameters into a struct

* metal : refactor diag_mask_inf parameters into a struct

* metal : refactor ssm_conv parameters into a struct

* metal : refactor ssm_scan parameters into a struct

* metal : refactor get_rows parameters into a struct

* metal : refactor group_norm parameters into a struct

* metal : refactor conv_transpose_1d parameters into a struct

* metal : refactor upscale parameters into a struct

* metal : refactor pad parameters into a struct

* metal : refactor pad_reflect_1d parameters into a struct

* metal : refactor arange parameters into a struct

* metal : refactor timestep_embedding parameters into a struct

* metal : refactor argsort parameters into a struct

* metal : refactor leaky_relu parameters into a struct

* metal : refactor pool_2d parameters into a struct

* metal : fix trailing whitespace

---------

Co-authored-by: alexju <redacted>
3 months agometal : fix default.metallib build (llama/12224)
Daniel Bevenius [Fri, 7 Mar 2025 05:23:16 +0000 (06:23 +0100)]
metal : fix default.metallib build (llama/12224)

This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.

The motivation for this change is that currently when building and
specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is
generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'.  Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```

With the above change the build could progress but there was a follow
on error about not being able to find the ggml-common.h file in
ggml-metal.metal where is was included as a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
         ^~~~~~~~~~~~~~~~~~
         "ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.

3 months agoopencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217)
lhez [Fri, 7 Mar 2025 00:20:35 +0000 (16:20 -0800)]
opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217)

* opencl: support noncontiguous `norm`

* opencl: support noncontiguous `rms_norm`

* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`

3 months agocmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama...
xiaofei [Thu, 6 Mar 2025 22:58:25 +0000 (06:58 +0800)]
cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama/12094)

Signed-off-by: Ray Lee <redacted>
Co-authored-by: Ray Lee <redacted>
3 months agoCUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)
Johannes Gäßler [Thu, 6 Mar 2025 17:45:09 +0000 (18:45 +0100)]
CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)

3 months agoHIP/CUDA: set the paramerter value in maintain_cuda_graph instead of replaceing it...
uvos [Thu, 6 Mar 2025 07:20:52 +0000 (08:20 +0100)]
HIP/CUDA: set the paramerter value in maintain_cuda_graph instead of replaceing it. (llama/12209)

This avoids conflict with internal cuda/hip runtimes memory managment behavior.

3 months agoopencl : fix buffer alignment (llama/12197)
Henry Linjamäki [Thu, 6 Mar 2025 01:33:40 +0000 (03:33 +0200)]
opencl : fix buffer alignment (llama/12197)

Fix the following error:

```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```

which occurs when `ggml_backend_opencl_context::alignment` is larger
than `cl_ptr_base` (hard-coded to `0x1000`).

Also, fix `ggml_backend_opencl_context::alignment` was set to
`CL_DEVICE_MEM_BASE_ADDR_ALIGN` which was treated as bytes but the
value is reported in bits.

3 months agoopencl : fix `ulong` kernel args were set from `int` variables (llama/12174)
Henry Linjamäki [Thu, 6 Mar 2025 01:31:14 +0000 (03:31 +0200)]
opencl : fix `ulong` kernel args were set from `int` variables (llama/12174)

... which left garbage bits in the upper half of the kernel args. This
caused segmentation faults when running PoCL.

3 months agoopencl : fix profile-related errors (llama/12095)
simon886212 [Thu, 6 Mar 2025 01:30:05 +0000 (09:30 +0800)]
opencl : fix profile-related errors (llama/12095)

Co-authored-by: ubuntu <redacted>
3 months agoggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154)
Rémy O [Thu, 6 Mar 2025 01:26:10 +0000 (02:26 +0100)]
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154)

* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions

* cmake: Add GGML_BMI2 build option

* ggml: enable BMI2 on relevant CPU variants

* ggml-cpu: include BMI2 in backend score

* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features

* ggml-cpu: add __BMI2__ define when using MSVC

3 months agoSYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201)
Akarshan Biswas [Wed, 5 Mar 2025 15:58:23 +0000 (21:28 +0530)]
SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201)

3 months agoggml : fix GGMLMetalClass ODR (llama/12200)
Plamen Minev [Wed, 5 Mar 2025 15:16:01 +0000 (17:16 +0200)]
ggml : fix GGMLMetalClass ODR (llama/12200)

-- it might happen if ggml is loaded from 2 separate libraries since each one of them will expose the class. This is more of a guard since we want to use only Metal as embedded library and don't care about the other case.

3 months agoggml : ggml_compute_forward_concat() for arbitrary tensor type (#1118)
vmobilis [Fri, 7 Mar 2025 08:11:40 +0000 (11:11 +0300)]
ggml : ggml_compute_forward_concat() for arbitrary tensor type (#1118)

* ggml_compute_forward_concat() for arbitrary tensor type

* Check that tensors' type match

* ggml-cpu.c: check type of source tensors

* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()

* ggml.c: check concatenated tensor type

* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c

..., as it was moved to ggml.c.

3 months agopkg-config: Use CMake install paths for lib, include (#1133)
Christian Kastner [Thu, 6 Mar 2025 19:01:02 +0000 (20:01 +0100)]
pkg-config: Use CMake install paths for lib, include (#1133)

3 months agovulkan : sync (llama/0)
Georgi Gerganov [Tue, 4 Mar 2025 19:08:15 +0000 (21:08 +0200)]
vulkan : sync (llama/0)

ggml-ci