git.djapps.eu Git - pkg/ggml/sources/ggml/log
3 months ago
Łukasz Ślusarczyk [Tue, 18 Mar 2025 10:16:31 +0000 (11:16 +0100)]
SYCL: using graphs is configurable by environment variable and compile option (llama/12371)

* alberto changes

* enable sycl graphs by env variable

* fixed compilation warnings in ggml-sycl.cpp

* renamed graph variables

* fix markdown in docs/backend/SYCL.md

Co-authored-by: Romain Biessy <redacted>
* fix markdown in docs/backend/SYCL.md again

* compiling graphs by default, renamed graph_enable to graph_disable

---------

Co-authored-by: Romain Biessy <redacted>
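The runtime half of the toggle described in this entry can be sketched in plain C. The variable name GGML_SYCL_DISABLE_GRAPH and the "0 means keep graphs on" convention are assumptions for illustration, not the commit's exact logic:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Decide whether SYCL graphs are enabled, given the value of a
 * GGML_SYCL_DISABLE_GRAPH-style environment variable (NULL = unset).
 * Graphs stay on unless the variable is set to something other than "0". */
static bool sycl_graphs_enabled_from(const char * env_val) {
    if (env_val == NULL || env_val[0] == '\0') {
        return true;   /* compiled in and not disabled at runtime */
    }
    return strcmp(env_val, "0") == 0;   /* "0" explicitly re-enables */
}

static bool sycl_graphs_enabled(void) {
    return sycl_graphs_enabled_from(getenv("GGML_SYCL_DISABLE_GRAPH"));
}
```

This mirrors the entry's "compiling graphs by default" choice: the unset case returns true, so disabling is the opt-in path.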
3 months ago
fj-y-saito [Tue, 18 Mar 2025 08:14:39 +0000 (17:14 +0900)]
ggml : add SVE support for q6_K_q8_K (llama/12361)

3 months ago
0cc4m [Tue, 18 Mar 2025 06:21:40 +0000 (07:21 +0100)]
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (llama/12434)

3 months ago
Łukasz Ślusarczyk [Tue, 18 Mar 2025 00:51:25 +0000 (01:51 +0100)]
fixed compilation warnings in ggml-sycl (llama/12424)

3 months ago
Molly Sophia [Mon, 17 Mar 2025 23:27:50 +0000 (07:27 +0800)]
llama: Add support for RWKV v7 architecture (llama/12412)

* ggml: Add op l2_norm

Signed-off-by: Molly Sophia <redacted>
* ggml: Add op rwkv_wkv7

Signed-off-by: Molly Sophia <redacted>
* llama: Add support for RWKV7 and ARWKV7 models

Signed-off-by: Molly Sophia <redacted>
* llama: fix inference with RWKV6Qwen2

Signed-off-by: Molly Sophia <redacted>
* llama: add more (a)rwkv7 variants in size

Signed-off-by: Molly Sophia <redacted>
* Apply code-format changes

Signed-off-by: Molly Sophia <redacted>
* fix MUSA build

Signed-off-by: Molly Sophia <redacted>
* llama: fix shape error with rwkv using llama-parallel

Signed-off-by: Molly Sophia <redacted>
---------

Signed-off-by: Molly Sophia <redacted>
3 months ago
Gaurav Garg [Mon, 17 Mar 2025 18:25:13 +0000 (23:55 +0530)]
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394)

* Enable CUDA Graph on CTK < 12.x

The `cudaGraphExecUpdate` API was changed in 12.x. For this reason, CUDA graph support was disabled on older CUDA toolkits. This change enables CUDA graph support on CTK < 12.x by using the older API there.

* Fix compilation errors with MUSA

* Disable CUDA Graph for MUSA
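The compile-time gate this entry describes can be sketched without the CUDA toolkit installed; MY_CUDART_VERSION below is a stand-in for CUDA's real CUDART_VERSION macro (e.g. 12000 for 12.0), and the returned strings are illustrative labels, not real API names beyond `cudaGraphExecUpdate` itself:

```c
#include <string.h>

/* Select between the pre-12.x and 12.x graph-update code paths at compile
 * time, in the spirit of the change above. MY_CUDART_VERSION stands in for
 * CUDART_VERSION so this sketch builds anywhere. */
#ifndef MY_CUDART_VERSION
#define MY_CUDART_VERSION 11080   /* pretend we are on CTK 11.8 */
#endif

static const char * graph_update_api(void) {
#if MY_CUDART_VERSION >= 12000
    return "cudaGraphExecUpdate (12.x signature)";
#else
    return "cudaGraphExecUpdate (pre-12.x signature)";
#endif
}
```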

3 months ago
Guus Waals [Mon, 17 Mar 2025 16:35:43 +0000 (00:35 +0800)]
ggml-vulkan: remove unused find_program(glslc) (llama/12416)

It's already found by FindVulkan.cmake in the parent CMakeLists

3 months ago
Jeff Bolz [Mon, 17 Mar 2025 14:26:18 +0000 (09:26 -0500)]
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312)

3 months ago
Daniele [Mon, 17 Mar 2025 11:42:33 +0000 (12:42 +0100)]
vulkan: subgroup size tuning (llama/12087)

* vulkan: subgroup size test

* Vulkan: Add device architecture enum and logic to recognize AMD generations

* vulkan: use new architecture logic to specify subgroup size

* Initial vulkan subgroup size tuning for RDNA3

* vulkan: commonize RDNA subgroup tuning

* vulkan: override subgroup size if required_subgroup_size = 0

* vulkan: disable warp 32 for RDNA3

* vulkan: fine tuned RDNA1 subgroup sizes

* vulkan: adjusted subgroup size map

* vulkan: fixed RDNA2 subgroup map

---------

Co-authored-by: 0cc4m <redacted>
3 months ago
Jeff Bolz [Mon, 17 Mar 2025 09:43:35 +0000 (04:43 -0500)]
vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309)

3 months ago
Jeff Bolz [Mon, 17 Mar 2025 09:41:59 +0000 (04:41 -0500)]
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (llama/12273)

* vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking

3 months ago
Jeff Bolz [Mon, 17 Mar 2025 09:35:00 +0000 (04:35 -0500)]
vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258)

3 months ago
Christian Kastner [Mon, 17 Mar 2025 09:05:23 +0000 (10:05 +0100)]
cmake : enable building llama.cpp using system libggml (llama/12321)

* cmake: Factor out compiler flag function from ggml

llama.cpp's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).

* cmake: Enable building against system ggml

This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.

3 months ago
Akarshan Biswas [Mon, 17 Mar 2025 01:45:12 +0000 (07:15 +0530)]
SYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366)

* SYCL: set extras only on GGML_TYPE_Q4_0

* release tensor_extras in reset buffer interface

3 months ago
aubreyli [Sat, 15 Mar 2025 14:49:03 +0000 (22:49 +0800)]
SYCL: Delete redundant plus sign and space (llama/12391)

3 months ago
fairydreaming [Sat, 15 Mar 2025 14:19:30 +0000 (15:19 +0100)]
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (llama/12399)

* sycl : support non-contiguous tensors in binary ops

* sycl : silence unused variable warning

---------

Co-authored-by: Stanisław Szymczyk <redacted>
3 months ago
Chenguang Li [Sat, 15 Mar 2025 01:31:08 +0000 (09:31 +0800)]
MUL_MAT optimization (llama/12382)

3 months ago
Alberto Cabrera Pérez [Wed, 12 Mar 2025 09:57:32 +0000 (09:57 +0000)]
sycl : variable sg_size support for mmvq kernels (llama/12336)

3 months ago
uvos [Wed, 12 Mar 2025 09:14:11 +0000 (10:14 +0100)]
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315)

When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to a
selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need
to avoid launching them with parameters for warp64.

3 months ago
Jeff Bolz [Wed, 12 Mar 2025 05:59:19 +0000 (00:59 -0500)]
vulkan: fix bug in coopmat1 mul_mat_id (llama/12316)

* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id

3 months ago
uvos [Tue, 11 Mar 2025 19:16:03 +0000 (20:16 +0100)]
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177)

refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.

---------

Co-authored-by: Johannes Gäßler <redacted>
3 months ago
jklincn [Tue, 11 Mar 2025 13:25:17 +0000 (21:25 +0800)]
ggml-backend : fix backend search path (llama/12330)

* Fix backend search path

* replace .native() with '/'

* reverted .native()

3 months ago
BB-fat [Tue, 11 Mar 2025 11:45:02 +0000 (19:45 +0800)]
metal : Cache the Metal library at the device context level (llama/12265)

3 months ago
Eve [Mon, 10 Mar 2025 19:28:11 +0000 (19:28 +0000)]
mat vec double buffer (llama/12188)

3 months ago
R0CKSTAR [Mon, 10 Mar 2025 17:18:25 +0000 (01:18 +0800)]
musa: support new arch mp_31 and update doc (llama/12296)

Signed-off-by: Xiaodong Ye <redacted>
3 months ago
Henry Linjamäki [Mon, 10 Mar 2025 16:57:00 +0000 (18:57 +0200)]
opencl: use OpenCL C standard supported by the device (llama/12221)

This patch nudges llama.cpp a bit so it can run on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
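The "query and pick the highest" step can be sketched without the OpenCL API, given version strings of the form "OpenCL C x.y" as returned by CL_DEVICE_OPENCL_C_VERSION-style queries. The helper names here are illustrative, not the commit's actual code:

```c
#include <stdio.h>

/* Parse "OpenCL C <major>.<minor>" into a single comparable key
 * (major * 100 + minor). Returns -1 if the string does not match. */
static int opencl_c_version_key(const char * s) {
    int major = 0, minor = 0;
    if (sscanf(s, "OpenCL C %d.%d", &major, &minor) != 2) {
        return -1;
    }
    return major * 100 + minor;
}

/* Pick the highest supported version out of n candidate strings. */
static int highest_opencl_c_version(const char ** vs, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        int k = opencl_c_version_key(vs[i]);
        if (k > best) best = k;
    }
    return best;
}
```

On PoCL the candidate list would simply lack a 2.0 entry, so the highest 1.x version gets selected instead.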

3 months ago
Georgi Gerganov [Mon, 10 Mar 2025 12:07:15 +0000 (14:07 +0200)]
tests : fix test-quantize-fns to init the CPU backend (llama/12306)

ggml-ci

3 months ago
Jason C.H [Sat, 8 Mar 2025 16:02:39 +0000 (00:02 +0800)]
ggml-backend : make path_str compatible with C++20 (llama/12269)

3 months ago
Daniel Bevenius [Fri, 7 Mar 2025 13:15:27 +0000 (14:15 +0100)]
ggml : skip intermediate .air file when compiling .metallib (llama/12247)

This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.

The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.

3 months ago
Akarshan Biswas [Wed, 26 Mar 2025 07:51:18 +0000 (13:21 +0530)]
ci: disable test-opt for now (#1158)

* ci: disable test-opt for now

* Use CTEXT_EXTRA to disable tests

3 months ago
Akarshan Biswas [Tue, 25 Mar 2025 09:38:14 +0000 (15:08 +0530)]
ci: Initial SYCL setup (#1157)

3 months ago
cmdr2 [Thu, 13 Mar 2025 18:29:48 +0000 (23:59 +0530)]
Create CONTRIBUTING.md (#1146)

* Create CONTRIBUTING.md

* Update CONTRIBUTING.md

3 months ago
bssrdf [Thu, 13 Mar 2025 18:29:19 +0000 (14:29 -0400)]
gpt-2 : add comment about KV cache type (#1142)

* change KV cache to fp16 to take advantage of tensor cores

* added a note/comment to indicate kv can be FP16

3 months ago
Christian Kastner [Mon, 10 Mar 2025 18:19:58 +0000 (19:19 +0100)]
cmake: Enable specifying exact PowerPC CPU architecture (#1138)

In the process, guard automatic CPU detection with GGML_NATIVE.

https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#index-mcpu-10

3 months ago
Christian Kastner [Mon, 10 Mar 2025 12:06:21 +0000 (13:06 +0100)]
cmake: Comment out GGML_BIN_DIR for now (#1139)

Nothing installs to it yet, so when attempting to use the cmake package,
set_and_check() triggers an error if the directory doesn't already exist
for other reasons.

3 months ago
Georgi Gerganov [Sat, 8 Mar 2025 13:18:24 +0000 (15:18 +0200)]
spm : remove (#1135)

ggml-ci

3 months ago sync : whisper.cpp upstream/0.0.1802
Georgi Gerganov [Sat, 8 Mar 2025 13:14:03 +0000 (15:14 +0200)]
sync : whisper.cpp

ggml-ci

3 months ago
Dmitry Atamanov [Tue, 4 Mar 2025 17:05:21 +0000 (22:05 +0500)]
common : fix audio loading by miniaudio (whisper/2862)

3 months ago
Georgi Gerganov [Fri, 7 Mar 2025 12:50:30 +0000 (14:50 +0200)]
sync : llama.cpp

ggml-ci

3 months ago
Rémy O [Fri, 7 Mar 2025 11:54:22 +0000 (12:54 +0100)]
ggml-cpu: faster AVX2 variant for IQ1_M (llama/12216)

3 months ago
BB-fat [Fri, 7 Mar 2025 07:35:57 +0000 (15:35 +0800)]
metal : simplify kernel arguments using a struct (#3229) (llama/12194)

* metal : refactor im2col parameters into a struct

* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets

* metal : refactor sum_rows parameters into a struct

* metal : refactor soft_max parameters into a struct

* metal : refactor diag_mask_inf parameters into a struct

* metal : refactor ssm_conv parameters into a struct

* metal : refactor ssm_scan parameters into a struct

* metal : refactor get_rows parameters into a struct

* metal : refactor group_norm parameters into a struct

* metal : refactor conv_transpose_1d parameters into a struct

* metal : refactor upscale parameters into a struct

* metal : refactor pad parameters into a struct

* metal : refactor pad_reflect_1d parameters into a struct

* metal : refactor arange parameters into a struct

* metal : refactor timestep_embedding parameters into a struct

* metal : refactor argsort parameters into a struct

* metal : refactor leaky_relu parameters into a struct

* metal : refactor pool_2d parameters into a struct

* metal : fix trailing whitespace

---------

Co-authored-by: alexju <redacted>
3 months ago
Daniel Bevenius [Fri, 7 Mar 2025 05:23:16 +0000 (06:23 +0100)]
metal : fix default.metallib build (llama/12224)

This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.

The motivation for this change is that currently when building and
specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is
generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'.  Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```

With the above change the build could progress, but there was a follow-on
error about not being able to find the ggml-common.h file in
ggml-metal.metal, where it was included as a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
         ^~~~~~~~~~~~~~~~~~
         "ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.

3 months ago
lhez [Fri, 7 Mar 2025 00:20:35 +0000 (16:20 -0800)]
opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217)

* opencl: support noncontiguous `norm`

* opencl: support noncontiguous `rms_norm`

* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`

3 months ago
xiaofei [Thu, 6 Mar 2025 22:58:25 +0000 (06:58 +0800)]
cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama/12094)

Signed-off-by: Ray Lee <redacted>
Co-authored-by: Ray Lee <redacted>
3 months ago
Johannes Gäßler [Thu, 6 Mar 2025 17:45:09 +0000 (18:45 +0100)]
CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)

3 months ago
uvos [Thu, 6 Mar 2025 07:20:52 +0000 (08:20 +0100)]
HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209)

This avoids conflicts with the internal memory management behavior of the CUDA/HIP runtimes.

3 months ago
Henry Linjamäki [Thu, 6 Mar 2025 01:33:40 +0000 (03:33 +0200)]
opencl : fix buffer alignment (llama/12197)

Fix the following error:

```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```

which occurs when `ggml_backend_opencl_context::alignment` is larger
than `cl_ptr_base` (hard-coded to `0x1000`).

Also, fix `ggml_backend_opencl_context::alignment` was set to
`CL_DEVICE_MEM_BASE_ADDR_ALIGN` which was treated as bytes but the
value is reported in bits.
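The second fix above amounts to a single division: CL_DEVICE_MEM_BASE_ADDR_ALIGN is specified in bits, so a query result of 32768 bits corresponds to a 4096-byte alignment. A minimal sketch (the helper name is illustrative):

```c
#include <stdint.h>

/* CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in *bits*; convert to bytes
 * before using it as a buffer alignment, as the fix above does. */
static uint32_t mem_base_addr_align_bytes(uint32_t align_bits) {
    return align_bits / 8;
}
```

Treating the bit count as a byte count inflates the alignment 8x, which is exactly the kind of "not enough space in the buffer" over-reservation shown in the error log.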

3 months ago
Henry Linjamäki [Thu, 6 Mar 2025 01:31:14 +0000 (03:31 +0200)]
opencl : fix `ulong` kernel args being set from `int` variables (llama/12174)

... which left garbage bits in the upper half of the kernel args. This
caused segmentation faults when running on PoCL.
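The bug pattern is the classic clSetKernelArg pitfall: the runtime copies as many bytes as the kernel parameter is wide, so pointing it at a 4-byte int for an 8-byte `ulong` parameter picks up 4 bytes of garbage. A plain-C sketch of the correct path (the helper names are illustrative stand-ins, not OpenCL API):

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for clSetKernelArg reading sizeof(uint64_t) bytes from `value`:
 * the pointee must really be 8 bytes wide, or the upper half is whatever
 * happens to sit next to the variable in memory. */
static uint64_t read_arg_u64(const void * value) {
    uint64_t out;
    memcpy(&out, value, sizeof(out));
    return out;
}

/* The fix: widen the int to a 64-bit variable first, then pass its address. */
static uint64_t set_arg_correctly(int offset) {
    uint64_t wide = (uint64_t) offset;
    return read_arg_u64(&wide);
}
```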

3 months ago
simon886212 [Thu, 6 Mar 2025 01:30:05 +0000 (09:30 +0800)]
opencl : fix profile-related errors (llama/12095)

Co-authored-by: ubuntu <redacted>
3 months ago
Rémy O [Thu, 6 Mar 2025 01:26:10 +0000 (02:26 +0100)]
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154)

* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions

* cmake: Add GGML_BMI2 build option

* ggml: enable BMI2 on relevant CPU variants

* ggml-cpu: include BMI2 in backend score

* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features

* ggml-cpu: add __BMI2__ define when using MSVC

3 months ago
Akarshan Biswas [Wed, 5 Mar 2025 15:58:23 +0000 (21:28 +0530)]
SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201)

3 months ago
Plamen Minev [Wed, 5 Mar 2025 15:16:01 +0000 (17:16 +0200)]
ggml : fix GGMLMetalClass ODR (llama/12200)

It might happen if ggml is loaded from two separate libraries, since each of them will expose the class. This is more of a guard, since we want to use Metal only as an embedded library and don't care about the other case.

3 months ago
vmobilis [Fri, 7 Mar 2025 08:11:40 +0000 (11:11 +0300)]
ggml : ggml_compute_forward_concat() for arbitrary tensor type (#1118)

* ggml_compute_forward_concat() for arbitrary tensor type

* Check that tensors' type match

* ggml-cpu.c: check type of source tensors

* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()

* ggml.c: check concatenated tensor type

* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c

..., as it was moved to ggml.c.

3 months ago
Christian Kastner [Thu, 6 Mar 2025 19:01:02 +0000 (20:01 +0100)]
pkg-config: Use CMake install paths for lib, include (#1133)

3 months ago
Georgi Gerganov [Tue, 4 Mar 2025 19:08:15 +0000 (21:08 +0200)]
vulkan : sync (llama/0)

ggml-ci

3 months ago
Georgi Gerganov [Tue, 4 Mar 2025 19:07:04 +0000 (21:07 +0200)]
sync : llama.cpp

3 months ago
mgroeber9110 [Tue, 4 Mar 2025 16:53:26 +0000 (17:53 +0100)]
ggml : portability fixes for VS 2017 (llama/12150)

* Add include files for std::min/max and std::toupper/tolower

* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined

* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode

* win32: only use __restrict in MSVC if C11/C17 support is not enabled

---------

Co-authored-by: Marcus Groeber <redacted>
3 months ago
David Huang [Mon, 3 Mar 2025 21:10:54 +0000 (05:10 +0800)]
HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)

Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16

3 months ago
Diego Devesa [Mon, 3 Mar 2025 13:00:46 +0000 (14:00 +0100)]
test-backend-ops : add option -p to filter by op params (llama/12155)

3 months ago
ag2s20150909 [Mon, 3 Mar 2025 12:54:08 +0000 (20:54 +0800)]
ggml : fix kleidiai build (llama/12159)

The libggml API has changed, but this has not been updated.

3 months ago
Akarshan Biswas [Mon, 3 Mar 2025 10:07:22 +0000 (15:37 +0530)]
SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133)

* SYCL: refactor and move cpy kernels to a separate file

* Add few missing cpy kernels

* refactor and add debug logs

3 months ago
Diego Devesa [Sun, 2 Mar 2025 21:11:00 +0000 (22:11 +0100)]
ggml-backend : keep paths in native string type when possible (llama/12144)

3 months ago
Erik Scholz [Sat, 1 Mar 2025 11:57:22 +0000 (12:57 +0100)]
CUDA: compress mode option and default to size (llama/12029)

CUDA 12.8 added the option to specify stronger compression for binaries, so we now default to "size".

3 months ago
William Tambellini [Fri, 28 Feb 2025 13:41:47 +0000 (05:41 -0800)]
ggml : upgrade init_tensor API to return a ggml_status (llama/11854)

* Upgrade init_tensor API to return a ggml_status

To prepare for an 'abort-free' ggml
(ggml not aborting on OOM but returning an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.

* misc fixes

---------

Co-authored-by: slaren <redacted>
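The shape of the upgraded API can be sketched as follows. The enum is loosely modeled on ggml_status (names and values here are illustrative, and the allocator is a toy stand-in, not ggml's real init_tensor):

```c
#include <stddef.h>

/* Status codes in the spirit of ggml_status. */
enum my_status {
    MY_STATUS_SUCCESS      = 0,
    MY_STATUS_ALLOC_FAILED = 1,
};

/* Toy init_tensor: instead of aborting on OOM, report it via the return
 * value so the caller can decide how to recover. */
static enum my_status my_init_tensor(size_t nbytes, size_t pool_remaining) {
    if (nbytes > pool_remaining) {
        return MY_STATUS_ALLOC_FAILED;   /* no abort: caller handles OOM */
    }
    return MY_STATUS_SUCCESS;
}
```

Callers then check the status instead of relying on the library to abort, which is the whole point of the change.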
3 months ago
Rémy O [Fri, 28 Feb 2025 08:42:52 +0000 (09:42 +0100)]
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595)

* vulkan: implement specialized MMV kernels for IQ2 quantizations

* vulkan: add MMV kernels for IQ3 quants

* vulkan: Increase MMV batch size and unroll IQ LUT setup

* vulkan: fix init_iq_shmem for WG sizes larger than tables

* vulkan: common batch size for all I-quants

3 months ago
Johannes Gäßler [Fri, 28 Feb 2025 08:26:43 +0000 (09:26 +0100)]
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)

3 months ago
Prashant Vithule [Fri, 28 Feb 2025 07:36:12 +0000 (13:06 +0530)]
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064)

* Added SVE Support for Q2_K Quantized Models

* Use 4-space indentation in the switch cases

* removed comment lines

* Remove the loop; retain the curly braces for better understanding of the code

* Remove the comment line added for the q3_k_q8_k kernel

---------

Co-authored-by: vithulep <redacted>
3 months ago
hipudding [Fri, 28 Feb 2025 07:23:47 +0000 (15:23 +0800)]
CANN: Fix build error with GCC 13 (llama/11990)

Remove unused header file that causes compilation failure on ARM
platform with GCC 13.

3 months ago
Eve [Fri, 28 Feb 2025 07:20:08 +0000 (07:20 +0000)]
vulkan: matmul dequantization improvements (llama/12015)

* faster dequant for old quants

* dont use unpack for iq4_nl

* vec2 unpack for q8

3 months ago
Daniele [Fri, 28 Feb 2025 06:52:51 +0000 (06:52 +0000)]
vulkan: improve im2col (llama/11826)

* vulkan: improve im2col performance

3 months ago
Vladimir Vuksanovic [Thu, 27 Feb 2025 07:42:48 +0000 (08:42 +0100)]
cmake: Fix ggml backend dependencies and installation (llama/11818)

* Fix dependencies between ggml and backends

ggml backends link only to ggml-base and ggml links to all backends.

* Fix installation of ggml backends

Set up GNUInstallDirs before setting the installation directory of ggml backends

3 months ago
Jeff Bolz [Tue, 25 Feb 2025 15:30:21 +0000 (09:30 -0600)]
vulkan: fix assertion when qy_needs_dequant (llama/12068)

Looks like a copy/paste bug from qx_needs_dequant.

3 months ago
Molly Sophia [Tue, 25 Feb 2025 11:28:22 +0000 (19:28 +0800)]
ggml-cpu: Fix build with sve (llama/12059)

* ggml-cpu: Fix build with sve

Signed-off-by: Molly Sophia <redacted>
* ggml-cpu: Remove unused variable in sve q3_k vec dot

Signed-off-by: Molly Sophia <redacted>
---------

Signed-off-by: Molly Sophia <redacted>
3 months ago
cmdr2 [Mon, 3 Mar 2025 15:21:31 +0000 (20:51 +0530)]
cuda: unary ops as float + de-duplicate (#1130)

3 months ago
cmdr2 [Fri, 28 Feb 2025 10:29:55 +0000 (15:59 +0530)]
cuda/vulkan: specify fp32-only support for some operations in supports_op (#1129)

* cuda: restrict SILU_BACK to fp32, since fp16 exceeds the desired test threshold

* vulkan: specify fp32-only support for certain ops (that are now tested for fp16 as well)

* f32 sigmoid in vulkan supports op

* Revert "f32 sigmoid in vulkan supports op"

This reverts commit c6f04b3c19bf4504c2776149c6d8cd84e0b48acb.

3 months ago
cmdr2 [Fri, 28 Feb 2025 07:04:39 +0000 (12:34 +0530)]
cuda/cpu: Increase support for fp16 unary operations (#1125)

* Support fp16 unary operations in the CUDA backend

* cpu: increase fp16 support for unary operators in the CPU backend

* cuda: increase fp16 support for unary operators in the CUDA backend

* Add test cases for fp16 unary operators

* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing

* metal: fix PR comments for unary op support after fp16 unary tests

4 months ago
Georgi Gerganov [Thu, 27 Feb 2025 12:43:20 +0000 (14:43 +0200)]
sync : whisper.cpp

ggml-ci

4 months ago
Diego Devesa [Thu, 27 Feb 2025 12:35:07 +0000 (13:35 +0100)]
whisper : support GGML_BACKEND_DL (whisper/2843)

* whisper : support GGML_BACKEND_DL

* fix DTW crash

* whisper.objc : fix build - add ggml-cpp.h

---------

Co-authored-by: Georgi Gerganov <redacted>
4 months ago
Georgi Gerganov [Thu, 27 Feb 2025 11:11:33 +0000 (13:11 +0200)]
ci : fix workflow name

4 months ago
Georgi Gerganov [Thu, 27 Feb 2025 10:53:37 +0000 (12:53 +0200)]
examples : remove dr_wav.h (#1127)

ggml-ci

4 months ago
Georgi Gerganov [Thu, 27 Feb 2025 10:52:45 +0000 (12:52 +0200)]
sync : whisper.cpp

4 months ago
Georgi Gerganov [Thu, 27 Feb 2025 10:50:32 +0000 (12:50 +0200)]
common : separate whisper sources (whisper/2846)

* common : separate whisper sources

* examples : add chrono

* examples : add more headers

4 months ago
Georgi Gerganov [Thu, 27 Feb 2025 08:39:13 +0000 (10:39 +0200)]
common : fix build min/max (whisper/2845)

* common : try to fix build

* cont : try another fix

4 months ago
Dmitry Atamanov [Thu, 27 Feb 2025 07:06:54 +0000 (12:06 +0500)]
examples : use miniaudio for direct decoding flac, mp3, ogg and wav (whisper/2759)

4 months ago
midnight [Wed, 5 Feb 2025 12:41:10 +0000 (04:41 -0800)]
cmake : fix compile assumptions for power9/etc (whisper/2777)

* Add small comment re: VSX to readme

Co-authored-by: midnight <redacted>
4 months ago
petterreinholdtsen [Wed, 26 Feb 2025 20:44:00 +0000 (21:44 +0100)]
Told cmake to install ggml-cpp.h as a public header file. (#1126)

It is used by the whisper.cpp talk-llama example.

Co-authored-by: Petter Reinholdtsen <redacted>
4 months ago
cmdr2 [Tue, 25 Feb 2025 12:36:34 +0000 (18:06 +0530)]
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (#1121)

* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend

* Add fp16 support for add/sub/mul/div on the CPU backend

* Add test cases for fp16 add/sub/mul/div

4 months ago
Georgi Gerganov [Tue, 25 Feb 2025 09:44:48 +0000 (11:44 +0200)]
sync : llama.cpp

ggml-ci

4 months ago
Gian-Carlo Pascutto [Tue, 25 Feb 2025 09:27:58 +0000 (10:27 +0100)]
metal : copy kernels for quant to F32/F16 conversions (llama/12017)

metal: use dequantize_q templates

---------

Co-authored-by: Georgi Gerganov <redacted>
4 months ago
lhez [Mon, 24 Feb 2025 21:47:07 +0000 (13:47 -0800)]
opencl: fix for small models (llama/11950)

* opencl: fix small shape gemv, remove unused extensions

* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size

* opencl: fix for token length < 4

* opencl: use wave size of 64 for all Adreno GPUs

---------

Co-authored-by: Shawn Gu <redacted>
Co-authored-by: Skyler Szot <redacted>
4 months ago
Neo Zhang Jianyu [Mon, 24 Feb 2025 14:33:23 +0000 (22:33 +0800)]
Optimize mul_mat for Q4_0 on Intel GPU (llama/12035)

* opt performance by reorder for Intel GPU

* detect hw type and save opt feature, and print opt feature

* correct name

* support optimizing the graph once when computing the graph; record the opt status in tensor->extra; make CI pass

* add env variable GGML_SYCL_DISABLE_OPT for debug

* use syclex::architecture replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT

* add performance data

* move getrows functions to separate files

* fix global variables

---------

Co-authored-by: arthw <redacted>
4 months ago
Akarshan Biswas [Mon, 24 Feb 2025 10:18:25 +0000 (15:48 +0530)]
SYCL: Fix GGML_SYCL_DEBUG macro (llama/11995)

4 months ago
Aaron Teo [Sat, 22 Feb 2025 21:39:24 +0000 (05:39 +0800)]
ggml-cpu: Support s390x SIMD Instruction Set (llama/12019)

* ggml: add s390x ARCH_FLAGS for compilation

Signed-off-by: Aaron Teo <redacted>
* ggml: add SIMD for s390x using vector intrinsics

SIMD is activated for:
* ggml_vec_dot_f32
* ggml_vec_dot_f16
* ggml_vec_mad_f32
* ggml_vec_mad_f16
* ggml_vec_mad_f32_unroll
* ggml_vec_scale_f32
* ggml_vec_scale_f16

SIMD is NOT activated for:
* ggml_vec_dot_f16_unroll (pending bugfix)

Signed-off-by: Aaron Teo <redacted>
* ggml: fix missing escape character in GGML_F32x4_REDUCE

Signed-off-by: Aaron Teo <redacted>
* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR

Signed-off-by: Aaron Teo <redacted>
* ggml: fix s390x GGML_F32x4_REDUCE

Signed-off-by: Aaron Teo <redacted>
* ggml: full SIMD activation for F32,F16 s390x

Signed-off-by: Aaron Teo <redacted>
* ggml: add option to disable s390x VXE/VXE2

Signed-off-by: Aaron Teo <redacted>
* ggml: change vecintrin.h include to ggml-cpu-impl

* add __VXE__ and __VXE2__ macros

Signed-off-by: Aaron Teo <redacted>
* cmake: add s390x target detection for VX/VXE/VXE2

Signed-off-by: Aaron Teo <redacted>
* ggml: move s390x vector intrinsics to ggml-cpu-impl.h

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x Q8_0 SIMD

Signed-off-by: Aaron Teo <redacted>
* ggml: correct documentation for Q8_0

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x reduce code complexity Q8_0

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x bugfix typo Q8_0

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activated for Q4_1

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x inline vec_reve

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q4_0

Signed-off-by: Aaron Teo <redacted>
* ggml: add VXE backend feature

Signed-off-by: Aaron Teo <redacted>
* ggml: remove test.py

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for quantize_row_q8_0

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for quantize_row_q8_1

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for iq4_xs

Signed-off-by: Aaron Teo <redacted>
* ggml: bugfix iq4_xs

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for iq4_nl

Signed-off-by: Aaron Teo <redacted>
* ggml: add float, double, and long vector data type

Signed-off-by: Aaron Teo <redacted>
* ggml: clean up iq4_xs SIMD

Signed-off-by: Aaron Teo <redacted>
* ggml: fix improper use of restrict keyword

Signed-off-by: Aaron Teo <redacted>
* ggml: update warning message for ggml_vec_tbl

Signed-off-by: Aaron Teo <redacted>
* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K

Signed-off-by: Aaron Teo <redacted>
* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs

Signed-off-by: Aaron Teo <redacted>
* ggml: switch to restrict for iq4_nl

Signed-off-by: Aaron Teo <redacted>
* ggml: slight dot product speed improvement for q4_1_q8_1

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for q6_K

Signed-off-by: Aaron Teo <redacted>
* ggml: add missing `_t` to ggml_int8x16x4_t

Signed-off-by: Aaron Teo <redacted>
* ggml: fix missing `_t` for ggml_vec_xl_s8x4

Signed-off-by: Aaron Teo <redacted>
* ggml: fix more missing `_t`

Signed-off-by: Aaron Teo <redacted>
* ggml: add unroll and prefetch to Q8_0

increase of 3.86% for prompt processing and 32.22% for token generation

Signed-off-by: Aaron Teo <redacted>
* ggml: patch Q8_0 to use proper vector sizes

Signed-off-by: Aaron Teo <redacted>
* ggml: optimise Q8_0 dot prod compute kernel further

Signed-off-by: Aaron Teo <redacted>
* ggml: add unroll and prefetch to Q4_1

Signed-off-by: Aaron Teo <redacted>
* ggml: refactor Q6_K variable naming for readability

Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q6_K typos

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q5_K

Signed-off-by: Aaron Teo <redacted>
* ggml: fix wrong char*x16_t naming

Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q5_K y0 wrong signedness

Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q5_K invalid uchar type

Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q5_K invalid uchar type

Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q4_K

Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q4_K invalid vector intrinsics

Signed-off-by: Aaron Teo <redacted>
* ggml: simplify ggml_padd_s16 compute kernel

Signed-off-by: Aaron Teo <redacted>
* ggml: correct ggml-cpu vxe wording

Signed-off-by: Aaron Teo <redacted>
* ggml: change ggml_aligned_malloc alignment to 256

256 is the cache line size for s390x platforms

Signed-off-by: Aaron Teo <redacted>
* ggml: resolve pr merge via cherry-pick 225bbbf

Signed-off-by: Aaron Teo <redacted>
* ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)

* ggml: resolve pr merge via cherry-pick 4571953

Signed-off-by: Aaron Teo <redacted>
* ggml: cmake remove fork when determining s390x machine type

thank you @ericcurtin

Signed-off-by: Aaron Teo <redacted>
---------

Signed-off-by: Aaron Teo <redacted>
Co-authored-by: Jinyang He <redacted>
Co-authored-by: junchao-zhao <redacted>
4 months agoCUDA: add option to compile without FlashAttention (llama/12025)
Johannes Gäßler [Sat, 22 Feb 2025 19:44:34 +0000 (20:44 +0100)]
CUDA: add option to compile without FlashAttention (llama/12025)

4 months agoCUDA: optimize FA for GQA + large batches (llama/12014)
Johannes Gäßler [Sat, 22 Feb 2025 11:20:17 +0000 (12:20 +0100)]
CUDA: optimize FA for GQA + large batches (llama/12014)

4 months agocuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)
Gian-Carlo Pascutto [Sat, 22 Feb 2025 08:43:24 +0000 (09:43 +0100)]
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)

4 months agoCUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)
PureJourney [Fri, 21 Feb 2025 11:21:05 +0000 (19:21 +0800)]
CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)

* CUDA: correct the lowest Maxwell supported by CUDA 12

---------

Co-authored-by: Johannes Gäßler <redacted>
4 months agoMUSA: support ARM64 and enable dp4a etc. (llama/11843)
Bodhi [Fri, 21 Feb 2025 07:46:23 +0000 (15:46 +0800)]
MUSA: support ARM64 and enable dp4a etc. (llama/11843)

* MUSA: support ARM64 and enable __dp4a etc.

* fix cross entropy loss op for musa

* update

* add cc info log for musa

* add comment for the MUSA .cc calculation block

---------

Co-authored-by: Bodhi Hu <redacted>
4 months agoggml-cpu: Add CPU backend support for KleidiAI library (llama/11390)
Charles Xu [Thu, 20 Feb 2025 13:06:51 +0000 (14:06 +0100)]
ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390)

* ggml-cpu: Add CPU backend support for KleidiAI library

* Add environmental variable GGML_KLEIDIAI_SME

* Add support for multithread LHS conversion

* Switch kernel selection order to dotprod and i8mm

* updates for review comments

* More updates for review comments

* Reorganize and rename KleidiAI files

* Move ggml-cpu-traits.h to source file

* Update cmake for SME build and add alignment for SME

* Remove appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list

4 months agoggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917)
Prashant Vithule [Thu, 20 Feb 2025 10:08:32 +0000 (15:38 +0530)]
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917)

* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file

* Improved formatting of code in ggml-cpu-quants.c file

* style : minor fixes

* style : less whitespaces

* style : ptr spacing

---------

Co-authored-by: vithulep <redacted>
Co-authored-by: Georgi Gerganov <redacted>