git.djapps.eu Git - pkg/ggml/sources/ggml/log
Henry Linjamäki [Mon, 10 Mar 2025 16:57:00 +0000 (18:57 +0200)]
opencl: use OpenCL C standard supported by the device (llama/12221)
This patch nudges llama.cpp a bit so that it is supported on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
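A minimal sketch of the query involved, assuming a valid device handle (version parsing and the -cl-std= selection are omitted):
```c
#include <CL/cl.h>
#include <stdio.h>

// Query the OpenCL C version string of a device, e.g. "OpenCL C 1.2".
// (OpenCL 3.0 additionally offers CL_DEVICE_OPENCL_C_ALL_VERSIONS for a
// full list; the highest entry gives the -cl-std= value to build with.)
static void print_opencl_c_version(cl_device_id device) {
    char version[64] = {0};
    if (clGetDeviceInfo(device, CL_DEVICE_OPENCL_C_VERSION,
                        sizeof(version) - 1, version, NULL) == CL_SUCCESS) {
        printf("device supports: %s\n", version);
    }
}
```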
Georgi Gerganov [Mon, 10 Mar 2025 12:07:15 +0000 (14:07 +0200)]
tests : fix test-quantize-fns to init the CPU backend (llama/12306)
ggml-ci
Jason C.H [Sat, 8 Mar 2025 16:02:39 +0000 (00:02 +0800)]
ggml-backend : make path_str compatible with C++20 (llama/12269)
Daniel Bevenius [Fri, 7 Mar 2025 13:15:27 +0000 (14:15 +0100)]
ggml : skip intermediate .air file when compiling .metallib (llama/12247)
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.
The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
Akarshan Biswas [Wed, 26 Mar 2025 07:51:18 +0000 (13:21 +0530)]
ci: disable test-opt for now (#1158)
* ci: disable test-opt for now
* Use CTEXT_EXTRA to disable tests
Akarshan Biswas [Tue, 25 Mar 2025 09:38:14 +0000 (15:08 +0530)]
ci: Initial SYCL setup (#1157)
cmdr2 [Thu, 13 Mar 2025 18:29:48 +0000 (23:59 +0530)]
Create CONTRIBUTING.md (#1146)
* Create CONTRIBUTING.md
* Update CONTRIBUTING.md
bssrdf [Thu, 13 Mar 2025 18:29:19 +0000 (14:29 -0400)]
gpt-2 : add comment about KV cache type (#1142)
* change KV cache to fp16 to take advantage of tensor cores
* added a note/comment to indicate kv can be FP16
Christian Kastner [Mon, 10 Mar 2025 18:19:58 +0000 (19:19 +0100)]
cmake: Enable specifying exact PowerPC CPU architecture (#1138)
In the process, guard automatic CPU detection with GGML_NATIVE.
https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#index-mcpu-10
Christian Kastner [Mon, 10 Mar 2025 12:06:21 +0000 (13:06 +0100)]
cmake: Comment out GGML_BIN_DIR for now (#1139)
Nothing installs to it yet, so when attempting to use the cmake package,
set_and_check() triggers an error if the directory doesn't already exist
for other reasons.
Georgi Gerganov [Sat, 8 Mar 2025 13:18:24 +0000 (15:18 +0200)]
spm : remove (#1135)
ggml-ci
Georgi Gerganov [Sat, 8 Mar 2025 13:14:03 +0000 (15:14 +0200)]
sync : whisper.cpp
ggml-ci
Dmitry Atamanov [Tue, 4 Mar 2025 17:05:21 +0000 (22:05 +0500)]
common : fix audio loading by miniaudio (whisper/2862)
Georgi Gerganov [Fri, 7 Mar 2025 12:50:30 +0000 (14:50 +0200)]
sync : llama.cpp
ggml-ci
Rémy O [Fri, 7 Mar 2025 11:54:22 +0000 (12:54 +0100)]
ggml-cpu: faster AVX2 variant for IQ1_M (llama/12216)
BB-fat [Fri, 7 Mar 2025 07:35:57 +0000 (15:35 +0800)]
metal : simplify kernel arguments using a struct (#3229) (llama/12194)
* metal : refactor im2col parameters into a struct
* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets
* metal : refactor sum_rows parameters into a struct
* metal : refactor soft_max parameters into a struct
* metal : refactor diag_mask_inf parameters into a struct
* metal : refactor ssm_conv parameters into a struct
* metal : refactor ssm_scan parameters into a struct
* metal : refactor get_rows parameters into a struct
* metal : refactor group_norm parameters into a struct
* metal : refactor conv_transpose_1d parameters into a struct
* metal : refactor upscale parameters into a struct
* metal : refactor pad parameters into a struct
* metal : refactor pad_reflect_1d parameters into a struct
* metal : refactor arange parameters into a struct
* metal : refactor timestep_embedding parameters into a struct
* metal : refactor argsort parameters into a struct
* metal : refactor leaky_relu parameters into a struct
* metal : refactor pool_2d parameters into a struct
* metal : fix trailing whitespace
---------
Co-authored-by: alexju <redacted>
Daniel Bevenius [Fri, 7 Mar 2025 05:23:16 +0000 (06:23 +0100)]
metal : fix default.metallib build (llama/12224)
This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.
The motivation for this change is that currently when building and
specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is
generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'. Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```
With the above change the build could progress, but there was a
follow-on error about not being able to find the ggml-common.h file
from ggml-metal.metal, where it was included as a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
^~~~~~~~~~~~~~~~~~
"ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.
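The fix on the shader side matches the compiler's fix-it hint: drop the relative component from the include and let the build's include path resolve it (a sketch of the one-line change):
```c
// ggml-metal.metal, before: the relative path breaks when the file is
// compiled from the build directory
//#include "../ggml-common.h"

// after: rely on the include search path set up by the build instead
#include "ggml-common.h"
```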
lhez [Fri, 7 Mar 2025 00:20:35 +0000 (16:20 -0800)]
opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217)
* opencl: support noncontiguous `norm`
* opencl: support noncontiguous `rms_norm`
* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`
xiaofei [Thu, 6 Mar 2025 22:58:25 +0000 (06:58 +0800)]
cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama/12094)
Signed-off-by: Ray Lee <redacted>
Co-authored-by: Ray Lee <redacted>
Johannes Gäßler [Thu, 6 Mar 2025 17:45:09 +0000 (18:45 +0100)]
CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)
uvos [Thu, 6 Mar 2025 07:20:52 +0000 (08:20 +0100)]
HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209)
This avoids conflicts with the internal CUDA/HIP runtimes' memory management behavior.
Henry Linjamäki [Thu, 6 Mar 2025 01:33:40 +0000 (03:33 +0200)]
opencl : fix buffer alignment (llama/12197)
Fix the following error:
```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```
which occurs when `ggml_backend_opencl_context::alignment` is larger
than `cl_ptr_base` (hard-coded to `0x1000`).
Also fix `ggml_backend_opencl_context::alignment`, which was set from
`CL_DEVICE_MEM_BASE_ADDR_ALIGN` and treated as bytes, although the
value is reported in bits.
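A sketch of the corrected query, converting the reported bit count to bytes (names illustrative):
```c
#include <CL/cl.h>

// CL_DEVICE_MEM_BASE_ADDR_ALIGN is specified in *bits*; using it directly
// as a byte count inflates the required alignment 8x.
static size_t get_buffer_alignment(cl_device_id device) {
    cl_uint align_bits = 0;
    clGetDeviceInfo(device, CL_DEVICE_MEM_BASE_ADDR_ALIGN,
                    sizeof(align_bits), &align_bits, NULL);
    return align_bits / 8;  // convert bits to bytes
}
```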
Henry Linjamäki [Thu, 6 Mar 2025 01:31:14 +0000 (03:31 +0200)]
opencl : fix `ulong` kernel args were set from `int` variables (llama/12174)
... which left garbage bits in the upper half of the kernel args. This
caused segmentation faults when running on PoCL.
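A sketch of the pattern being fixed: clSetKernelArg copies the byte count it is given, so the source variable must really be 8 bytes wide (kernel and argument index are illustrative):
```c
#include <CL/cl.h>

// The kernel declares the argument as ulong (8 bytes). Passing the address
// of an int while telling clSetKernelArg to copy sizeof(cl_ulong) bytes
// reads 4 bytes of garbage into the upper half of the argument.
static cl_int set_offset_arg(cl_kernel kernel, cl_uint idx, size_t offset_in_bytes) {
    const cl_ulong offset = (cl_ulong) offset_in_bytes;  // widen first
    return clSetKernelArg(kernel, idx, sizeof(cl_ulong), &offset);
}
```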
simon886212 [Thu, 6 Mar 2025 01:30:05 +0000 (09:30 +0800)]
opencl : fix profile-related errors (llama/12095)
Co-authored-by: ubuntu <redacted>
Rémy O [Thu, 6 Mar 2025 01:26:10 +0000 (02:26 +0100)]
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154)
* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions
* cmake: Add GGML_BMI2 build option
* ggml: enable BMI2 on relevant CPU variants
* ggml-cpu: include BMI2 in backend score
* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features
* ggml-cpu: add __BMI2__ define when using MSVC
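A standalone illustration of the BMI2 bit-deposit trick such kernels lean on; this is not the PR's actual code:
```c
#include <immintrin.h>
#include <stdint.h>

// Spread 8 packed bits so that each one lands in the low bit of its own
// byte lane in a single pdep instruction; emulating this with multiplies
// and shifts takes several operations per unpack.
static inline uint64_t spread_bits(uint8_t packed) {
    return _pdep_u64(packed, 0x0101010101010101ULL);
}
```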
Akarshan Biswas [Wed, 5 Mar 2025 15:58:23 +0000 (21:28 +0530)]
SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201)
Plamen Minev [Wed, 5 Mar 2025 15:16:01 +0000 (17:16 +0200)]
ggml : fix GGMLMetalClass ODR (llama/12200)
This can happen if ggml is loaded from two separate libraries, since each of them will expose the class. This is more of a guard, since we want to use Metal only as an embedded library and don't care about the other case.
vmobilis [Fri, 7 Mar 2025 08:11:40 +0000 (11:11 +0300)]
ggml : ggml_compute_forward_concat() for arbitrary tensor type (#1118)
* ggml_compute_forward_concat() for arbitrary tensor type
* Check that tensors' type match
* ggml-cpu.c: check type of source tensors
* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()
* ggml.c: check concatenated tensor type
* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c, as it was moved to ggml.c
Christian Kastner [Thu, 6 Mar 2025 19:01:02 +0000 (20:01 +0100)]
pkg-config: Use CMake install paths for lib, include (#1133)
Georgi Gerganov [Tue, 4 Mar 2025 19:08:15 +0000 (21:08 +0200)]
vulkan : sync (llama/0)
ggml-ci
Georgi Gerganov [Tue, 4 Mar 2025 19:07:04 +0000 (21:07 +0200)]
sync : llama.cpp
mgroeber9110 [Tue, 4 Mar 2025 16:53:26 +0000 (17:53 +0100)]
ggml : portability fixes for VS 2017 (llama/12150)
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <redacted>
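A sketch of the kind of compatibility macro described; the exact conditions in ggml may differ:
```c
// MSVC does not accept the C99 'restrict' keyword when compiling plain C
// without C11/C17 support enabled, but it does accept the __restrict
// extension; other compilers get the standard keyword.
#if defined(_MSC_VER) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L)
#    define GGML_RESTRICT __restrict
#else
#    define GGML_RESTRICT restrict
#endif
```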
David Huang [Mon, 3 Mar 2025 21:10:54 +0000 (05:10 +0800)]
HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)
Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16
Diego Devesa [Mon, 3 Mar 2025 13:00:46 +0000 (14:00 +0100)]
test-backend-ops : add option -p to filter by op params (llama/12155)
ag2s20150909 [Mon, 3 Mar 2025 12:54:08 +0000 (20:54 +0800)]
ggml : fix kleidiai build (llama/12159)
The libggml API has changed, but the kleidiai code had not been updated to match.
Akarshan Biswas [Mon, 3 Mar 2025 10:07:22 +0000 (15:37 +0530)]
SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133)
* SYCL: refactor and move cpy kernels to a separate file
* Add few missing cpy kernels
* refactor and add debug logs
Diego Devesa [Sun, 2 Mar 2025 21:11:00 +0000 (22:11 +0100)]
ggml-backend : keep paths in native string type when possible (llama/12144)
Erik Scholz [Sat, 1 Mar 2025 11:57:22 +0000 (12:57 +0100)]
CUDA: compress mode option and default to size (llama/12029)
CUDA 12.8 added the option to specify stronger compression for binaries, so we now default to "size".
William Tambellini [Fri, 28 Feb 2025 13:41:47 +0000 (05:41 -0800)]
ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml not to abort on OOMs but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.
* misc fixes
---------
Co-authored-by: slaren <redacted>
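A sketch of what the upgraded API enables on the caller side, assuming the public wrapper ggml_backend_buffer_init_tensor now returns the status as described:
```c
#include "ggml-backend.h"

// Initialize a batch of tensors, propagating failure (e.g.
// GGML_STATUS_ALLOC_FAILED) to the caller instead of aborting.
enum ggml_status init_all(ggml_backend_buffer_t buffer,
                          struct ggml_tensor ** tensors, int n) {
    for (int i = 0; i < n; i++) {
        enum ggml_status status = ggml_backend_buffer_init_tensor(buffer, tensors[i]);
        if (status != GGML_STATUS_SUCCESS) {
            return status;
        }
    }
    return GGML_STATUS_SUCCESS;
}
```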
Rémy O [Fri, 28 Feb 2025 08:42:52 +0000 (09:42 +0100)]
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595)
* vulkan: implement specialized MMV kernels for IQ2 quantizations
* vulkan: add MMV kernels for IQ3 quants
* vulkan: Increase MMV batch size and unroll IQ LUT setup
* vulkan: fix init_iq_shmem for WG sizes larger than tables
* vulkan: common batch size for all I-quants
Johannes Gäßler [Fri, 28 Feb 2025 08:26:43 +0000 (09:26 +0100)]
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)
Prashant Vithule [Fri, 28 Feb 2025 07:36:12 +0000 (13:06 +0530)]
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064)
* Added SVE Support for Q2_K Quantized Models
* Use 4-space indentation in the switch cases
* Removed comment lines
* Remove the loop; retain the curly braces for better understanding of the code
* Remove the comment line added for the q3_k_q8_k kernel
---------
Co-authored-by: vithulep <redacted>
hipudding [Fri, 28 Feb 2025 07:23:47 +0000 (15:23 +0800)]
CANN: Fix build error with GCC 13 (llama/11990)
Remove unused header file that causes compilation failure on ARM
platform with GCC 13.
Eve [Fri, 28 Feb 2025 07:20:08 +0000 (07:20 +0000)]
vulkan: matmul dequantization improvements (llama/12015)
* faster dequant for old quants
* don't use unpack for iq4_nl
* vec2 unpack for q8
Daniele [Fri, 28 Feb 2025 06:52:51 +0000 (06:52 +0000)]
vulkan: improve im2col (llama/11826)
* vulkan: improve im2col performance
Vladimir Vuksanovic [Thu, 27 Feb 2025 07:42:48 +0000 (08:42 +0100)]
cmake: Fix ggml backend dependencies and installation (llama/11818)
* Fix dependencies between ggml and backends
ggml backends link only to ggml-base and ggml links to all backends.
* Fix installation of ggml backends
Set up GNUInstallDirs before setting the installation directory of ggml backends
Jeff Bolz [Tue, 25 Feb 2025 15:30:21 +0000 (09:30 -0600)]
vulkan: fix assertion when qy_needs_dequant (llama/12068)
Looks like a copy/paste bug from qx_needs_dequant.
Molly Sophia [Tue, 25 Feb 2025 11:28:22 +0000 (19:28 +0800)]
ggml-cpu: Fix build with sve (llama/12059)
* ggml-cpu: Fix build with sve
Signed-off-by: Molly Sophia <redacted>
* ggml-cpu: Remove unused variable in sve q3_k vec dot
Signed-off-by: Molly Sophia <redacted>
---------
Signed-off-by: Molly Sophia <redacted>
cmdr2 [Mon, 3 Mar 2025 15:21:31 +0000 (20:51 +0530)]
cuda: unary ops as float + de-duplicate (#1130)
cmdr2 [Fri, 28 Feb 2025 10:29:55 +0000 (15:59 +0530)]
cuda/vulkan: specify fp32-only support for some operations in supports_op (#1129)
* cuda: restrict SILU_BACK to fp32, since fp16 exceeds the desired test threshold
* vulkan: specify fp32-only support for certain ops (that are now tested for fp16 as well)
* f32 sigmoid in vulkan supports op
* Revert "f32 sigmoid in vulkan supports op"
This reverts commit c6f04b3c19bf4504c2776149c6d8cd84e0b48acb.
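A sketch of how such a restriction reads in a backend's supports_op hook (the fallback branch is illustrative):
```c
#include "ggml.h"

// Advertise SILU_BACK only for fp32, because the fp16 variant exceeds
// the test-backend-ops error threshold on this backend.
static bool example_supports_op(const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_SILU_BACK:
            return op->src[0]->type == GGML_TYPE_F32;
        default:
            return true;  // illustrative; real backends check every op
    }
}
```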
cmdr2 [Fri, 28 Feb 2025 07:04:39 +0000 (12:34 +0530)]
cuda/cpu: Increase support for fp16 unary operations (#1125)
* Support fp16 unary operations in the CUDA backend
* cpu: increase fp16 support for unary operators in the CPU backend
* cuda: increase fp16 support for unary operators in the CUDA backend
* Add test cases for fp16 unary operators
* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing
* metal: fix PR comments for unary op support after fp16 unary tests
Georgi Gerganov [Thu, 27 Feb 2025 12:43:20 +0000 (14:43 +0200)]
sync : whisper.cpp
ggml-ci
Diego Devesa [Thu, 27 Feb 2025 12:35:07 +0000 (13:35 +0100)]
whisper : support GGML_BACKEND_DL (whisper/2843)
* whisper : support GGML_BACKEND_DL
* fix DTW crash
* whisper.objc : fix build - add ggml-cpp.h
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Thu, 27 Feb 2025 11:11:33 +0000 (13:11 +0200)]
ci : fix workflow name
Georgi Gerganov [Thu, 27 Feb 2025 10:53:37 +0000 (12:53 +0200)]
examples : remove dr_wab.h (#1127)
ggml-ci
Georgi Gerganov [Thu, 27 Feb 2025 10:52:45 +0000 (12:52 +0200)]
sync : whisper.cpp
Georgi Gerganov [Thu, 27 Feb 2025 10:50:32 +0000 (12:50 +0200)]
common : separate whisper sources (whisper/2846)
* common : separate whisper sources
* examples : add chrono
* examples : add more headers
Georgi Gerganov [Thu, 27 Feb 2025 08:39:13 +0000 (10:39 +0200)]
common : fix build min/max (whisper/2845)
* common : try to fix build
* cont : try another fix
Dmitry Atamanov [Thu, 27 Feb 2025 07:06:54 +0000 (12:06 +0500)]
examples : use miniaudio for direct decoding flac, mp3, ogg and wav (whisper/2759)
midnight [Wed, 5 Feb 2025 12:41:10 +0000 (04:41 -0800)]
cmake : fix compile assumptions for power9/etc (whisper/2777)
* Add small comment re: VSX to readme
Co-authored-by: midnight <redacted>
petterreinholdtsen [Wed, 26 Feb 2025 20:44:00 +0000 (21:44 +0100)]
Told cmake to install ggml-cpp.h as a public header file. (#1126)
It is used by the whisper.cpp talk-llama example.
Co-authored-by: Petter Reinholdtsen <redacted>
cmdr2 [Tue, 25 Feb 2025 12:36:34 +0000 (18:06 +0530)]
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (#1121)
* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend
* Add fp16 support for add/sub/mul/div on the CPU backend
* Add test cases for fp16 add/sub/mul/div
Georgi Gerganov [Tue, 25 Feb 2025 09:44:48 +0000 (11:44 +0200)]
sync : llama.cpp
ggml-ci
Gian-Carlo Pascutto [Tue, 25 Feb 2025 09:27:58 +0000 (10:27 +0100)]
metal : copy kernels for quant to F32/F16 conversions (llama/12017)
metal: use dequantize_q templates
---------
Co-authored-by: Georgi Gerganov <redacted>
lhez [Mon, 24 Feb 2025 21:47:07 +0000 (13:47 -0800)]
opencl: fix for small models (llama/11950)
* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs
---------
Co-authored-by: Shawn Gu <redacted>
Co-authored-by: Skyler Szot <redacted>
Neo Zhang Jianyu [Mon, 24 Feb 2025 14:33:23 +0000 (22:33 +0800)]
Optimize mul_mat for Q4_0 on Intel GPU (llama/12035)
* optimize performance by reordering for Intel GPU
* detect hw type and save opt feature, and print opt feature
* correct name
* optimize the graph once when computing the graph, record the opt status in tensor->extra, make CI pass
* add env variable GGML_SYCL_DISABLE_OPT for debug
* use syclex::architecture to replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* move getrows functions to separate files
* fix global variables
---------
Co-authored-by: arthw <redacted>
Akarshan Biswas [Mon, 24 Feb 2025 10:18:25 +0000 (15:48 +0530)]
SYCL: Fix GGML_SYCL_DEBUG macro (llama/11995)
Aaron Teo [Sat, 22 Feb 2025 21:39:24 +0000 (05:39 +0800)]
ggml-cpu: Support s390x SIMD Instruction Set (llama/12019)
* ggml: add s390x ARCH_FLAGS for compilation
Signed-off-by: Aaron Teo <redacted>
* ggml: add SIMD for s390x using vector intrinsics
SIMD is activated for:
* ggml_vec_dot_f32
* ggml_vec_dot_f16
* ggml_vec_mad_f32
* ggml_vec_mad_f16
* ggml_vec_mad_f32_unroll
* ggml_vec_scale_f32
* ggml_vec_scale_f16
SIMD is NOT activated for:
* ggml_vec_dot_f16_unroll (pending bugfix)
Signed-off-by: Aaron Teo <redacted>
* ggml: fix missing escape character in GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <redacted>
* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR
Signed-off-by: Aaron Teo <redacted>
* ggml: fix s390x GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <redacted>
* ggml: full SIMD activation for F32,F16 s390x
Signed-off-by: Aaron Teo <redacted>
* ggml: add option to disable s390x VXE/VXE2
Signed-off-by: Aaron Teo <redacted>
* ggml: change vecintrin.h include to ggml-cpu-impl
* add __VXE__ and __VXE2__ macros
Signed-off-by: Aaron Teo <redacted>
* cmake: add s390x target detection for VX/VXE/VXE2
Signed-off-by: Aaron Teo <redacted>
* ggml: move s390x vector intrinsics to ggml-cpu-impl.h
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x Q8_0 SIMD
Signed-off-by: Aaron Teo <redacted>
* ggml: correct documentation for Q8_0
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x reduce code complexity Q8_0
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x bugfix typo Q8_0
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activated for Q4_1
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x inline vec_reve
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q4_0
Signed-off-by: Aaron Teo <redacted>
* ggml: add VXE backend feature
Signed-off-by: Aaron Teo <redacted>
* ggml: remove test.py
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for quantize_row_q8_0
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for quantize_row_q8_1
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for iq4_xs
Signed-off-by: Aaron Teo <redacted>
* ggml: bugfix iq4_xs
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for iq4_nl
Signed-off-by: Aaron Teo <redacted>
* ggml: add float, double, and long vector data type
Signed-off-by: Aaron Teo <redacted>
* ggml: clean up iq4_xs SIMD
Signed-off-by: Aaron Teo <redacted>
* ggml: fix improper use of restrict keyword
Signed-off-by: Aaron Teo <redacted>
* ggml: update warning message for ggml_vec_tbl
Signed-off-by: Aaron Teo <redacted>
* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K
Signed-off-by: Aaron Teo <redacted>
* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs
Signed-off-by: Aaron Teo <redacted>
* ggml: switch to restrict for iq4_nl
Signed-off-by: Aaron Teo <redacted>
* ggml: slight dot product speed improvement for q4_1_q8_1
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for q6_K
Signed-off-by: Aaron Teo <redacted>
* ggml: add missing `_t` to ggml_int8x16x4_t
Signed-off-by: Aaron Teo <redacted>
* ggml: fix missing `_t` for ggml_vec_xl_s8x4
Signed-off-by: Aaron Teo <redacted>
* ggml: fix more missing `_t`
Signed-off-by: Aaron Teo <redacted>
* ggml: add unroll and prefetch to Q8_0
increase of 3.86% for prompt processing and 32.22% for token generation
Signed-off-by: Aaron Teo <redacted>
* ggml: patch Q8_0 to use proper vector sizes
Signed-off-by: Aaron Teo <redacted>
* ggml: optimise Q8_0 dot prod compute kernel further
Signed-off-by: Aaron Teo <redacted>
* ggml: add unroll and prefetch to Q4_1
Signed-off-by: Aaron Teo <redacted>
* ggml: refactor Q6_K variable naming for readability
Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q6_K typos
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q5_K
Signed-off-by: Aaron Teo <redacted>
* ggml: fix wrong char*x16_t naming
Signed-off-by: Aaron Teo <redacted>
* ggml: Q5_K y0 wrong signness
Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q4_K
Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q4_K invalid vector intrinsics
Signed-off-by: Aaron Teo <redacted>
* ggml: simplify ggml_padd_s16 compute kernel
Signed-off-by: Aaron Teo <redacted>
* ggml: correct ggml-cpu vxe wording
Signed-off-by: Aaron Teo <redacted>
* ggml: change ggml_aligned_malloc alignment to 256
256 is the cache line size for s390x platforms
Signed-off-by: Aaron Teo <redacted>
* ggml: resolve pr merge via cherry-pick 225bbbf
Signed-off-by: Aaron Teo <redacted>
* ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)
* ggml: resolve pr merge via cherry-pick 4571953
Signed-off-by: Aaron Teo <redacted>
* ggml: cmake remove fork when determining s390x machine type
thank you @ericcurtin
Signed-off-by: Aaron Teo <redacted>
---------
Signed-off-by: Aaron Teo <redacted>
Co-authored-by: Jinyang He <redacted>
Co-authored-by: junchao-zhao <redacted>
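For flavor, a minimal f32 dot product in the z/Architecture vector intrinsics this series activates, assuming GCC with -mzvector on a VXE-capable machine and n divisible by 4 (not code from the PR):
```c
#include <vecintrin.h>

// 4-wide fused multiply-add accumulation, the basic building block
// behind ggml_vec_dot_f32 on s390x.
static float dot_f32_vxe(const float * a, const float * b, int n) {
    __vector float acc = vec_splats(0.0f);
    for (int i = 0; i < n; i += 4) {
        acc = vec_madd(vec_xl(0, a + i), vec_xl(0, b + i), acc);
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```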
Johannes Gäßler [Sat, 22 Feb 2025 19:44:34 +0000 (20:44 +0100)]
CUDA: add option to compile without FlashAttention (llama/12025)
Johannes Gäßler [Sat, 22 Feb 2025 11:20:17 +0000 (12:20 +0100)]
CUDA: optimize FA for GQA + large batches (llama/12014)
Gian-Carlo Pascutto [Sat, 22 Feb 2025 08:43:24 +0000 (09:43 +0100)]
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)
PureJourney [Fri, 21 Feb 2025 11:21:05 +0000 (19:21 +0800)]
CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)
* CUDA: correct the lowest Maxwell supported by CUDA 12
---------
Co-authored-by: Johannes Gäßler <redacted>
Bodhi [Fri, 21 Feb 2025 07:46:23 +0000 (15:46 +0800)]
MUSA: support ARM64 and enable __dp4a etc. (llama/11843)
* MUSA: support ARM64 and enable __dp4a etc.
* fix cross entropy loss op for musa
* update
* add cc info log for musa
* add comment for the MUSA .cc calculation block
---------
Co-authored-by: Bodhi Hu <redacted>
Charles Xu [Thu, 20 Feb 2025 13:06:51 +0000 (14:06 +0100)]
ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390)
* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environmental variable GGML_KLEIDIAI_SME
* Add support for multithread LHS conversion
* Switch kernel selection order to dotprod and i8mm
* updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Remove append GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
Prashant Vithule [Thu, 20 Feb 2025 10:08:32 +0000 (15:38 +0530)]
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917)
* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file
* Improved formatting of code in ggml-cpu-quants.c file
* style : minor fixes
* style : less whitespaces
* style : ptr spacing
---------
Co-authored-by: vithulep <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Johannes Gäßler [Mon, 17 Feb 2025 13:03:24 +0000 (14:03 +0100)]
CUDA: use async data loading for FlashAttention (llama/11894)
* CUDA: use async data loading for FlashAttention
---------
Co-authored-by: Diego Devesa <redacted>
Rémy O [Mon, 17 Feb 2025 06:55:57 +0000 (07:55 +0100)]
vulkan: implement several ops relevant for ggml_opt (llama/11769)
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
Jeff Bolz [Sun, 16 Feb 2025 07:52:23 +0000 (01:52 -0600)]
vulkan: support multi/vision rope, and noncontiguous rope (llama/11902)
Hale Chan [Sun, 16 Feb 2025 06:50:26 +0000 (14:50 +0800)]
metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904)
Adrian Kretz [Sat, 15 Feb 2025 18:39:20 +0000 (19:39 +0100)]
metal : optimize dequant q6_K kernel (llama/11892)
Georgi Gerganov [Sat, 15 Feb 2025 14:40:57 +0000 (16:40 +0200)]
repo : update links to new url (llama/11886)
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
Rémy O [Sat, 15 Feb 2025 08:01:40 +0000 (09:01 +0100)]
vulkan: initial support for IQ1_S and IQ1_M quantizations (llama/11528)
* vulkan: initial support for IQ1_S and IQ1_M quantizations
* vulkan: define MMV kernels for IQ1 quantizations
* devops: increase timeout of Vulkan tests again
* vulkan: simplify ifdef for init_iq_shmem
lhez [Fri, 14 Feb 2025 19:12:23 +0000 (11:12 -0800)]
opencl: Fix rope and softmax (llama/11833)
* opencl: fix `ROPE`
* opencl: fix `SOFT_MAX`
* Add fp16 variant
* opencl: enforce subgroup size for `soft_max`
Diego Devesa [Fri, 14 Feb 2025 14:33:52 +0000 (15:33 +0100)]
cuda : add ampere to the list of default architectures (llama/11870)
Jinyang He [Fri, 14 Feb 2025 08:54:27 +0000 (16:54 +0800)]
ggml: optimize some vec dot functions for LoongArch ASX (llama/11842)
* Optimize ggml_vec_dot_q3_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q4_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q6_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q5_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q2_K_q8_K for LoongArch ASX
* Optimize mul_sum_i8_pairs_float for LoongArch ASX
* Optimize ggml_vec_dot_iq4_xs_q8_K for LoongArch ASX
Eve [Fri, 14 Feb 2025 02:59:40 +0000 (02:59 +0000)]
vulkan: linux builds + small subgroup size fixes (llama/11767)
* mm subgroup size
* upload vulkan x86 builds
Jeffrey Morgan [Thu, 13 Feb 2025 17:05:04 +0000 (09:05 -0800)]
llamafile: use member variable instead of constant for iq4nlt (llama/11780)
R0CKSTAR [Thu, 13 Feb 2025 12:28:18 +0000 (20:28 +0800)]
musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
* musa: Update MUSA SDK version to rc3.1.1
Signed-off-by: Xiaodong Ye <redacted>
* musa: Remove workaround in PR #10042
Signed-off-by: Xiaodong Ye <redacted>
---------
Signed-off-by: Xiaodong Ye <redacted>
Diego Devesa [Thu, 13 Feb 2025 00:02:38 +0000 (01:02 +0100)]
ggml-cpu : add chunking support to mul_mat_id (llama/11666)
* ggml-cpu : add chunking support to mul_mat_id
* allocate chunk counter in wdata
parallelize src1 quantization by column to allow parallelization even when there is only one row
* disable for arm
* cleanup
* better way to disable for arm
* fix uninitialized counter when using 1 thread only
* revert test-backend-ops changes
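A sketch of the chunking pattern described above, with the shared counter living in the work buffer so every thread claims the next free chunk (names illustrative):
```c
#include <stdatomic.h>

// The chunk counter is allocated in wdata, so all threads of the compute
// share it; each thread grabs chunks until none remain.
static void worker(atomic_int * current_chunk, int n_chunks) {
    for (;;) {
        const int chunk = atomic_fetch_add(current_chunk, 1);
        if (chunk >= n_chunks) {
            break;  // all chunks claimed
        }
        // ... compute the mat-mul rows belonging to `chunk` ...
    }
}
```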
Xuan-Son Nguyen [Wed, 12 Feb 2025 23:33:45 +0000 (00:33 +0100)]
ggml : x2 speed for WASM by optimizing SIMD (llama/11453)
* ggml : x2 speed for WASM by optimizing SIMD
* fix bad merging
* rm trailing spaces
* rm redundant clamp
* better quantize_row_q8_K
Co-authored-by: camel-cdr <redacted>
* remove memset that causes buffer overflow
Co-authored-by: camel-cdr <redacted>
---------
Co-authored-by: camel-cdr <redacted>
uvos [Wed, 12 Feb 2025 21:25:28 +0000 (22:25 +0100)]
HIP: Remove GCN from list of devices that avoid MMQ (llama/11831)
Georgi Gerganov [Wed, 12 Feb 2025 19:46:43 +0000 (21:46 +0200)]
sync : llama.cpp
ggml-ci
uvos [Wed, 12 Feb 2025 16:25:03 +0000 (17:25 +0100)]
HIP: Switch to std::vector in rocblas version check (llama/11820)
bandoti [Wed, 12 Feb 2025 14:06:53 +0000 (10:06 -0400)]
cleanup: fix compile warnings associated with gnu_printf (llama/11811)
Richard [Wed, 12 Feb 2025 13:57:33 +0000 (13:57 +0000)]
ggml : fix multi-threaded clamp_f32 (llama/11824)
* Bug fix for clamp_f32
When using tensors larger than 1d, the clamp operation does not work due to the restriction of returning if ith is not 0.
* Bug fix for clamp_f32
* Bug fix for clamp_f32
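A sketch of the fixed pattern: rows are interleaved across threads instead of every thread but 0 returning early (a simplified contiguous layout is assumed):
```c
#include <math.h>
#include <stdint.h>

// Clamp all rows of a 2-D tensor, distributing rows across nth threads;
// the buggy version effectively did `if (ith != 0) return;`.
static void clamp_f32_rows(float * data, int64_t nr, int64_t ne0,
                           float min, float max, int ith, int nth) {
    for (int64_t ir = ith; ir < nr; ir += nth) {
        float * row = data + ir * ne0;
        for (int64_t i = 0; i < ne0; i++) {
            row[i] = fminf(fmaxf(row[i], min), max);
        }
    }
}
```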
Weizhao Ouyang [Wed, 12 Feb 2025 12:22:58 +0000 (20:22 +0800)]
ggml-cpu: Fix duplicate MATMUL_INT8 (llama/11817)
Signed-off-by: Weizhao Ouyang <redacted>
Johannes Gäßler [Wed, 12 Feb 2025 12:16:39 +0000 (13:16 +0100)]
CUDA: fix CUDART_VERSION checks (llama/11821)
Sheldon Robinson [Tue, 11 Feb 2025 15:55:45 +0000 (10:55 -0500)]
Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (llama/11803)
* Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx
* Fix #11802: PR #11803 - keep RegQueryValueExA, remove TEXT macro, description needs to be ANSI string
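A sketch of the resulting call, with an illustrative value name; the ANSI entry point takes LPCSTR, so a plain literal is correct without the TEXT() macro (which expands to a wide string when UNICODE is defined):
```c
#include <windows.h>

// Query the size of a registry value through the ANSI entry point
// explicitly, passing a plain ANSI value name.
static LSTATUS query_value_size(HKEY key, DWORD * size) {
    return RegQueryValueExA(key, "ProcessorNameString", NULL, NULL, NULL, size);
}
```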
Johannes Gäßler [Mon, 10 Feb 2025 23:17:22 +0000 (00:17 +0100)]
CUDA: use arch list for compatibility check (llama/11775)
* CUDA: use arch list for feature availability check
---------
Co-authored-by: Diego Devesa <redacted>
Maxim Evtush [Mon, 10 Feb 2025 22:21:31 +0000 (23:21 +0100)]
fix: typos in documentation files (llama/11791)
* Update ggml.c
* Update arg.cpp
* Update speculative.h