git.djapps.eu Git - pkg/ggml/sources/ggml/log
Henry Linjamäki [Mon, 10 Mar 2025 16:57:00 +0000 (18:57 +0200)]
opencl: use OpenCL C standard supported by the device (llama/12221)
This patch nudges llama.cpp a bit so that it is supported on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
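A minimal sketch of the query involved, assuming a valid device handle (version parsing and the -cl-std= selection are omitted):
```c
#include <CL/cl.h>
#include <stdio.h>

// Query the OpenCL C version string of a device, e.g. "OpenCL C 1.2".
// (OpenCL 3.0 additionally offers CL_DEVICE_OPENCL_C_ALL_VERSIONS for a
// full list; the highest entry gives the -cl-std= value to build with.)
static void print_opencl_c_version(cl_device_id device) {
    char version[64] = {0};
    if (clGetDeviceInfo(device, CL_DEVICE_OPENCL_C_VERSION,
                        sizeof(version) - 1, version, NULL) == CL_SUCCESS) {
        printf("device supports: %s\n", version);
    }
}
```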
Georgi Gerganov [Mon, 10 Mar 2025 12:07:15 +0000 (14:07 +0200)]
tests : fix test-quantize-fns to init the CPU backend (llama/12306)
ggml-ci
Jason C.H [Sat, 8 Mar 2025 16:02:39 +0000 (00:02 +0800)]
ggml-backend : make path_str compatible with C++20 (llama/12269)
Daniel Bevenius [Fri, 7 Mar 2025 13:15:27 +0000 (14:15 +0100)]
ggml : skip intermediate .air file when compiling .metallib (llama/12247)
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.
The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
Akarshan Biswas [Wed, 26 Mar 2025 07:51:18 +0000 (13:21 +0530)]
ci: disable test-opt for now (#1158)
* ci: disable test-opt for now
* Use CTEXT_EXTRA to disable tests
Akarshan Biswas [Tue, 25 Mar 2025 09:38:14 +0000 (15:08 +0530)]
ci: Initial SYCL setup (#1157)
cmdr2 [Thu, 13 Mar 2025 18:29:48 +0000 (23:59 +0530)]
Create CONTRIBUTING.md (#1146)
* Create CONTRIBUTING.md
* Update CONTRIBUTING.md
bssrdf [Thu, 13 Mar 2025 18:29:19 +0000 (14:29 -0400)]
gpt-2 : add comment about KV cache type (#1142)
* change KV cache to fp16 to take advantage of tensor cores
* added a note/comment to indicate kv can be FP16
Christian Kastner [Mon, 10 Mar 2025 18:19:58 +0000 (19:19 +0100)]
cmake: Enable specifying exact PowerPC CPU architecture (#1138)
In the process, guard automatic CPU detection with GGML_NATIVE.
https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#index-mcpu-10
Christian Kastner [Mon, 10 Mar 2025 12:06:21 +0000 (13:06 +0100)]
cmake: Comment out GGML_BIN_DIR for now (#1139)
Nothing installs to it yet, so when attempting to use the cmake package,
set_and_check() triggers an error if the directory doesn't already exist
for other reasons.
Georgi Gerganov [Sat, 8 Mar 2025 13:18:24 +0000 (15:18 +0200)]
spm : remove (#1135)
ggml-ci
Georgi Gerganov [Sat, 8 Mar 2025 13:14:03 +0000 (15:14 +0200)]
sync : whisper.cpp
ggml-ci
Dmitry Atamanov [Tue, 4 Mar 2025 17:05:21 +0000 (22:05 +0500)]
common : fix audio loading by miniaudio (whisper/2862)
Georgi Gerganov [Fri, 7 Mar 2025 12:50:30 +0000 (14:50 +0200)]
sync : llama.cpp
ggml-ci
Rémy O [Fri, 7 Mar 2025 11:54:22 +0000 (12:54 +0100)]
ggml-cpu: faster AVX2 variant for IQ1_M (llama/12216)
BB-fat [Fri, 7 Mar 2025 07:35:57 +0000 (15:35 +0800)]
metal : simplify kernel arguments using a struct (#3229) (llama/12194)
* metal : refactor im2col parameters into a struct
* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets
* metal : refactor sum_rows parameters into a struct
* metal : refactor soft_max parameters into a struct
* metal : refactor diag_mask_inf parameters into a struct
* metal : refactor ssm_conv parameters into a struct
* metal : refactor ssm_scan parameters into a struct
* metal : refactor get_rows parameters into a struct
* metal : refactor group_norm parameters into a struct
* metal : refactor conv_transpose_1d parameters into a struct
* metal : refactor upscale parameters into a struct
* metal : refactor pad parameters into a struct
* metal : refactor pad_reflect_1d parameters into a struct
* metal : refactor arange parameters into a struct
* metal : refactor timestep_embedding parameters into a struct
* metal : refactor argsort parameters into a struct
* metal : refactor leaky_relu parameters into a struct
* metal : refactor pool_2d parameters into a struct
* metal : fix trailing whitespace
---------
Co-authored-by: alexju <redacted>
Daniel Bevenius [Fri, 7 Mar 2025 05:23:16 +0000 (06:23 +0100)]
metal : fix default.metallib build (llama/12224)
This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.
The motivation for this change is that currently when building and
specifying GGML_METAL_EMBED_LIBRARY=OFF the following error is
generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'. Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```
With the above change the build could progress, but there was a
follow-on error about not being able to find the ggml-common.h file
from ggml-metal.metal, where it was included as a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
^~~~~~~~~~~~~~~~~~
"ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.
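The fix on the shader side matches the compiler's fix-it hint: drop the relative component from the include and let the build's include path resolve it (a sketch of the one-line change):
```c
// ggml-metal.metal, before: the relative path breaks when the file is
// compiled from the build directory
//#include "../ggml-common.h"

// after: rely on the include search path set up by the build instead
#include "ggml-common.h"
```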
lhez [Fri, 7 Mar 2025 00:20:35 +0000 (16:20 -0800)]
opencl: Noncontiguous `norm`, `rms_norm`, disable `fp16` for some ops (llama/12217)
* opencl: support noncontiguous `norm`
* opencl: support noncontiguous `rms_norm`
* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`
xiaofei [Thu, 6 Mar 2025 22:58:25 +0000 (06:58 +0800)]
cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (llama/12094)
Signed-off-by: Ray Lee <redacted>
Co-authored-by: Ray Lee <redacted>
Johannes Gäßler [Thu, 6 Mar 2025 17:45:09 +0000 (18:45 +0100)]
CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)
uvos [Thu, 6 Mar 2025 07:20:52 +0000 (08:20 +0100)]
HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (llama/12209)
This avoids conflicts with the internal CUDA/HIP runtimes' memory management behavior.
Henry Linjamäki [Thu, 6 Mar 2025 01:33:40 +0000 (03:33 +0200)]
opencl : fix buffer alignment (llama/12197)
Fix the following error:
```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```
which occurs when `ggml_backend_opencl_context::alignment` is larger
than `cl_ptr_base` (hard-coded to `0x1000`).
Also fix `ggml_backend_opencl_context::alignment`, which was set from
`CL_DEVICE_MEM_BASE_ADDR_ALIGN` and treated as bytes, although the
value is reported in bits.
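A sketch of the corrected query, converting the reported bit count to bytes (names illustrative):
```c
#include <CL/cl.h>

// CL_DEVICE_MEM_BASE_ADDR_ALIGN is specified in *bits*; using it directly
// as a byte count inflates the required alignment 8x.
static size_t get_buffer_alignment(cl_device_id device) {
    cl_uint align_bits = 0;
    clGetDeviceInfo(device, CL_DEVICE_MEM_BASE_ADDR_ALIGN,
                    sizeof(align_bits), &align_bits, NULL);
    return align_bits / 8;  // convert bits to bytes
}
```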
Henry Linjamäki [Thu, 6 Mar 2025 01:31:14 +0000 (03:31 +0200)]
opencl : fix `ulong` kernel args were set from `int` variables (llama/12174)
... which left garbage bits in the upper half of the kernel args. This
caused segmentation faults when running on PoCL.
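A sketch of the pattern being fixed: clSetKernelArg copies the byte count it is given, so the source variable must really be 8 bytes wide (kernel and argument index are illustrative):
```c
#include <CL/cl.h>

// The kernel declares the argument as ulong (8 bytes). Passing the address
// of an int while telling clSetKernelArg to copy sizeof(cl_ulong) bytes
// reads 4 bytes of garbage into the upper half of the argument.
static cl_int set_offset_arg(cl_kernel kernel, cl_uint idx, size_t offset_in_bytes) {
    const cl_ulong offset = (cl_ulong) offset_in_bytes;  // widen first
    return clSetKernelArg(kernel, idx, sizeof(cl_ulong), &offset);
}
```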
simon886212 [Thu, 6 Mar 2025 01:30:05 +0000 (09:30 +0800)]
opencl : fix profile-related errors (llama/12095)
Co-authored-by: ubuntu <redacted>
Rémy O [Thu, 6 Mar 2025 01:26:10 +0000 (02:26 +0100)]
ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (llama/12154)
* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions
* cmake: Add GGML_BMI2 build option
* ggml: enable BMI2 on relevant CPU variants
* ggml-cpu: include BMI2 in backend score
* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features
* ggml-cpu: add __BMI2__ define when using MSVC
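A standalone illustration of the BMI2 bit-deposit trick such kernels lean on; this is not the PR's actual code:
```c
#include <immintrin.h>
#include <stdint.h>

// Spread 8 packed bits so that each one lands in the low bit of its own
// byte lane in a single pdep instruction; emulating this with multiplies
// and shifts takes several operations per unpack.
static inline uint64_t spread_bits(uint8_t packed) {
    return _pdep_u64(packed, 0x0101010101010101ULL);
}
```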
Akarshan Biswas [Wed, 5 Mar 2025 15:58:23 +0000 (21:28 +0530)]
SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12201)
Plamen Minev [Wed, 5 Mar 2025 15:16:01 +0000 (17:16 +0200)]
ggml : fix GGMLMetalClass ODR (llama/12200)
This can happen if ggml is loaded from two separate libraries, since each of them will expose the class. This is more of a guard, since we want to use Metal only as an embedded library and don't care about the other case.
vmobilis [Fri, 7 Mar 2025 08:11:40 +0000 (11:11 +0300)]
ggml : ggml_compute_forward_concat() for arbitrary tensor type (#1118)
* ggml_compute_forward_concat() for arbitrary tensor type
* Check that tensors' type match
* ggml-cpu.c: check type of source tensors
* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()
* ggml.c: check concatenated tensor type
* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c, as it was moved to ggml.c
Christian Kastner [Thu, 6 Mar 2025 19:01:02 +0000 (20:01 +0100)]
pkg-config: Use CMake install paths for lib, include (#1133)
Georgi Gerganov [Tue, 4 Mar 2025 19:08:15 +0000 (21:08 +0200)]
vulkan : sync (llama/0)
ggml-ci
Georgi Gerganov [Tue, 4 Mar 2025 19:07:04 +0000 (21:07 +0200)]
sync : llama.cpp
mgroeber9110 [Tue, 4 Mar 2025 16:53:26 +0000 (17:53 +0100)]
ggml : portability fixes for VS 2017 (llama/12150)
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <redacted>
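A sketch of the kind of compatibility macro described; the exact conditions in ggml may differ:
```c
// MSVC does not accept the C99 'restrict' keyword when compiling plain C
// without C11/C17 support enabled, but it does accept the __restrict
// extension; other compilers get the standard keyword.
#if defined(_MSC_VER) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L)
#    define GGML_RESTRICT __restrict
#else
#    define GGML_RESTRICT restrict
#endif
```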
David Huang [Mon, 3 Mar 2025 21:10:54 +0000 (05:10 +0800)]
HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/12032)
Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16
Diego Devesa [Mon, 3 Mar 2025 13:00:46 +0000 (14:00 +0100)]
test-backend-ops : add option -p to filter by op params (llama/12155)
ag2s20150909 [Mon, 3 Mar 2025 12:54:08 +0000 (20:54 +0800)]
ggml : fix kleidiai build (llama/12159)
The libggml API has changed, but the kleidiai code had not been updated to match.
Akarshan Biswas [Mon, 3 Mar 2025 10:07:22 +0000 (15:37 +0530)]
SYCL: Move CPY kernels to a separate file and add few missing kernels (llama/12133)
* SYCL: refactor and move cpy kernels to a separate file
* Add few missing cpy kernels
* refactor and add debug logs
Diego Devesa [Sun, 2 Mar 2025 21:11:00 +0000 (22:11 +0100)]
ggml-backend : keep paths in native string type when possible (llama/12144)
Erik Scholz [Sat, 1 Mar 2025 11:57:22 +0000 (12:57 +0100)]
CUDA: compress mode option and default to size (llama/12029)
CUDA 12.8 added the option to specify stronger compression for binaries, so we now default to "size".
William Tambellini [Fri, 28 Feb 2025 13:41:47 +0000 (05:41 -0800)]
ggml : upgrade init_tensor API to return a ggml_status (llama/11854)
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml not to abort on OOMs but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.
* misc fixes
---------
Co-authored-by: slaren <redacted>
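A sketch of what the upgraded API enables on the caller side, assuming the public wrapper ggml_backend_buffer_init_tensor now returns the status as described:
```c
#include "ggml-backend.h"

// Initialize a batch of tensors, propagating failure (e.g.
// GGML_STATUS_ALLOC_FAILED) to the caller instead of aborting.
enum ggml_status init_all(ggml_backend_buffer_t buffer,
                          struct ggml_tensor ** tensors, int n) {
    for (int i = 0; i < n; i++) {
        enum ggml_status status = ggml_backend_buffer_init_tensor(buffer, tensors[i]);
        if (status != GGML_STATUS_SUCCESS) {
            return status;
        }
    }
    return GGML_STATUS_SUCCESS;
}
```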
Rémy O [Fri, 28 Feb 2025 08:42:52 +0000 (09:42 +0100)]
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (llama/11595)
* vulkan: implement specialized MMV kernels for IQ2 quantizations
* vulkan: add MMV kernels for IQ3 quants
* vulkan: Increase MMV batch size and unroll IQ LUT setup
* vulkan: fix init_iq_shmem for WG sizes larger than tables
* vulkan: common batch size for all I-quants
Johannes Gäßler [Fri, 28 Feb 2025 08:26:43 +0000 (09:26 +0100)]
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)
Prashant Vithule [Fri, 28 Feb 2025 07:36:12 +0000 (13:06 +0530)]
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (llama/12064)
* Added SVE Support for Q2_K Quantized Models
* Use 4-space indentation in the switch cases
* Removed comment lines
* Remove the loop; retain the curly braces for better understanding of the code
* Remove the comment line added for the q3_k_q8_k kernel
---------
Co-authored-by: vithulep <redacted>
hipudding [Fri, 28 Feb 2025 07:23:47 +0000 (15:23 +0800)]
CANN: Fix build error with GCC 13 (llama/11990)
Remove unused header file that causes compilation failure on ARM
platform with GCC 13.
Eve [Fri, 28 Feb 2025 07:20:08 +0000 (07:20 +0000)]
vulkan: matmul dequantization improvements (llama/12015)
* faster dequant for old quants
* don't use unpack for iq4_nl
* vec2 unpack for q8
Daniele [Fri, 28 Feb 2025 06:52:51 +0000 (06:52 +0000)]
vulkan: improve im2col (llama/11826)
* vulkan: improve im2col performance
Vladimir Vuksanovic [Thu, 27 Feb 2025 07:42:48 +0000 (08:42 +0100)]
cmake: Fix ggml backend dependencies and installation (llama/11818)
* Fix dependencies between ggml and backends
ggml backends link only to ggml-base and ggml links to all backends.
* Fix installation of ggml backends
Set up GNUInstallDirs before setting the installation directory of ggml backends
Jeff Bolz [Tue, 25 Feb 2025 15:30:21 +0000 (09:30 -0600)]
vulkan: fix assertion when qy_needs_dequant (llama/12068)
Looks like a copy/paste bug from qx_needs_dequant.
Molly Sophia [Tue, 25 Feb 2025 11:28:22 +0000 (19:28 +0800)]
ggml-cpu: Fix build with sve (llama/12059)
* ggml-cpu: Fix build with sve
Signed-off-by: Molly Sophia <redacted>
* ggml-cpu: Remove unused variable in sve q3_k vec dot
Signed-off-by: Molly Sophia <redacted>
---------
Signed-off-by: Molly Sophia <redacted>
cmdr2 [Mon, 3 Mar 2025 15:21:31 +0000 (20:51 +0530)]
cuda: unary ops as float + de-duplicate (#1130)
cmdr2 [Fri, 28 Feb 2025 10:29:55 +0000 (15:59 +0530)]
cuda/vulkan: specify fp32-only support for some operations in supports_op (#1129)
* cuda: restrict SILU_BACK to fp32, since fp16 exceeds the desired test threshold
* vulkan: specify fp32-only support for certain ops (that are now tested for fp16 as well)
* f32 sigmoid in vulkan supports op
* Revert "f32 sigmoid in vulkan supports op"
This reverts commit c6f04b3c19bf4504c2776149c6d8cd84e0b48acb.
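A sketch of how such a restriction reads in a backend's supports_op hook (the fallback branch is illustrative):
```c
#include "ggml.h"

// Advertise SILU_BACK only for fp32, because the fp16 variant exceeds
// the test-backend-ops error threshold on this backend.
static bool example_supports_op(const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_SILU_BACK:
            return op->src[0]->type == GGML_TYPE_F32;
        default:
            return true;  // illustrative; real backends check every op
    }
}
```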
cmdr2 [Fri, 28 Feb 2025 07:04:39 +0000 (12:34 +0530)]
cuda/cpu: Increase support for fp16 unary operations (#1125)
* Support fp16 unary operations in the CUDA backend
* cpu: increase fp16 support for unary operators in the CPU backend
* cuda: increase fp16 support for unary operators in the CUDA backend
* Add test cases for fp16 unary operators
* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing
* metal: fix PR comments for unary op support after fp16 unary tests
Georgi Gerganov [Thu, 27 Feb 2025 12:43:20 +0000 (14:43 +0200)]
sync : whisper.cpp
ggml-ci
Diego Devesa [Thu, 27 Feb 2025 12:35:07 +0000 (13:35 +0100)]
whisper : support GGML_BACKEND_DL (whisper/2843)
* whisper : support GGML_BACKEND_DL
* fix DTW crash
* whisper.objc : fix build - add ggml-cpp.h
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Thu, 27 Feb 2025 11:11:33 +0000 (13:11 +0200)]
ci : fix workflow name
Georgi Gerganov [Thu, 27 Feb 2025 10:53:37 +0000 (12:53 +0200)]
examples : remove dr_wab.h (#1127)
ggml-ci
Georgi Gerganov [Thu, 27 Feb 2025 10:52:45 +0000 (12:52 +0200)]
sync : whisper.cpp
Georgi Gerganov [Thu, 27 Feb 2025 10:50:32 +0000 (12:50 +0200)]
common : separate whisper sources (whisper/2846)
* common : separate whisper sources
* examples : add chrono
* examples : add more headers
Georgi Gerganov [Thu, 27 Feb 2025 08:39:13 +0000 (10:39 +0200)]
common : fix build min/max (whisper/2845)
* common : try to fix build
* cont : try another fix
Dmitry Atamanov [Thu, 27 Feb 2025 07:06:54 +0000 (12:06 +0500)]
examples : use miniaudio for direct decoding flac, mp3, ogg and wav (whisper/2759)
midnight [Wed, 5 Feb 2025 12:41:10 +0000 (04:41 -0800)]
cmake : fix compile assumptions for power9/etc (whisper/2777)
* Add small comment re: VSX to readme
Co-authored-by: midnight <redacted>
petterreinholdtsen [Wed, 26 Feb 2025 20:44:00 +0000 (21:44 +0100)]
Told cmake to install ggml-cpp.h as a public header file. (#1126)
It is used by the whisper.cpp talk-llama example.
Co-authored-by: Petter Reinholdtsen <redacted>
cmdr2 [Tue, 25 Feb 2025 12:36:34 +0000 (18:06 +0530)]
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (#1121)
* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend
* Add fp16 support for add/sub/mul/div on the CPU backend
* Add test cases for fp16 add/sub/mul/div
Georgi Gerganov [Tue, 25 Feb 2025 09:44:48 +0000 (11:44 +0200)]
sync : llama.cpp
ggml-ci
Gian-Carlo Pascutto [Tue, 25 Feb 2025 09:27:58 +0000 (10:27 +0100)]
metal : copy kernels for quant to F32/F16 conversions (llama/12017)
metal: use dequantize_q templates
---------
Co-authored-by: Georgi Gerganov <redacted>
lhez [Mon, 24 Feb 2025 21:47:07 +0000 (13:47 -0800)]
opencl: fix for small models (llama/11950)
* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs
---------
Co-authored-by: Shawn Gu <redacted>
Co-authored-by: Skyler Szot <redacted>
Neo Zhang Jianyu [Mon, 24 Feb 2025 14:33:23 +0000 (22:33 +0800)]
Optimize mul_mat for Q4_0 on Intel GPU (llama/12035)
* optimize performance by reordering for Intel GPU
* detect hw type and save opt feature, and print opt feature
* correct name
* optimize the graph once when computing the graph, record the opt status in tensor->extra, make CI pass
* add env variable GGML_SYCL_DISABLE_OPT for debug
* use syclex::architecture to replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* move getrows functions to separate files
* fix global variables
---------
Co-authored-by: arthw <redacted>
Akarshan Biswas [Mon, 24 Feb 2025 10:18:25 +0000 (15:48 +0530)]
SYCL: Fix GGML_SYCL_DEBUG macro (llama/11995)
Aaron Teo [Sat, 22 Feb 2025 21:39:24 +0000 (05:39 +0800)]
ggml-cpu: Support s390x SIMD Instruction Set (llama/12019)
* ggml: add s390x ARCH_FLAGS for compilation
Signed-off-by: Aaron Teo <redacted>
* ggml: add SIMD for s390x using vector intrinsics
SIMD is activated for:
* ggml_vec_dot_f32
* ggml_vec_dot_f16
* ggml_vec_mad_f32
* ggml_vec_mad_f16
* ggml_vec_mad_f32_unroll
* ggml_vec_scale_f32
* ggml_vec_scale_f16
SIMD is NOT activated for:
* ggml_vec_dot_f16_unroll (pending bugfix)
Signed-off-by: Aaron Teo <redacted>
* ggml: fix missing escape character in GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <redacted>
* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR
Signed-off-by: Aaron Teo <redacted>
* ggml: fix s390x GGML_F32x4_REDUCE
Signed-off-by: Aaron Teo <redacted>
* ggml: full SIMD activation for F32,F16 s390x
Signed-off-by: Aaron Teo <redacted>
* ggml: add option to disable s390x VXE/VXE2
Signed-off-by: Aaron Teo <redacted>
* ggml: change vecintrin.h include to ggml-cpu-impl
* add __VXE__ and __VXE2__ macros
Signed-off-by: Aaron Teo <redacted>
* cmake: add s390x target detection for VX/VXE/VXE2
Signed-off-by: Aaron Teo <redacted>
* ggml: move s390x vector intrinsics to ggml-cpu-impl.h
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x Q8_0 SIMD
Signed-off-by: Aaron Teo <redacted>
* ggml: correct documentation for Q8_0
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x reduce code complexity Q8_0
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x bugfix typo Q8_0
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activated for Q4_1
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x inline vec_reve
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q4_0
Signed-off-by: Aaron Teo <redacted>
* ggml: add VXE backend feature
Signed-off-by: Aaron Teo <redacted>
* ggml: remove test.py
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for quantize_row_q8_0
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for quantize_row_q8_1
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for iq4_xs
Signed-off-by: Aaron Teo <redacted>
* ggml: bugfix iq4_xs
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for iq4_nl
Signed-off-by: Aaron Teo <redacted>
* ggml: add float, double, and long vector data type
Signed-off-by: Aaron Teo <redacted>
* ggml: clean up iq4_xs SIMD
Signed-off-by: Aaron Teo <redacted>
* ggml: fix improper use of restrict keyword
Signed-off-by: Aaron Teo <redacted>
* ggml: update warning message for ggml_vec_tbl
Signed-off-by: Aaron Teo <redacted>
* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K
Signed-off-by: Aaron Teo <redacted>
* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs
Signed-off-by: Aaron Teo <redacted>
* ggml: switch to restrict for iq4_nl
Signed-off-by: Aaron Teo <redacted>
* ggml: slight dot product speed improvement for q4_1_q8_1
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for q6_K
Signed-off-by: Aaron Teo <redacted>
* ggml: add missing `_t` to ggml_int8x16x4_t
Signed-off-by: Aaron Teo <redacted>
* ggml: fix missing `_t` for ggml_vec_xl_s8x4
Signed-off-by: Aaron Teo <redacted>
* ggml: fix more missing `_t`
Signed-off-by: Aaron Teo <redacted>
* ggml: add unroll and prefetch to Q8_0
increase of 3.86% for prompt processing and 32.22% for token generation
Signed-off-by: Aaron Teo <redacted>
* ggml: patch Q8_0 to use proper vector sizes
Signed-off-by: Aaron Teo <redacted>
* ggml: optimise Q8_0 dot prod compute kernel further
Signed-off-by: Aaron Teo <redacted>
* ggml: add unroll and prefetch to Q4_1
Signed-off-by: Aaron Teo <redacted>
* ggml: refactor Q6_K variable naming for readability
Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q6_K typos
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q5_K
Signed-off-by: Aaron Teo <redacted>
* ggml: fix wrong char*x16_t naming
Signed-off-by: Aaron Teo <redacted>
* ggml: Q5_K y0 wrong signness
Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q5_K invalid uchar type
Signed-off-by: Aaron Teo <redacted>
* ggml: s390x SIMD activation for Q4_K
Signed-off-by: Aaron Teo <redacted>
* ggml: fix Q4_K invalid vector intrinsics
Signed-off-by: Aaron Teo <redacted>
* ggml: simplify ggml_padd_s16 compute kernel
Signed-off-by: Aaron Teo <redacted>
* ggml: correct ggml-cpu vxe wording
Signed-off-by: Aaron Teo <redacted>
* ggml: change ggml_aligned_malloc alignment to 256
256 is the cache line size for s390x platforms
Signed-off-by: Aaron Teo <redacted>
* ggml: resolve pr merge via cherry-pick 225bbbf
Signed-off-by: Aaron Teo <redacted>
* ggml : fix LoongArch compile error with 128-bit SIMD (llama/11701)
* ggml: resolve pr merge via cherry-pick 4571953
Signed-off-by: Aaron Teo <redacted>
* ggml: cmake remove fork when determining s390x machine type
thank you @ericcurtin
Signed-off-by: Aaron Teo <redacted>
---------
Signed-off-by: Aaron Teo <redacted>
Co-authored-by: Jinyang He <redacted>
Co-authored-by: junchao-zhao <redacted>
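For flavor, a minimal f32 dot product in the z/Architecture vector intrinsics this series activates, assuming GCC with -mzvector on a VXE-capable machine and n divisible by 4 (not code from the PR):
```c
#include <vecintrin.h>

// 4-wide fused multiply-add accumulation, the basic building block
// behind ggml_vec_dot_f32 on s390x.
static float dot_f32_vxe(const float * a, const float * b, int n) {
    __vector float acc = vec_splats(0.0f);
    for (int i = 0; i < n; i += 4) {
        acc = vec_madd(vec_xl(0, a + i), vec_xl(0, b + i), acc);
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```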
Johannes Gäßler [Sat, 22 Feb 2025 19:44:34 +0000 (20:44 +0100)]
CUDA: add option to compile without FlashAttention (llama/12025)
Johannes Gäßler [Sat, 22 Feb 2025 11:20:17 +0000 (12:20 +0100)]
CUDA: optimize FA for GQA + large batches (llama/12014)
Gian-Carlo Pascutto [Sat, 22 Feb 2025 08:43:24 +0000 (09:43 +0100)]
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (llama/12000)
PureJourney [Fri, 21 Feb 2025 11:21:05 +0000 (19:21 +0800)]
CUDA: correct the lowest Maxwell supported by CUDA 12 (llama/11984)
* CUDA: correct the lowest Maxwell supported by CUDA 12
---------
Co-authored-by: Johannes Gäßler <redacted>
Bodhi [Fri, 21 Feb 2025 07:46:23 +0000 (15:46 +0800)]
MUSA: support ARM64 and enable __dp4a etc. (llama/11843)
* MUSA: support ARM64 and enable __dp4a etc.
* fix cross entropy loss op for musa
* update
* add cc info log for musa
* add comment for the MUSA .cc calculation block
---------
Co-authored-by: Bodhi Hu <redacted>
Charles Xu [Thu, 20 Feb 2025 13:06:51 +0000 (14:06 +0100)]
ggml-cpu: Add CPU backend support for KleidiAI library (llama/11390)
* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environmental variable GGML_KLEIDIAI_SME
* Add support for multithread LHS conversion
* Switch kernel selection order to dotprod and i8mm
* updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Remove append GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
Prashant Vithule [Thu, 20 Feb 2025 10:08:32 +0000 (15:38 +0530)]
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (llama/11917)
* Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file
* Improved formatting of code in ggml-cpu-quants.c file
* style : minor fixes
* style : less whitespaces
* style : ptr spacing
---------
Co-authored-by: vithulep <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Johannes Gäßler [Mon, 17 Feb 2025 13:03:24 +0000 (14:03 +0100)]
CUDA: use async data loading for FlashAttention (llama/11894)
* CUDA: use async data loading for FlashAttention
---------
Co-authored-by: Diego Devesa <redacted>
Rémy O [Mon, 17 Feb 2025 06:55:57 +0000 (07:55 +0100)]
vulkan: implement several ops relevant for ggml_opt (llama/11769)
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
Jeff Bolz [Sun, 16 Feb 2025 07:52:23 +0000 (01:52 -0600)]
vulkan: support multi/vision rope, and noncontiguous rope (llama/11902)
Hale Chan [Sun, 16 Feb 2025 06:50:26 +0000 (14:50 +0800)]
metal : fix the crash caused by the lack of residency set support on Intel Macs. (llama/11904)
Adrian Kretz [Sat, 15 Feb 2025 18:39:20 +0000 (19:39 +0100)]
metal : optimize dequant q6_K kernel (llama/11892)
Georgi Gerganov [Sat, 15 Feb 2025 14:40:57 +0000 (16:40 +0200)]
repo : update links to new url (llama/11886)
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
Rémy O [Sat, 15 Feb 2025 08:01:40 +0000 (09:01 +0100)]
vulkan: initial support for IQ1_S and IQ1_M quantizations (llama/11528)
* vulkan: initial support for IQ1_S and IQ1_M quantizations
* vulkan: define MMV kernels for IQ1 quantizations
* devops: increase timeout of Vulkan tests again
* vulkan: simplify ifdef for init_iq_shmem
lhez [Fri, 14 Feb 2025 19:12:23 +0000 (11:12 -0800)]
opencl: Fix rope and softmax (llama/11833)
* opencl: fix `ROPE`
* opencl: fix `SOFT_MAX`
* Add fp16 variant
* opencl: enforce subgroup size for `soft_max`
Diego Devesa [Fri, 14 Feb 2025 14:33:52 +0000 (15:33 +0100)]
cuda : add ampere to the list of default architectures (llama/11870)
Jinyang He [Fri, 14 Feb 2025 08:54:27 +0000 (16:54 +0800)]
ggml: optimize some vec dot functions for LoongArch ASX (llama/11842)
* Optimize ggml_vec_dot_q3_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q4_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q6_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q5_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q2_K_q8_K for LoongArch ASX
* Optimize mul_sum_i8_pairs_float for LoongArch ASX
* Optimize ggml_vec_dot_iq4_xs_q8_K for LoongArch ASX
Eve [Fri, 14 Feb 2025 02:59:40 +0000 (02:59 +0000)]
vulkan: linux builds + small subgroup size fixes (llama/11767)
* mm subgroup size
* upload vulkan x86 builds
Jeffrey Morgan [Thu, 13 Feb 2025 17:05:04 +0000 (09:05 -0800)]
llamafile: use member variable instead of constant for iq4nlt (llama/11780)
R0CKSTAR [Thu, 13 Feb 2025 12:28:18 +0000 (20:28 +0800)]
musa: bump MUSA SDK version to rc3.1.1 (llama/11822)
* musa: Update MUSA SDK version to rc3.1.1
Signed-off-by: Xiaodong Ye <redacted>
* musa: Remove workaround in PR #10042
Signed-off-by: Xiaodong Ye <redacted>
---------
Signed-off-by: Xiaodong Ye <redacted>
Diego Devesa [Thu, 13 Feb 2025 00:02:38 +0000 (01:02 +0100)]
ggml-cpu : add chunking support to mul_mat_id (llama/11666)
* ggml-cpu : add chunking support to mul_mat_id
* allocate chunk counter in wdata
parallelize src1 quantization by column to allow parallelization even when there is only one row
* disable for arm
* cleanup
* better way to disable for arm
* fix uninitialized counter when using 1 thread only
* revert test-backend-ops changes
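A sketch of the chunking pattern described above, with the shared counter living in the work buffer so every thread claims the next free chunk (names illustrative):
```c
#include <stdatomic.h>

// The chunk counter is allocated in wdata, so all threads of the compute
// share it; each thread grabs chunks until none remain.
static void worker(atomic_int * current_chunk, int n_chunks) {
    for (;;) {
        const int chunk = atomic_fetch_add(current_chunk, 1);
        if (chunk >= n_chunks) {
            break;  // all chunks claimed
        }
        // ... compute the mat-mul rows belonging to `chunk` ...
    }
}
```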
Xuan-Son Nguyen [Wed, 12 Feb 2025 23:33:45 +0000 (00:33 +0100)]
ggml : x2 speed for WASM by optimizing SIMD (llama/11453)
* ggml : x2 speed for WASM by optimizing SIMD
* fix bad merging
* rm trailing spaces
* rm redundant clamp
* better quantize_row_q8_K
Co-authored-by: camel-cdr <redacted>
* remove memset that causes buffer overflow
Co-authored-by: camel-cdr <redacted>
---------
Co-authored-by: camel-cdr <redacted>
uvos [Wed, 12 Feb 2025 21:25:28 +0000 (22:25 +0100)]
HIP: Remove GCN from list of devices that avoid MMQ (llama/11831)
Georgi Gerganov [Wed, 12 Feb 2025 19:46:43 +0000 (21:46 +0200)]
sync : llama.cpp
ggml-ci
uvos [Wed, 12 Feb 2025 16:25:03 +0000 (17:25 +0100)]
HIP: Switch to std::vector in rocblas version check (llama/11820)
bandoti [Wed, 12 Feb 2025 14:06:53 +0000 (10:06 -0400)]
cleanup: fix compile warnings associated with gnu_printf (llama/11811)
Richard [Wed, 12 Feb 2025 13:57:33 +0000 (13:57 +0000)]
ggml : fix multi-threaded clamp_f32 (llama/11824)
* Bug fix for clamp_f32
When using tensors larger than 1d, the clamp operation does not work due to the restriction of returning if ith is not 0.
* Bug fix for clamp_f32
* Bug fix for clamp_f32
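A sketch of the fixed pattern: rows are interleaved across threads instead of every thread but 0 returning early (a simplified contiguous layout is assumed):
```c
#include <math.h>
#include <stdint.h>

// Clamp all rows of a 2-D tensor, distributing rows across nth threads;
// the buggy version effectively did `if (ith != 0) return;`.
static void clamp_f32_rows(float * data, int64_t nr, int64_t ne0,
                           float min, float max, int ith, int nth) {
    for (int64_t ir = ith; ir < nr; ir += nth) {
        float * row = data + ir * ne0;
        for (int64_t i = 0; i < ne0; i++) {
            row[i] = fminf(fmaxf(row[i], min), max);
        }
    }
}
```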
Weizhao Ouyang [Wed, 12 Feb 2025 12:22:58 +0000 (20:22 +0800)]
ggml-cpu: Fix duplicate MATMUL_INT8 (llama/11817)
Signed-off-by: Weizhao Ouyang <redacted>
Johannes Gäßler [Wed, 12 Feb 2025 12:16:39 +0000 (13:16 +0100)]
CUDA: fix CUDART_VERSION checks (llama/11821)
Sheldon Robinson [Tue, 11 Feb 2025 15:55:45 +0000 (10:55 -0500)]
Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (llama/11803)
* Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx
* Fix #11802: PR #11803 - keep RegQueryValueExA, remove TEXT macro, description needs to be ANSI string
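A sketch of the resulting call, with an illustrative value name; the ANSI entry point takes LPCSTR, so a plain literal is correct without the TEXT() macro (which expands to a wide string when UNICODE is defined):
```c
#include <windows.h>

// Query the size of a registry value through the ANSI entry point
// explicitly, passing a plain ANSI value name.
static LSTATUS query_value_size(HKEY key, DWORD * size) {
    return RegQueryValueExA(key, "ProcessorNameString", NULL, NULL, NULL, size);
}
```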
Johannes Gäßler [Mon, 10 Feb 2025 23:17:22 +0000 (00:17 +0100)]
CUDA: use arch list for compatibility check (llama/11775)
* CUDA: use arch list for feature availability check
---------
Co-authored-by: Diego Devesa <redacted>
Maxim Evtush [Mon, 10 Feb 2025 22:21:31 +0000 (23:21 +0100)]
fix: typos in documentation files (llama/11791)
* Update ggml.c
* Update arg.cpp
* Update speculative.h