git.djapps.eu Git - pkg/ggml/sources/ggml/log

10 months ago ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)
slaren [Mon, 26 Aug 2024 17:44:43 +0000 (19:44 +0200)]
ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)

10 months ago metal : separate scale and mask from QKT in FA kernel (llama/9189)
Georgi Gerganov [Mon, 26 Aug 2024 15:31:02 +0000 (18:31 +0300)]
metal : separate scale and mask from QKT in FA kernel (llama/9189)

* metal : separate scale and mask from QKT in FA kernel

* metal : ne01 check no longer necessary

* metal : keep data in local memory
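
For context, the FA kernel computes softmax(scale*Q*K^T + mask)*V; this change applies scale and mask to the scores in a separate step rather than fusing them into the Q*K^T product. A minimal scalar sketch of that step (all names here are hypothetical):

```c
#include <stddef.h>

// s holds one row of Q*K^T scores; mask may be NULL; scale is typically 1/sqrt(head_dim)
static void fa_scale_mask(float * s, const float * mask, float scale, size_t n_kv) {
    for (size_t j = 0; j < n_kv; ++j) {
        s[j] = scale*s[j] + (mask ? mask[j] : 0.0f); // applied after, not fused into, the matmul
    }
}
```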

10 months ago ggml : add SSM Metal kernels (llama/8546)
Georgi Gerganov [Mon, 26 Aug 2024 14:55:36 +0000 (17:55 +0300)]
ggml : add SSM Metal kernels (llama/8546)

* ggml : add ggml_ssm_conv metal impl

* ggml : add ssm_scan metal impl

ggml-ci

10 months ago metal : gemma2 flash attention support (llama/9159)
slaren [Mon, 26 Aug 2024 09:08:59 +0000 (11:08 +0200)]
metal : gemma2 flash attention support (llama/9159)

10 months ago CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
Johannes Gäßler [Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)]
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check
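
Gemma 2 soft-caps its attention logits, which is why FA needs explicit support. A sketch of the capping function (not the kernel code); per the bullet above, the division by cap can be folded into the kernel's existing scale:

```c
#include <math.h>

// softcap(x) = cap * tanh(x / cap): keeps logits in (-cap, cap)
// while staying roughly linear near zero
static float softcap(float x, float cap) {
    return cap * tanhf(x / cap);
}
```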

10 months ago Add a space to suppress a cmake warning (llama/9133)
Akarshan Biswas [Thu, 22 Aug 2024 14:09:47 +0000 (19:39 +0530)]
Add a space to suppress a cmake warning (llama/9133)

10 months ago Add oneDNN primitive support (llama/9091)
luoyu-intel [Thu, 22 Aug 2024 04:50:10 +0000 (12:50 +0800)]
Add oneDNN primitive support (llama/9091)

* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc

10 months ago llama : simplify Mamba with advanced batch splits (llama/8526)
compilade [Wed, 21 Aug 2024 21:58:11 +0000 (17:58 -0400)]
llama : simplify Mamba with advanced batch splits (llama/8526)

* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <redacted>
* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <redacted>
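
A usage sketch of the llama_model_is_recurrent API introduced above (the surrounding helper is hypothetical):

```c
#include "llama.h"

// hypothetical helper: recurrent models (e.g. Mamba) can only process a single
// sequence per ubatch when computing pooled embeddings, per the commit above
static bool needs_single_seq_ubatch(const struct llama_model * model) {
    return llama_model_is_recurrent(model);
}
```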

10 months ago fallback mmvq (llama/9088)
Meng, Hengyu [Tue, 20 Aug 2024 15:50:17 +0000 (23:50 +0800)]
fallback mmvq (llama/9088)

* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <redacted>
---------

Co-authored-by: Alberto Cabrera Pérez <redacted>

10 months ago Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)
zhentaoyu [Tue, 20 Aug 2024 15:06:51 +0000 (23:06 +0800)]
Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)

* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert overflow

Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <redacted>
* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <redacted>
* sycl: refine convert

Signed-off-by: zhentaoyu <redacted>
* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <redacted>
* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <redacted>
* test: make new cases only in sycl

Signed-off-by: zhentaoyu <redacted>
* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <redacted>
---------

Signed-off-by: zhentaoyu <redacted>
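
The overflow class fixed here is 32-bit index arithmetic wrapping on large tensors; an illustrative reduction (not the actual kernel code):

```c
#include <stdint.h>

// with int, y*ne0 + x wraps once ne0*ne1 exceeds INT32_MAX;
// widening the operands before the multiply avoids it
static int64_t flat_offset(int64_t ne0, int64_t y, int64_t x) {
    return y*ne0 + x;
}
```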

10 months ago rpc : print error message when failed to connect endpoint (llama/9042)
Radoslav Gerganov [Mon, 19 Aug 2024 07:11:45 +0000 (10:11 +0300)]
rpc : print error message when failed to connect endpoint (llama/9042)

10 months ago rpc : prevent crashes on invalid input (llama/9040)
Radoslav Gerganov [Mon, 19 Aug 2024 07:10:21 +0000 (10:10 +0300)]
rpc : prevent crashes on invalid input (llama/9040)

Add more checks which prevent RPC server from crashing if invalid input
is received from client
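
A minimal sketch of the kind of check meant here, with hypothetical names; the idea is to validate sizes before deserializing client payloads:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// reject short payloads instead of reading past the end of the buffer
static bool rpc_read_u64(const uint8_t * buf, size_t len, size_t * pos, uint64_t * out) {
    if (*pos > len || len - *pos < sizeof(uint64_t)) {
        return false; // invalid input: fail gracefully rather than crash
    }
    memcpy(out, buf + *pos, sizeof(uint64_t));
    *pos += sizeof(uint64_t);
    return true;
}
```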

10 months ago ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)
Nico Bosshard [Fri, 16 Aug 2024 02:22:55 +0000 (04:22 +0200)]
ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)

* ggml : Dynamic ggml_sched_max_splits based on graph_size

* Fixed and re-added debug code

10 months ago cmake : remove unused option GGML_CURL (llama/9011)
Georgi Gerganov [Wed, 14 Aug 2024 06:14:49 +0000 (09:14 +0300)]
cmake : remove unused option GGML_CURL (llama/9011)

10 months ago ggml : move rope type enum to ggml.h (llama/8949)
Daniel Bevenius [Tue, 13 Aug 2024 19:13:15 +0000 (21:13 +0200)]
ggml : move rope type enum to ggml.h (llama/8949)

* ggml : move rope type enum to ggml.h

This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.

The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.

Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.

* squash! ggml : move rope type enum to ggml.h

This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and adds them back to the llama_rope_type enum.

I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.

* squash! ggml : move rope type enum to ggml.h

This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.
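
The resulting pattern, assuming the value GGML_ROPE_TYPE_NEOX carries in ggml.h:

```c
#define GGML_ROPE_TYPE_NEOX 2 // as defined in ggml.h

// mode stays a plain int bit field, per the note above about not
// restricting it to an enum
static int is_neox_mode(int mode) {
    return (mode & GGML_ROPE_TYPE_NEOX) != 0;
}
```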

* squash! ggml : move rope type enum to ggml.h

This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.

* squash! ggml : move rope type enum to ggml.h

This commit fixes the editorconfig-checker warnings.

* squash! ggml : move rope type enum to ggml.h

Update comment for ggml_rope function.

* Revert "squash! ggml : move rope type enum to ggml.h"

This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.

* squash! ggml : move rope type enum to ggml.h

Add GGML_ROPE_TYPE_NEOX to rope_common.comp.

* remove extra line

---------

Co-authored-by: slaren <redacted>

10 months ago ggml: fix div-by-zero (llama/9003)
DavidKorczynski [Mon, 12 Aug 2024 12:21:41 +0000 (13:21 +0100)]
ggml: fix div-by-zero (llama/9003)

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724
In order to access the above bug you need to login using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5

Signed-off-by: David Korczynski <redacted>

10 months ago Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)
Markus Tavenrath [Sun, 11 Aug 2024 08:09:09 +0000 (10:09 +0200)]
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduce a full pipeline sync which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader read/writes and transfers seems to be sufficient looking at the code which either launches compute kernels or copies tensors.

* Fix small typo

---------

Co-authored-by: 0cc4m <redacted>
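
A minimal sketch of the narrower synchronization described above: a compute-to-compute memory barrier covering shader writes and reads, instead of a full pipeline sync:

```c
#include <stddef.h>
#include <vulkan/vulkan.h>

static void compute_to_compute_barrier(VkCommandBuffer cmd) {
    const VkMemoryBarrier barrier = {
        .sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
    };
    vkCmdPipelineBarrier(cmd,
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
        0 /*dependencyFlags*/, 1, &barrier, 0, NULL, 0, NULL);
}
```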

10 months ago feat: ref. cross entropy, add CUDA, fix grad test (#929)
Johannes Gäßler [Tue, 27 Aug 2024 18:39:30 +0000 (20:39 +0200)]
feat: ref. cross entropy, add CUDA, fix grad test (#929)

10 months ago tests : fix memory leaks (#936)
Salvatore Mesoraca [Tue, 27 Aug 2024 06:25:12 +0000 (08:25 +0200)]
tests : fix memory leaks (#936)

It is annoying to run the tests using the sanitizers
because of all the uninteresting reports about the memory
leaked by the tests themselves.

Signed-off-by: Salvatore Mesoraca <redacted>

10 months ago ggml: remove bad assert (#928)
Johannes Gäßler [Sat, 24 Aug 2024 17:27:02 +0000 (19:27 +0200)]
ggml: remove bad assert (#928)


10 months ago set ggml_context pointer to NULL to pass assert check in case some compiler does not set uninitialized pointers to NULL (mnist example)
ucag.li [Thu, 22 Aug 2024 19:49:45 +0000 (03:49 +0800)]
set ggml_context pointer to NULL to pass assert check in case some compiler does not set uninitialized pointers to NULL (mnist example)

10 months ago examples: add MNIST training + missing ops
Johannes Gäßler [Tue, 30 Jul 2024 13:56:35 +0000 (15:56 +0200)]
examples: add MNIST training + missing ops

10 months ago yolo : add backend support (#924)
Radoslav Gerganov [Mon, 19 Aug 2024 07:09:33 +0000 (10:09 +0300)]
yolo : add backend support (#924)

* yolo : add backend support

* metal : add sub and sqrt kernels

---------

Co-authored-by: Georgi Gerganov <redacted>

10 months ago ggml : fix typo in ggml-quants.c comment (#922)
Daniel Bevenius [Thu, 15 Aug 2024 07:42:38 +0000 (09:42 +0200)]
ggml : fix typo in ggml-quants.c comment (#922)

10 months ago feat: add new `sin` and `cos` operators (#919)
Ronsor [Mon, 12 Aug 2024 13:02:08 +0000 (06:02 -0700)]
feat: add new `sin` and `cos` operators (#919)

* ggml : add sin/cos operators

* ggml-cuda : add sin/cos operators

* ggml : add corresponding tests for sin/cos

* ggml : add backward computation for sin/cos operators

* ggml-vulkan : add sin/cos operators

* ggml-vulkan : add sin/cos shader source

* metal : add sin, cos

---------

Co-authored-by: Georgi Gerganov <redacted>
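
A minimal usage sketch of the new operators through the public ggml API (the helper itself is hypothetical):

```c
#include "ggml.h"

// build sin(x) + cos(x) as part of a compute graph
static struct ggml_tensor * sin_plus_cos(struct ggml_context * ctx, struct ggml_tensor * x) {
    struct ggml_tensor * s = ggml_sin(ctx, x); // element-wise sin
    struct ggml_tensor * c = ggml_cos(ctx, x); // element-wise cos
    return ggml_add(ctx, s, c);
}
```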

10 months ago ggml : support forward pass broadcasting in ggml_sub (#914)
Salvatore Mesoraca [Sun, 11 Aug 2024 08:08:53 +0000 (10:08 +0200)]
ggml : support forward pass broadcasting in ggml_sub (#914)

* ggml: support forward pass broadcasting in ggml_sub

Signed-off-by: Salvatore Mesoraca <redacted>
* Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32

The check is already performed in ggml_sub_impl

Signed-off-by: Salvatore Mesoraca <redacted>
---------

Signed-off-by: Salvatore Mesoraca <redacted>
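
A shape-level sketch of the broadcasting this enables (shapes are illustrative); b is repeated along the dimensions where it has size 1, matching what ggml_add already supported:

```c
#include "ggml.h"

static struct ggml_tensor * sub_with_broadcast(struct ggml_context * ctx) {
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3); // 4 cols, 3 rows
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 1); // repeated over rows
    return ggml_sub(ctx, a, b);
}
```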

10 months ago sync : llama.cpp
Georgi Gerganov [Sun, 11 Aug 2024 08:06:21 +0000 (11:06 +0300)]
sync : llama.cpp

10 months ago metal : fix uninitialized abort_callback (llama/8968)
slaren [Sat, 10 Aug 2024 13:42:10 +0000 (15:42 +0200)]
metal : fix uninitialized abort_callback (llama/8968)

10 months ago sync : llama.cpp
Georgi Gerganov [Sat, 10 Aug 2024 06:51:19 +0000 (09:51 +0300)]
sync : llama.cpp

10 months ago rpc : sanitize tensor data + warnings (llama/0)
Georgi Gerganov [Fri, 9 Aug 2024 20:03:21 +0000 (23:03 +0300)]
rpc : sanitize tensor data + warnings (llama/0)

Co-authored-by: slaren <redacted>

10 months ago sync : whisper.cpp
Georgi Gerganov [Fri, 9 Aug 2024 07:03:29 +0000 (10:03 +0300)]
sync : whisper.cpp

10 months ago whisper : use vulkan as gpu backend when available (whisper/2302)
Matt Stephenson [Tue, 16 Jul 2024 07:21:09 +0000 (03:21 -0400)]
whisper : use vulkan as gpu backend when available (whisper/2302)

* ggml: use vulkan as gpu backend when available

Signed-off-by: Matt Stephenson <redacted>
* whisper: enable using vk as default buffer type

Signed-off-by: Matt Stephenson <redacted>
---------

Signed-off-by: Matt Stephenson <redacted>

10 months ago ggml : add CANN backend (llama/0)
hipudding [Thu, 8 Aug 2024 11:48:06 +0000 (14:48 +0300)]
ggml : add CANN backend (llama/0)

ggml-ci

10 months ago sync : vulkan (llama/0)
Georgi Gerganov [Thu, 8 Aug 2024 11:46:24 +0000 (14:46 +0300)]
sync : vulkan (llama/0)

10 months ago scripts : sync sycl (#0)
Georgi Gerganov [Thu, 8 Aug 2024 10:57:33 +0000 (13:57 +0300)]
scripts : sync sycl (#0)

10 months ago scripts : remove obsolete header (#0)
Georgi Gerganov [Thu, 8 Aug 2024 10:54:39 +0000 (13:54 +0300)]
scripts : remove obsolete header (#0)

10 months ago scripts : update sync scripts (#0)
Georgi Gerganov [Thu, 8 Aug 2024 10:51:09 +0000 (13:51 +0300)]
scripts : update sync scripts (#0)

10 months ago sync : llama.cpp
Georgi Gerganov [Thu, 8 Aug 2024 10:25:41 +0000 (13:25 +0300)]
sync : llama.cpp

ggml-ci

10 months ago ggml-backend : fix async copy from CPU (llama/8897)
slaren [Wed, 7 Aug 2024 11:29:02 +0000 (13:29 +0200)]
ggml-backend : fix async copy from CPU (llama/8897)

* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same

10 months ago Updated SYCL device filtering (llama/8901)
Ouadie EL FAROUKI [Wed, 7 Aug 2024 10:25:36 +0000 (11:25 +0100)]
Updated SYCL device filtering (llama/8901)

* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme

10 months ago CUDA/HIP: fix tests/test-backend-ops (llama/8896)
Johannes Gäßler [Wed, 7 Aug 2024 07:07:52 +0000 (09:07 +0200)]
CUDA/HIP: fix tests/test-backend-ops (llama/8896)

10 months ago CUDA: fix padding logic for FP16/FP32 (llama/8884)
Johannes Gäßler [Tue, 6 Aug 2024 15:13:55 +0000 (17:13 +0200)]
CUDA: fix padding logic for FP16/FP32 (llama/8884)

10 months ago ggml : add epsilon as a parameter for group_norm (llama/8818)
Molly Sophia [Tue, 6 Aug 2024 07:26:46 +0000 (15:26 +0800)]
ggml : add epsilon as a parameter for group_norm (llama/8818)

Signed-off-by: Molly Sophia <redacted>
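
After this change the call site passes epsilon explicitly; a sketch with illustrative values:

```c
#include "ggml.h"

static struct ggml_tensor * group_norm_example(struct ggml_context * ctx, struct ggml_tensor * x) {
    const int   n_groups = 32;
    const float eps      = 1e-6f; // previously hardcoded inside the kernels
    return ggml_group_norm(ctx, x, n_groups, eps);
}
```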

10 months ago Fix ggml_backend_cann_buffer_get_tensor (llama/8871)
Mengqing Cao [Tue, 6 Aug 2024 04:42:42 +0000 (12:42 +0800)]
Fix ggml_backend_cann_buffer_get_tensor (llama/8871)

* cann: fix ggml_backend_cann_buffer_get_tensor

 1. fix data ptr offset
 2. enable the acquisition of incomplete tensors

* fix backend cann set_tensor

10 months ago cann: fix buffer_num and slow runtime speed (llama/8865)
wangshuai09 [Mon, 5 Aug 2024 13:10:37 +0000 (21:10 +0800)]
cann: fix buffer_num and slow runtime speed (llama/8865)

10 months ago ggml : fix overflows in elu function (llama/8866)
Justine Tunney [Mon, 5 Aug 2024 12:43:40 +0000 (05:43 -0700)]
ggml : fix overflows in elu function (llama/8866)

It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.
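
A minimal sketch of the expm1f-based ELU this describes:

```c
#include <math.h>

// expm1f(x) computes exp(x) - 1 directly, avoiding the overflow and the
// loss of precision near zero of the naive expf(x) - 1.0f form
static float elu(float x) {
    return x > 0.0f ? x : expm1f(x);
}
```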

10 months ago vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (llama/8855)
0cc4m [Mon, 5 Aug 2024 05:52:55 +0000 (07:52 +0200)]
vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (llama/8855)

* Fix Vulkan mul mat vec invalid results when ncols < warp size

* Only run backend ops mul mat vec block size test if block size not already covered

10 months ago cann: support q4_0 model (llama/8822)
wangshuai09 [Mon, 5 Aug 2024 04:22:30 +0000 (12:22 +0800)]
cann: support q4_0 model (llama/8822)

10 months ago ggml : reading the runtime sve config of the cpu (llama/8709)
jdomke [Sat, 3 Aug 2024 16:34:41 +0000 (01:34 +0900)]
ggml : reading the runtime sve config of the cpu (llama/8709)

* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <redacted>

10 months ago Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
Sigbjørn Skjæret [Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)]
Fix conversion of unnormalized BF16->BF16 weights (llama/7843)

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
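
For reference, the widely used round-to-nearest-even fp32 -> bf16 conversion that does not flush subnormals to zero looks like this (a sketch of the approach, not necessarily the exact ggml code):

```c
#include <stdint.h>
#include <string.h>

static uint16_t fp32_to_bf16_rne(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof(u)); // bit-cast without violating aliasing rules
    if ((u & 0x7fffffff) > 0x7f800000) {
        return (uint16_t) ((u >> 16) | 64); // NaN: keep it quiet
    }
    // round to nearest, ties to even; subnormals pass through untouched
    return (uint16_t) ((u + (0x7fff + ((u >> 16) & 1))) >> 16);
}
```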

10 months ago cann: Fix ggml_cann_im2col for 1D im2col (llama/8819)
Mengqing Cao [Fri, 2 Aug 2024 08:50:53 +0000 (16:50 +0800)]
cann: Fix ggml_cann_im2col for 1D im2col (llama/8819)

* fix ggml_cann_im2col for 1D im2col

* fix build warning

10 months ago Fixing wrong VDR iq4nl value (llama/8812)
Ouadie EL FAROUKI [Fri, 2 Aug 2024 00:55:17 +0000 (01:55 +0100)]
Fixing wrong VDR iq4nl value (llama/8812)

10 months ago ggml-cuda: Adding support for unified memory (llama/8035)
matteo [Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)]
ggml-cuda: Adding support for unified memory (llama/8035)

* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code in the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
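
The PR gates the behavior behind an environment variable; a sketch of the idea (the allocation helper is illustrative, the variable name follows the PR's documentation):

```c
#include <stdlib.h>
#include <cuda_runtime.h>

// managed (unified) memory lets allocations exceed VRAM and page on demand,
// at the cost of potential paging overhead
static cudaError_t alloc_device(void ** ptr, size_t size) {
    if (getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != NULL) {
        return cudaMallocManaged(ptr, size, cudaMemAttachGlobal);
    }
    return cudaMalloc(ptr, size);
}
```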

10 months ago Build: Only include execinfo.h on linux systems that support it (llama/8783)
Alex O'Connell [Thu, 1 Aug 2024 16:53:46 +0000 (12:53 -0400)]
Build: Only include execinfo.h on linux systems that support it (llama/8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one
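
The shape of the guard, per the bullets above: backtrace()/execinfo.h are glibc extensions, so checking __linux__ alone breaks musl-based systems:

```c
#if defined(__linux__) && defined(__GLIBC__)
#include <execinfo.h>

static void print_backtrace(void) {
    void * trace[64];
    const int n = backtrace(trace, 64);
    backtrace_symbols_fd(trace, n, 2 /* stderr */);
}
#endif
```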

10 months ago cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)
slaren [Thu, 1 Aug 2024 13:26:22 +0000 (15:26 +0200)]
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)

* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test

10 months ago added android implementation of ggml_print_backtrace_symbols (llama/8751)
l3utterfly [Tue, 30 Jul 2024 14:40:18 +0000 (23:40 +0900)]
added android implementation of ggml_print_backtrace_symbols (llama/8751)

* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
---------

Co-authored-by: slaren <redacted>
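
On Android there is no execinfo.h; a common NDK pattern for this (illustrative, the merged implementation may differ in detail) walks frames with _Unwind_Backtrace and resolves names with dladdr:

```c
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <unwind.h>

static _Unwind_Reason_Code frame_cb(struct _Unwind_Context * ctx, void * arg) {
    (void) arg;
    const uintptr_t pc = _Unwind_GetIP(ctx);
    Dl_info info;
    if (pc && dladdr((void *) pc, &info) && info.dli_sname) {
        fprintf(stderr, "  %p %s\n", (void *) pc, info.dli_sname);
    }
    return _URC_NO_REASON;
}

static void print_backtrace_symbols(void) {
    _Unwind_Backtrace(frame_cb, NULL);
}
```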

10 months ago cann: update cmake (llama/8765)
wangshuai09 [Tue, 30 Jul 2024 10:37:35 +0000 (18:37 +0800)]
cann: update cmake (llama/8765)

10 months ago Add `TIMESTEP_EMBEDDING` OP (llama/8707)
zhentaoyu [Tue, 30 Jul 2024 06:56:51 +0000 (14:56 +0800)]
Add `TIMESTEP_EMBEDDING` OP (llama/8707)

Signed-off-by: zhentaoyu <redacted>

10 months ago ggml: bugfix: fix handling of inactive elements for RISC-V vector (llama/8748)
CarterLi999 [Mon, 29 Jul 2024 16:38:34 +0000 (00:38 +0800)]
ggml: bugfix: fix handling of inactive elements for RISC-V vector (llama/8748)

In this code, we want inactive elements to retain the value they
previously held when mask[i] is false, so we should use the undisturbed
policy. With the default agnostic policy of the RVV intrinsics, those
values may either be kept or be overwritten with 1s.

Co-authored-by: carter.li <redacted>
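
In scalar terms, the two RVV mask policies differ as follows (a plain-C picture, not vector code):

```c
// undisturbed: inactive elements keep their previous value;
// agnostic:    inactive elements may be kept OR overwritten with all-1s.
// the loop below shows the behavior the code needs, i.e. undisturbed
static void masked_scale(float * vd, const float * vs, const int * mask, int vl) {
    for (int i = 0; i < vl; ++i) {
        if (mask[i]) {
            vd[i] = vs[i] * 2.0f; // stand-in for the real vector operation
        }                         // else: vd[i] is left untouched
    }
}
```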

10 months ago cuda : organize vendor-specific headers into vendors directory (llama/8746)
R0CKSTAR [Mon, 29 Jul 2024 12:56:12 +0000 (20:56 +0800)]
cuda : organize vendor-specific headers into vendors directory (llama/8746)

Signed-off-by: Xiaodong Ye <redacted>

10 months ago add conv support (llama/8688)
Meng, Hengyu [Mon, 29 Jul 2024 02:50:27 +0000 (10:50 +0800)]
add conv support (llama/8688)

10 months ago feat: Support Moore Threads GPU (llama/8383)
R0CKSTAR [Sat, 27 Jul 2024 23:41:25 +0000 (07:41 +0800)]
feat: Support Moore Threads GPU (llama/8383)

* Update doc for MUSA

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <redacted>
* CUDA => MUSA

Signed-off-by: Xiaodong Ye <redacted>
* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <redacted>
* Fix CI build failure

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>

10 months ago ggml : ignore more msvc warnings (#906)
Borislav Stanimirov [Wed, 7 Aug 2024 07:00:56 +0000 (10:00 +0300)]
ggml : ignore more msvc warnings (#906)

10 months ago metal : fix struct name (#912)
Georgi Gerganov [Wed, 7 Aug 2024 06:57:00 +0000 (09:57 +0300)]
metal : fix struct name (#912)

ggml-ci

10 months ago metal : add abort callback (#905)
Conrad Kramer [Wed, 7 Aug 2024 06:55:49 +0000 (02:55 -0400)]
metal : add abort callback (#905)

10 months ago vulkan : implement Stable Diffusion operators (#904)
0cc4m [Sun, 4 Aug 2024 15:28:08 +0000 (17:28 +0200)]
vulkan : implement Stable Diffusion operators (#904)

* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op

11 months ago ggml : move c parameter comment to ggml_rope_ext (#901)
Daniel Bevenius [Mon, 29 Jul 2024 13:06:06 +0000 (15:06 +0200)]
ggml : move c parameter comment to ggml_rope_ext (#901)

This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).

Signed-off-by: Daniel Bevenius <redacted>
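
For orientation, an illustrative ggml_rope_ext call showing where c (the optional freq_factors tensor, may be NULL) sits; values and parameter names approximate the API of this period:

```c
#include "ggml.h"

static struct ggml_tensor * rope_ext_example(struct ggml_context * ctx,
        struct ggml_tensor * a, struct ggml_tensor * pos, struct ggml_tensor * freq_factors) {
    return ggml_rope_ext(ctx, a, pos, freq_factors /* c, may be NULL */,
        /*n_dims*/ 128, /*mode*/ 0, /*n_ctx_orig*/ 4096,
        /*freq_base*/ 10000.0f, /*freq_scale*/ 1.0f,
        /*ext_factor*/ 0.0f, /*attn_factor*/ 1.0f,
        /*beta_fast*/ 32.0f, /*beta_slow*/ 1.0f);
}
```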

11 months ago examples: add TensorFlow to requirements.txt (#902)
Johannes Gäßler [Mon, 29 Jul 2024 13:03:08 +0000 (15:03 +0200)]
examples: add TensorFlow to requirements.txt (#902)

11 months ago ggml : sync vulkan shaders (#0)
0cc4m [Sat, 27 Jul 2024 14:52:35 +0000 (17:52 +0300)]
ggml : sync vulkan shaders (#0)

ggml-ci

11 months ago ggml : resolve sync conflicts (#0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:23 +0000 (17:17 +0300)]
ggml : resolve sync conflicts (#0)

ggml-ci

11 months ago common : handle new quant types (#0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:04 +0000 (17:17 +0300)]
common : handle new quant types (#0)

11 months ago ggml : add ggml-aarch64 (#0)
Dibakar Gope [Sat, 27 Jul 2024 14:16:40 +0000 (17:16 +0300)]
ggml : add ggml-aarch64 (#0)

11 months ago cann: Fix Multi-NPU execution error (llama/8710)
wangshuai09 [Sat, 27 Jul 2024 08:36:44 +0000 (16:36 +0800)]
cann: Fix Multi-NPU execution error (llama/8710)

* cann: fix multi-npu exec error

* cann: update comment  for ggml_backend_cann_supports_buft

11 months ago ggml : reduce hash table reset cost (llama/8698)
slaren [Sat, 27 Jul 2024 02:41:55 +0000 (04:41 +0200)]
ggml : reduce hash table reset cost (llama/8698)

* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
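
A usage sketch of the resulting idiom:

```c
#include "ggml.h"

static void check_type(enum ggml_type type) {
    switch (type) {
        case GGML_TYPE_F32:
            break; // supported
        default:
            // replaces GGML_ASSERT(false); accepts a printf-style format string
            GGML_ABORT("fatal error: unsupported type %d", (int) type);
    }
}
```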

11 months ago ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)
DavidKorczynski [Thu, 25 Jul 2024 21:23:05 +0000 (22:23 +0100)]
ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)

`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_no_alloc`.

This fixes it by bailing out if no context is found.
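
The fix pattern, sketched against the public API:

```c
#include "ggml.h"
#include <stdio.h>

static struct ggml_context * init_or_fail(size_t mem_size) {
    const struct ggml_init_params params = {
        /*.mem_size   =*/ mem_size,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);
    if (ctx == NULL) {
        fprintf(stderr, "ggml_init failed\n"); // bail out instead of deref'ing NULL later
    }
    return ctx;
}
```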

11 months ago ggml : fix build on Windows with Snapdragon X (llama/8531)
Andreas (Andi) Kunar [Thu, 25 Jul 2024 16:01:00 +0000 (18:01 +0200)]
ggml : fix build on Windows with Snapdragon X (llama/8531)

* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397ae5ea7c73d3494db3b91505599909227d.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* WIndows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <redacted>

11 months ago fix multi-gpu issue on sycl (llama/8554)
Chen Xi [Thu, 25 Jul 2024 11:45:18 +0000 (11:45 +0000)]
fix multi-gpu issue on sycl (llama/8554)

---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>

11 months ago ggml : add and use ggml_cpu_has_llamafile() (llama/8664)
Georgi Gerganov [Thu, 25 Jul 2024 09:37:42 +0000 (12:37 +0300)]
ggml : add and use ggml_cpu_has_llamafile() (llama/8664)

11 months ago Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)
Joe Todd [Wed, 24 Jul 2024 10:55:26 +0000 (11:55 +0100)]
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)

11 months ago sycl : Add support for non-release DPC++ & oneMKL (llama/8644)
Joe Todd [Tue, 23 Jul 2024 13:58:37 +0000 (14:58 +0100)]
sycl : Add support for non-release DPC++ & oneMKL (llama/8644)

* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <redacted>

11 months ago Vulkan IQ4_NL Support (llama/8613)
0cc4m [Tue, 23 Jul 2024 08:56:49 +0000 (10:56 +0200)]
Vulkan IQ4_NL Support (llama/8613)

* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support

11 months ago Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)
Jeroen Mostert [Tue, 23 Jul 2024 08:50:40 +0000 (10:50 +0200)]
Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)

The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
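
A sketch of the widened gate (the exact macro spelling in the tree may differ):

```c
#if defined(RDNA2) // generic arch define instead of a gfx1030-only check
static __device__ int dot4_i8(int a, int b, int c) {
    // 4-way int8 dot product with accumulate, no clamping
    return __builtin_amdgcn_sdot4(a, b, c, false);
}
#endif
```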

11 months ago fix scratch size of softmax (llama/8642)
luoyu-intel [Tue, 23 Jul 2024 07:43:28 +0000 (07:43 +0000)]
fix scratch size of softmax (llama/8642)

11 months ago ggml: fix compile error for RISC-V (llama/8623)
Mark Zhuang [Mon, 22 Jul 2024 07:56:45 +0000 (15:56 +0800)]
ggml: fix compile error for RISC-V (llama/8623)

11 months ago CUDA: MMQ code deduplication + iquant support (llama/8495)
Johannes Gäßler [Sat, 20 Jul 2024 20:25:26 +0000 (22:25 +0200)]
CUDA: MMQ code deduplication + iquant support (llama/8495)

* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build

11 months ago gguf : handle null name during init (llama/8587)
Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]
gguf : handle null name during init (llama/8587)

11 months ago ggml : fix quant dot product with odd number of blocks (llama/8549)
slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]
ggml : fix quant dot product with odd number of blocks (llama/8549)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (llama/8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>

11 months ago ggml : add friendlier error message to fopen errors (llama/8575)
Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]
ggml : add friendlier error message to fopen errors (llama/8575)

* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.
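
The pattern being added, roughly:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

static FILE * open_or_report(const char * path) {
    FILE * f = fopen(path, "rb");
    if (f == NULL) {
        // include the path and the OS-level reason, not just "failed to open"
        fprintf(stderr, "failed to open '%s': %s\n", path, strerror(errno));
    }
    return f;
}
```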

11 months ago CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)
Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)

11 months ago cmake : install all ggml public headers (llama/8480)
65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]
cmake : install all ggml public headers (llama/8480)

Co-authored-by: 65a <redacted>

11 months ago Add Ascend NPU backend (llama/6035)
hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]
Add Ascend NPU backend (llama/6035)

* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <redacted>
* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <redacted>

11 months ago make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)
Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)

11 months ago Refactor lora adapter support (llama/8332)
Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]
Refactor lora adapter support (llama/8332)

* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <redacted>
* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <redacted>
* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>

11 months ago ggml : suppress unknown pragma 'GCC' on windows (llama/8460)
Daniel Bevenius [Mon, 15 Jul 2024 12:48:17 +0000 (14:48 +0200)]
ggml : suppress unknown pragma 'GCC' on windows (llama/8460)

This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
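
The guard has this shape (the specific diagnostic below is illustrative):

```c
#if defined(__GNUC__) // MSVC does not understand GCC pragmas and emits C4068
#pragma GCC diagnostic ignored "-Wpedantic"
#endif
```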

11 months ago add concat through dim 1/2 (llama/8483)
Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]
add concat through dim 1/2 (llama/8483)

* add concat through dim 1/2

11 months ago Vulkan MMQ Fix (llama/8479)
0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]
Vulkan MMQ Fix (llama/8479)

* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error

11 months ago vulkan : cmake integration (llama/8119)
bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]
vulkan : cmake integration (llama/8119)

* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg

11 months ago metal : template-ify some of the kernels (llama/8447)
Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]
metal : template-ify some of the kernels (llama/8447)

ggml-ci

11 months ago ggml : minor naming changes (llama/8433)
Georgi Gerganov [Fri, 12 Jul 2024 07:46:02 +0000 (10:46 +0300)]
ggml : minor naming changes (llama/8433)

* ggml : minor naming changes

ggml-ci

* ggml : use PRId64 [no ci]

* ggml : revert FA K/Q names

11 months ago fix the mul_mat_id ut issues (llama/8427)
Chen Xi [Fri, 12 Jul 2024 00:52:04 +0000 (00:52 +0000)]
fix the mul_mat_id ut issues (llama/8427)

* fix part of mul_mat_id

* skip the bfloat 16 sycl ut

Signed-off-by: Chen Xi <redacted>
---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Co-authored-by: Chen Xi <redacted>