git.djapps.eu Git - pkg/ggml/sources/ggml/log
Markus Tavenrath [Sun, 11 Aug 2024 08:09:09 +0000 (10:09 +0200)]
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Judging by the code, which either launches compute kernels or copies tensors, adding barriers only for shader reads/writes and transfers appears to be sufficient (see the sketch below).

* Fix small typo

---------

Co-authored-by: 0cc4m <redacted>
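
For illustration, a minimal sketch of such a narrowed barrier (assuming a recorded `VkCommandBuffer cmd_buffer`; this is a sketch of the idea, not the actual ggml-vulkan code):

```cpp
#include <vulkan/vulkan.h>

// cover only the compute-shader and transfer stages that ggml uses,
// instead of a full pipeline sync (all stages + all memory access)
void sync_compute_and_transfer(VkCommandBuffer cmd_buffer) {
    VkMemoryBarrier barrier = {};
    barrier.sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT | VK_ACCESS_TRANSFER_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT  | VK_ACCESS_TRANSFER_READ_BIT;
    vkCmdPipelineBarrier(cmd_buffer,
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT | VK_PIPELINE_STAGE_TRANSFER_BIT, // src stages
        VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT | VK_PIPELINE_STAGE_TRANSFER_BIT, // dst stages
        0, 1, &barrier, 0, nullptr, 0, nullptr);
}
```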
Johannes Gäßler [Tue, 27 Aug 2024 18:39:30 +0000 (20:39 +0200)]
feat: ref. cross entropy, add CUDA, fix grad test (#929)

Salvatore Mesoraca [Tue, 27 Aug 2024 06:25:12 +0000 (08:25 +0200)]
tests : fix memory leaks (#936)

It is annoying to run the tests using the sanitizers
because of all the uninteresting reports about the memory
leaked by the tests themselves.

Signed-off-by: Salvatore Mesoraca <redacted>
Johannes Gäßler [Sat, 24 Aug 2024 17:27:02 +0000 (19:27 +0200)]
ggml: remove bad assert (#928)

ucag.li [Thu, 22 Aug 2024 19:49:45 +0000 (03:49 +0800)]
set the ggml_context pointer to NULL to pass the assert check, in case some compilers do not set uninitialized pointers to NULL (mnist example)
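
A sketch of the shape of the fix (placement hypothetical):

```cpp
// an uninitialized pointer is not guaranteed to be NULL, so initialize it
// explicitly before the assert that checks for a fresh (NULL) context
struct ggml_context * ctx = NULL;
```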

Johannes Gäßler [Tue, 30 Jul 2024 13:56:35 +0000 (15:56 +0200)]
examples: add MNIST training + missing ops

Radoslav Gerganov [Mon, 19 Aug 2024 07:09:33 +0000 (10:09 +0300)]
yolo : add backend support (#924)

* yolo : add backend support

* metal : add sub and sqrt kernels

---------

Co-authored-by: Georgi Gerganov <redacted>
Daniel Bevenius [Thu, 15 Aug 2024 07:42:38 +0000 (09:42 +0200)]
ggml : fix typo in ggml-quants.c comment (#922)

Ronsor [Mon, 12 Aug 2024 13:02:08 +0000 (06:02 -0700)]
feat: add new `sin` and `cos` operators (#919)

* ggml : add sin/cos operators

* ggml-cuda : add sin/cos operators

* ggml : add corresponding tests for sin/cos

* ggml : add backward computation for sin/cos operators

* ggml-vulkan : add sin/cos operators

* ggml-vulkan : add sin/cos shader source

* metal : add sin, cos

---------

Co-authored-by: Georgi Gerganov <redacted>
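
For context, a minimal sketch of using the new operators through the public C API (sizes arbitrary; tensor data left unset for brevity):

```cpp
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = { /*mem_size*/ 16*1024*1024, /*mem_buffer*/ NULL, /*no_alloc*/ false };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * y = ggml_sin(ctx, x); // element-wise sin(x)
    struct ggml_tensor * z = ggml_cos(ctx, x); // element-wise cos(x)

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);
    ggml_build_forward_expand(gf, z);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads*/ 1);

    ggml_free(ctx);
    return 0;
}
```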
Salvatore Mesoraca [Sun, 11 Aug 2024 08:08:53 +0000 (10:08 +0200)]
ggml : support forward pass broadcasting in ggml_sub (#914)

* ggml: support forward pass broadcasting in ggml_sub

Signed-off-by: Salvatore Mesoraca <redacted>
* Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32

The check is already performed in ggml_sub_impl

Signed-off-by: Salvatore Mesoraca <redacted>
---------

Signed-off-by: Salvatore Mesoraca <redacted>
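
A short sketch of what the change permits (assuming a valid `ggml_context * ctx`): the second operand may now have repeatable dimensions, mirroring ggml_add:

```cpp
struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 3); // 4 x 3
struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 1); // 4 x 1
struct ggml_tensor * d = ggml_sub(ctx, a, b); // b is broadcast across the 3 rows
```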
Georgi Gerganov [Sun, 11 Aug 2024 08:06:21 +0000 (11:06 +0300)]
sync : llama.cpp

slaren [Sat, 10 Aug 2024 13:42:10 +0000 (15:42 +0200)]
metal : fix uninitialized abort_callback (llama/8968)

Georgi Gerganov [Sat, 10 Aug 2024 06:51:19 +0000 (09:51 +0300)]
sync : llama.cpp

Georgi Gerganov [Fri, 9 Aug 2024 20:03:21 +0000 (23:03 +0300)]
rpc : sanitize tensor data + warnings (llama/0)

Co-authored-by: slaren <redacted>
Georgi Gerganov [Fri, 9 Aug 2024 07:03:29 +0000 (10:03 +0300)]
sync : whisper.cpp

Matt Stephenson [Tue, 16 Jul 2024 07:21:09 +0000 (03:21 -0400)]
whisper : use vulkan as gpu backend when available (whisper/2302)

* ggml: use vulkan as gpu backend when available

Signed-off-by: Matt Stephenson <redacted>
* whisper: enable using vk as default buffer type

Signed-off-by: Matt Stephenson <redacted>
---------

Signed-off-by: Matt Stephenson <redacted>
hipudding [Thu, 8 Aug 2024 11:48:06 +0000 (14:48 +0300)]
ggml : add CANN backend (llama/0)

ggml-ci

Georgi Gerganov [Thu, 8 Aug 2024 11:46:24 +0000 (14:46 +0300)]
sync : vulkan (llama/0)

Georgi Gerganov [Thu, 8 Aug 2024 10:57:33 +0000 (13:57 +0300)]
scripts : sync sycl (#0)

Georgi Gerganov [Thu, 8 Aug 2024 10:54:39 +0000 (13:54 +0300)]
scripts : remove obsolete header (#0)

Georgi Gerganov [Thu, 8 Aug 2024 10:51:09 +0000 (13:51 +0300)]
scripts : update sync scripts (#0)

Georgi Gerganov [Thu, 8 Aug 2024 10:25:41 +0000 (13:25 +0300)]
sync : llama.cpp

ggml-ci

slaren [Wed, 7 Aug 2024 11:29:02 +0000 (13:29 +0200)]
ggml-backend : fix async copy from CPU (llama/8897)

* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same

Ouadie EL FAROUKI [Wed, 7 Aug 2024 10:25:36 +0000 (11:25 +0100)]
Updated SYCL device filtering (llama/8901)

* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme

Johannes Gäßler [Wed, 7 Aug 2024 07:07:52 +0000 (09:07 +0200)]
CUDA/HIP: fix tests/test-backend-ops (llama/8896)

Johannes Gäßler [Tue, 6 Aug 2024 15:13:55 +0000 (17:13 +0200)]
CUDA: fix padding logic for FP16/FP32 (llama/8884)

Molly Sophia [Tue, 6 Aug 2024 07:26:46 +0000 (15:26 +0800)]
ggml : add epsilon as a parameter for group_norm (llama/8818)

Signed-off-by: Molly Sophia <redacted>
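
Assumed post-change call shape (values arbitrary):

```cpp
// eps is now an explicit parameter instead of a hard-coded constant
struct ggml_tensor * out = ggml_group_norm(ctx, x, /*n_groups*/ 32, /*eps*/ 1e-6f);
```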
Mengqing Cao [Tue, 6 Aug 2024 04:42:42 +0000 (12:42 +0800)]
Fix ggml_backend_cann_buffer_get_tensor (llama/8871)

* cann: fix ggml_backend_cann_buffer_get_tensor

 1. fix data ptr offset
 2. enable the acquisition of incomplete tensors

* fix backend cann set_tensor

wangshuai09 [Mon, 5 Aug 2024 13:10:37 +0000 (21:10 +0800)]
cann: fix buffer_num error and slow runtime speed (llama/8865)

Justine Tunney [Mon, 5 Aug 2024 12:43:40 +0000 (05:43 -0700)]
ggml : fix overflows in elu function (llama/8866)

It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.
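
An illustration of the fix (not the actual ggml code):

```cpp
#include <math.h>

// for negative x, expf(x) - 1.0f is problematic across a large part of
// the float range, while expm1f(x) computes e^x - 1 directly and safely
static inline float elu_naive(float x) { return x > 0.0f ? x : expf(x) - 1.0f; }
static inline float elu_fixed(float x) { return x > 0.0f ? x : expm1f(x); }
```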

0cc4m [Mon, 5 Aug 2024 05:52:55 +0000 (07:52 +0200)]
vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (llama/8855)

* Fix Vulkan mul mat vec invalid results when ncols < warp size

* Only run backend ops mul mat vec block size test if block size not already covered

wangshuai09 [Mon, 5 Aug 2024 04:22:30 +0000 (12:22 +0800)]
cann: support q4_0 model (llama/8822)

jdomke [Sat, 3 Aug 2024 16:34:41 +0000 (01:34 +0900)]
ggml : reading the runtime sve config of the cpu (llama/8709)

* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <redacted>
Sigbjørn Skjæret [Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)]
Fix conversion of unnormalized BF16->BF16 weights (llama/7843)

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
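
A sketch of the reference fp32 -> bf16 conversion described above (helper name hypothetical): round to nearest even, quiet NaNs, and leave subnormals alone rather than flushing them to zero:

```cpp
#include <stdint.h>
#include <string.h>

static uint16_t fp32_to_bf16_ref(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    if ((u & 0x7fffffff) > 0x7f800000) {
        return (uint16_t)((u >> 16) | 64); // NaN: force the quiet bit
    }
    return (uint16_t)((u + (0x7fff + ((u >> 16) & 1))) >> 16); // round to nearest even
}
```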
Mengqing Cao [Fri, 2 Aug 2024 08:50:53 +0000 (16:50 +0800)]
cann: Fix ggml_cann_im2col for 1D im2col (llama/8819)

* fix ggml_cann_im2col for 1D im2col

* fix build warning

Ouadie EL FAROUKI [Fri, 2 Aug 2024 00:55:17 +0000 (01:55 +0100)]
Fixing wrong VDR iq4nl value (llama/8812)

matteo [Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)]
ggml-cuda: Adding support for unified memory (llama/8035)

* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code to the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
Alex O'Connell [Thu, 1 Aug 2024 16:53:46 +0000 (12:53 -0400)]
Build: Only include execinfo.h on linux systems that support it (llama/8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one
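
A sketch of that guard (flag name hypothetical):

```cpp
#if defined(__linux__) && defined(__GLIBC__)
#include <execinfo.h>   // backtrace, backtrace_symbols
#define GGML_HAVE_BACKTRACE 1
#endif
```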

slaren [Thu, 1 Aug 2024 13:26:22 +0000 (15:26 +0200)]
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)

* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test

l3utterfly [Tue, 30 Jul 2024 14:40:18 +0000 (23:40 +0900)]
added android implementation of ggml_print_backtrace_symbols (llama/8751)

* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
---------

Co-authored-by: slaren <redacted>
wangshuai09 [Tue, 30 Jul 2024 10:37:35 +0000 (18:37 +0800)]
cann: update cmake (llama/8765)

zhentaoyu [Tue, 30 Jul 2024 06:56:51 +0000 (14:56 +0800)]
Add `TIMESTEP_EMBEDDING` OP (llama/8707)

Signed-off-by: zhentaoyu <redacted>
CarterLi999 [Mon, 29 Jul 2024 16:38:34 +0000 (00:38 +0800)]
ggml: bugfix: fix inactive elements being treated as agnostic for RISC-V vector (llama/8748)

In this code we want the inactive elements to retain the value they
previously held when mask[i] is false, so we should use the undisturbed
policy. With the default agnostic policy of the RVV intrinsics, these
values may either be retained or overwritten with 1s (see the sketch below).

Co-authored-by: carter.li <redacted>
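
An illustration using current RVV intrinsic naming (a sketch, not the actual ggml code): the `_mu` (mask-undisturbed) form keeps inactive elements from the destination, while the plain masked form follows the agnostic policy:

```cpp
#include <riscv_vector.h>

vfloat32m1_t masked_add(vbool32_t mask, vfloat32m1_t dst,
                        vfloat32m1_t a, vfloat32m1_t b, size_t vl) {
    // inactive elements (mask[i] == 0) keep their old value from dst
    return __riscv_vfadd_vv_f32m1_mu(mask, dst, a, b, vl);
}
```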
R0CKSTAR [Mon, 29 Jul 2024 12:56:12 +0000 (20:56 +0800)]
cuda : organize vendor-specific headers into vendors directory (llama/8746)

Signed-off-by: Xiaodong Ye <redacted>
Meng, Hengyu [Mon, 29 Jul 2024 02:50:27 +0000 (10:50 +0800)]
add conv support (llama/8688)

R0CKSTAR [Sat, 27 Jul 2024 23:41:25 +0000 (07:41 +0800)]
feat: Support Moore Threads GPU (llama/8383)

* Update doc for MUSA

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <redacted>
* CUDA => MUSA

Signed-off-by: Xiaodong Ye <redacted>
* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <redacted>
* Fix CI build failure

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
Borislav Stanimirov [Wed, 7 Aug 2024 07:00:56 +0000 (10:00 +0300)]
ggml : ignore more msvc warnings (#906)

Georgi Gerganov [Wed, 7 Aug 2024 06:57:00 +0000 (09:57 +0300)]
metal : fix struct name (#912)

ggml-ci

Conrad Kramer [Wed, 7 Aug 2024 06:55:49 +0000 (02:55 -0400)]
metal : add abort callback (#905)

0cc4m [Sun, 4 Aug 2024 15:28:08 +0000 (17:28 +0200)]
vulkan : implement Stable Diffusion operators (#904)

* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op

Daniel Bevenius [Mon, 29 Jul 2024 13:06:06 +0000 (15:06 +0200)]
ggml : move c parameter comment to ggml_rope_ext (#901)

This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).

Signed-off-by: Daniel Bevenius <redacted>
Johannes Gäßler [Mon, 29 Jul 2024 13:03:08 +0000 (15:03 +0200)]
examples: add TensorFlow to requirements.txt (#902)

0cc4m [Sat, 27 Jul 2024 14:52:35 +0000 (17:52 +0300)]
ggml : sync vulkan shaders (#0)

ggml-ci

Georgi Gerganov [Sat, 27 Jul 2024 14:17:23 +0000 (17:17 +0300)]
ggml : resolve sync conflicts (#0)

ggml-ci

Georgi Gerganov [Sat, 27 Jul 2024 14:17:04 +0000 (17:17 +0300)]
common : handle new quant types (#0)

Dibakar Gope [Sat, 27 Jul 2024 14:16:40 +0000 (17:16 +0300)]
ggml : add ggml-aarch64 (#0)

wangshuai09 [Sat, 27 Jul 2024 08:36:44 +0000 (16:36 +0800)]
cann: Fix Multi-NPU execution error (llama/8710)

* cann: fix multi-npu exec error

* cann: update comment for ggml_backend_cann_supports_buft

slaren [Sat, 27 Jul 2024 02:41:55 +0000 (04:41 +0200)]
ggml : reduce hash table reset cost (llama/8698)

* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string

DavidKorczynski [Thu, 25 Jul 2024 21:23:05 +0000 (22:23 +0100)]
ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)

`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_no_alloc`.

This fixes it by bailing out if no context is found.
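
A sketch of the bail-out (surrounding code assumed):

```cpp
struct ggml_context * ctx = ggml_init(params);
if (ctx == NULL) {
    fprintf(stderr, "%s: ggml_init() failed\n", __func__);
    return NULL; // bail out instead of dereferencing ctx later
}
ggml_set_no_alloc(ctx, true);
```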

Andreas (Andi) Kunar [Thu, 25 Jul 2024 16:01:00 +0000 (18:01 +0200)]
ggml : fix build on Windows with Snapdragon X (llama/8531)

* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397ae5ea7c73d3494db3b91505599909227d.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* Windows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <redacted>
Chen Xi [Thu, 25 Jul 2024 11:45:18 +0000 (11:45 +0000)]
fix multi-gpu issue on sycl (llama/8554)

---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Georgi Gerganov [Thu, 25 Jul 2024 09:37:42 +0000 (12:37 +0300)]
ggml : add and use ggml_cpu_has_llamafile() (llama/8664)

Joe Todd [Wed, 24 Jul 2024 10:55:26 +0000 (11:55 +0100)]
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)

Joe Todd [Tue, 23 Jul 2024 13:58:37 +0000 (14:58 +0100)]
sycl : Add support for non-release DPC++ & oneMKL (llama/8644)

* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <redacted>
0cc4m [Tue, 23 Jul 2024 08:56:49 +0000 (10:56 +0200)]
Vulkan IQ4_NL Support (llama/8613)

* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support

Jeroen Mostert [Tue, 23 Jul 2024 08:50:40 +0000 (10:50 +0200)]
Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)

The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.

luoyu-intel [Tue, 23 Jul 2024 07:43:28 +0000 (07:43 +0000)]
fix scratch size of softmax (llama/8642)

Mark Zhuang [Mon, 22 Jul 2024 07:56:45 +0000 (15:56 +0800)]
ggml: fix compile error for RISC-V (llama/8623)

Johannes Gäßler [Sat, 20 Jul 2024 20:25:26 +0000 (22:25 +0200)]
CUDA: MMQ code deduplication + iquant support (llama/8495)

* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build

Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]
gguf : handle null name during init (llama/8587)

slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]
ggml : fix quant dot product with odd number of blocks (llama/8549)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (llama/8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]
ggml : add friendlier error message to fopen errors (llama/8575)

* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.

Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)

65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]
cmake : install all ggml public headers (llama/8480)

Co-authored-by: 65a <redacted>
hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]
Add Ascend NPU backend (llama/6035)

* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <redacted>
* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <redacted>
Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)

Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]
Refactor lora adapter support (llama/8332)

* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <redacted>
* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <redacted>
* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>
Daniel Bevenius [Mon, 15 Jul 2024 12:48:17 +0000 (14:48 +0200)]
ggml : suppress unknown pragma 'GCC' on windows (llama/8460)

This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
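
A sketch of such a guard (the concrete pragma in ggml-aarch64.c may differ):

```cpp
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Woverlength-strings"
#endif
```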

Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]
add concat through dim 1/2 (llama/8483)

* add concat through dim 1/2

0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]
Vulkan MMQ Fix (llama/8479)

* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error

bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]
vulkan : cmake integration (llama/8119)

* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg

Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]
metal : template-ify some of the kernels (llama/8447)

ggml-ci

Georgi Gerganov [Fri, 12 Jul 2024 07:46:02 +0000 (10:46 +0300)]
ggml : minor naming changes (llama/8433)

* ggml : minor naming changes

ggml-ci

* ggml : use PRId64 [no ci]

* ggml : revert FA K/Q names

Chen Xi [Fri, 12 Jul 2024 00:52:04 +0000 (00:52 +0000)]
fix the mul_mat_id ut issues (llama/8427)

* fix part of mul_mat_id

* skip the bfloat 16 sycl ut

Signed-off-by: Chen Xi <redacted>
---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Co-authored-by: Chen Xi <redacted>
Nicholai Tukanov [Thu, 11 Jul 2024 16:49:15 +0000 (11:49 -0500)]
ggml : add NVPL BLAS support (#8329) (llama/8425)

* ggml : add NVPL BLAS support

* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`

---------

Co-authored-by: ntukanov <redacted>
Daniel Bevenius [Thu, 11 Jul 2024 15:53:42 +0000 (17:53 +0200)]
cuda : suppress 'noreturn' warn in no_device_code (llama/8414)

* cuda : suppress 'noreturn' warn in no_device_code

This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:

```console
/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
```

The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.

Signed-off-by: Daniel Bevenius <redacted>
* squash! cuda : suppress 'noreturn' warn in no_device_code

Update __trap macro instead of using a while loop to suppress the
warning.

Signed-off-by: Daniel Bevenius <redacted>
---------

Signed-off-by: Daniel Bevenius <redacted>
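
A sketch of the final approach (assumed shape):

```cpp
#include <stdlib.h>

// make __trap() genuinely not return, so no_device_code satisfies
// its 'noreturn' attribute when compiling with GGML_HIPBLAS=ON
#define __trap() do { abort(); } while (0)
```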
Johannes Gäßler [Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)]
CUDA: optimize and refactor MMQ (llama/8416)

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation

AidanBeltonS [Wed, 10 Jul 2024 15:10:49 +0000 (16:10 +0100)]
Use multi_ptr to clean up deprecated warnings (llama/8256)

Georgi Gerganov [Wed, 10 Jul 2024 12:23:29 +0000 (15:23 +0300)]
ggml : move sgemm sources to llamafile subfolder (llama/8394)

ggml-ci

Dibakar Gope [Wed, 10 Jul 2024 12:14:51 +0000 (07:14 -0500)]
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)

* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files

* Arm AArch64: minor code refactoring for rebase

* Arm AArch64: minor code refactoring for resolving a build issue with cmake

* Arm AArch64: minor code refactoring to split the Q4_0_AARCH64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code change for resolving a build issue with server-windows

* retrigger checks

* Arm AArch64: minor code changes for rebase

* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits

* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig

* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code refactoring

* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat

* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat

* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure

* Arm AArch64: remove a redundant comment

* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off

* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels

* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type

Alberto Cabrera Pérez [Tue, 9 Jul 2024 14:03:15 +0000 (15:03 +0100)]
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372)

* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend

* Reduced verbosity of comment

Alberto Cabrera Pérez [Mon, 8 Jul 2024 13:22:41 +0000 (14:22 +0100)]
sycl : fix powf call in device code (llama/8368)

Mahesh Madhav [Thu, 25 Jul 2024 07:54:08 +0000 (00:54 -0700)]
ggml : loop tiling optimizations for scalar path (#898)

Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
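
An illustration of the technique on a generic dot product (not the actual ggml code):

```cpp
// four independent accumulators keep partial sums in registers and
// let the compiler overlap the multiply-adds
float dot_tiled(const float * x, const float * y, int n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += x[i+0]*y[i+0];
        s1 += x[i+1]*y[i+1];
        s2 += x[i+2]*y[i+2];
        s3 += x[i+3]*y[i+3];
    }
    for (; i < n; ++i) {
        s0 += x[i]*y[i]; // scalar tail
    }
    return (s0 + s1) + (s2 + s3);
}
```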

Ivan Filipov [Mon, 22 Jul 2024 11:32:02 +0000 (14:32 +0300)]
ggml: add support for float16 input tensors in pooling operations (#895)

* Add support for float16 tensors in 1d pooling operations

* Add support for float16 input tensors in 2d pooling operations

* code cleanup

remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <redacted>
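
A usage sketch (assuming a valid `ggml_context * ctx`; sizes arbitrary):

```cpp
// pooling now also accepts F16 input tensors
struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, 64);
struct ggml_tensor * p = ggml_pool_1d(ctx, x, GGML_OP_POOL_AVG,
                                      /*k0*/ 2, /*s0*/ 2, /*p0*/ 0);
```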
Brian [Mon, 22 Jul 2024 10:25:01 +0000 (20:25 +1000)]
gguf.md: naming convention synced to llama.cpp (#896)

It is now updated to this form

`<BaseName><SizeLabel><FineTune><Version><Encoding><Type><Shard>.gguf`
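
A hypothetical name following this convention: `Mixtral-8x7B-Instruct-v0.1-Q4_0.gguf`, or with shards, `Mixtral-8x7B-Instruct-v0.1-Q4_0-00001-of-00003.gguf`.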

Brian [Sun, 21 Jul 2024 08:20:30 +0000 (18:20 +1000)]
gguf.md: kv store has new authorship metadata keys (#897)

Tony Wasserka [Sat, 20 Jul 2024 18:49:44 +0000 (20:49 +0200)]
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (#893)

This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.

Co-authored-by: Tony Wasserka <redacted>
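
A sketch of the fix (member list abridged):

```cpp
struct vk_buffer_struct {
    // default-initialize handles so destroying a partially constructed
    // buffer is safe: vkDestroyBuffer/vkFreeMemory ignore VK_NULL_HANDLE
    VkBuffer       buffer        = VK_NULL_HANDLE;
    VkDeviceMemory device_memory = VK_NULL_HANDLE;
    size_t         size          = 0;
};
```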
Georgi Gerganov [Sat, 20 Jul 2024 13:38:56 +0000 (16:38 +0300)]
py : update packages + fix yolo warning

Borislav Stanimirov [Fri, 12 Jul 2024 14:24:20 +0000 (17:24 +0300)]
cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (#885)

Georgi Gerganov [Mon, 8 Jul 2024 11:54:35 +0000 (14:54 +0300)]
sync : whisper.cpp