git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log
10 months ago  go : add beamsize/entropythold/maxcontext to context interface (#2350)
hsinhoyeh [Wed, 28 Aug 2024 14:09:01 +0000 (22:09 +0800)]
go : add beamsize/entropythold/maxcontext to context interface (#2350)

* feat(go binding): add beamsize/entropythold/maxcontext to context interface

fixes: #2349

* fix go build

* fix dynamic link .so and header.h

* remove LD_LIBRARY_PATH

* remove ggml obj from whisper dynamic lib

* drop LIB_GGML

10 months ago  talk-llama : sync llama.cpp
Georgi Gerganov [Wed, 28 Aug 2024 08:04:02 +0000 (11:04 +0300)]
talk-llama : sync llama.cpp

10 months ago  whisper : update FA call
Georgi Gerganov [Wed, 28 Aug 2024 08:02:54 +0000 (11:02 +0300)]
whisper : update FA call

10 months ago  sync : ggml
Georgi Gerganov [Wed, 28 Aug 2024 08:02:42 +0000 (11:02 +0300)]
sync : ggml

10 months ago  sync : vulkan (skip) (llama/0)
Georgi Gerganov [Tue, 27 Aug 2024 18:48:22 +0000 (21:48 +0300)]
sync : vulkan (skip) (llama/0)

10 months ago  ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)
slaren [Mon, 26 Aug 2024 17:44:43 +0000 (19:44 +0200)]
ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)

10 months ago  metal : separate scale and mask from QKT in FA kernel (llama/9189)
Georgi Gerganov [Mon, 26 Aug 2024 15:31:02 +0000 (18:31 +0300)]
metal : separate scale and mask from QKT in FA kernel (llama/9189)

* metal : separate scale and mask from QKT in FA kernel

* metal : ne01 check no longer necessary

* metal : keep data in local memory

10 months ago  ggml : add SSM Metal kernels (llama/8546)
Georgi Gerganov [Mon, 26 Aug 2024 14:55:36 +0000 (17:55 +0300)]
ggml : add SSM Metal kernels (llama/8546)

* ggml : add ggml_ssm_conv metal impl

* ggml : add ssm_scan metal impl

ggml-ci

10 months ago  metal : gemma2 flash attention support (llama/9159)
slaren [Mon, 26 Aug 2024 09:08:59 +0000 (11:08 +0200)]
metal : gemma2 flash attention support (llama/9159)

10 months ago  CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
Johannes Gäßler [Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)]
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check

10 months ago  Add a space to suppress a cmake warning (llama/9133)
Akarshan Biswas [Thu, 22 Aug 2024 14:09:47 +0000 (19:39 +0530)]
Add a space to suppress a cmake warning (llama/9133)

10 months ago  Add oneDNN primitive support (llama/9091)
luoyu-intel [Thu, 22 Aug 2024 04:50:10 +0000 (12:50 +0800)]
Add oneDNN primitive support (llama/9091)

* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc

10 months ago  llama : simplify Mamba with advanced batch splits (llama/8526)
compilade [Wed, 21 Aug 2024 21:58:11 +0000 (17:58 -0400)]
llama : simplify Mamba with advanced batch splits (llama/8526)

* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <redacted>
* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months ago  fallback mmvq (llama/9088)
Meng, Hengyu [Tue, 20 Aug 2024 15:50:17 +0000 (23:50 +0800)]
fallback mmvq (llama/9088)

* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <redacted>
---------

Co-authored-by: Alberto Cabrera Pérez <redacted>
10 months ago  Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)
zhentaoyu [Tue, 20 Aug 2024 15:06:51 +0000 (23:06 +0800)]
Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)

* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert overflow

Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <redacted>
* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <redacted>
* sycl: refine convert

Signed-off-by: zhentaoyu <redacted>
* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <redacted>
* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <redacted>
* test: make new cases only in sycl

Signed-off-by: zhentaoyu <redacted>
* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <redacted>
---------

Signed-off-by: zhentaoyu <redacted>
10 months ago  rpc : print error message when failed to connect endpoint (llama/9042)
Radoslav Gerganov [Mon, 19 Aug 2024 07:11:45 +0000 (10:11 +0300)]
rpc : print error message when failed to connect endpoint (llama/9042)

10 months ago  rpc : prevent crashes on invalid input (llama/9040)
Radoslav Gerganov [Mon, 19 Aug 2024 07:10:21 +0000 (10:10 +0300)]
rpc : prevent crashes on invalid input (llama/9040)

Add more checks which prevent RPC server from crashing if invalid input
is received from client

10 months ago  ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)
Nico Bosshard [Fri, 16 Aug 2024 02:22:55 +0000 (04:22 +0200)]
ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)

* ggml : Dynamic ggml_sched_max_splits based on graph_size

* Fixed and readded debug code for causes

10 months ago  cmake : remove unused option GGML_CURL (llama/9011)
Georgi Gerganov [Wed, 14 Aug 2024 06:14:49 +0000 (09:14 +0300)]
cmake : remove unused option GGML_CURL (llama/9011)

10 months ago  ggml : move rope type enum to ggml.h (llama/8949)
Daniel Bevenius [Tue, 13 Aug 2024 19:13:15 +0000 (21:13 +0200)]
ggml : move rope type enum to ggml.h (llama/8949)

* ggml : move rope type enum to ggml.h

This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.

The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.

Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.

* squash! ggml : move rope type enum to ggml.h

This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and moves them back to the llama_rope_type enum.

I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.

* squash! ggml : move rope type enum to ggml.h

This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.

* squash! ggml : move rope type enum to ggml.h

This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.

* squash! ggml : move rope type enum to ggml.h

This commit fixes the editorconfig-checker warnings.

* squash! ggml : move rope type enum to ggml.h

Update comment for ggml_rope function.

* Revert "squash! ggml : move rope type enum to ggml.h"

This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.

* squash! ggml : move rope type enum to ggml.h

Add GGML_ROPE_TYPE_NEOX to rope_common.comp.

* remove extra line

---------

Co-authored-by: slaren <redacted>
10 months ago  ggml: fix div-by-zero (llama/9003)
DavidKorczynski [Mon, 12 Aug 2024 12:21:41 +0000 (13:21 +0100)]
ggml: fix div-by-zero (llama/9003)

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724
In order to access the above bug you need to login using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5

Signed-off-by: David Korczynski <redacted>
10 months ago  Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)
Markus Tavenrath [Sun, 11 Aug 2024 08:09:09 +0000 (10:09 +0200)]
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduce a full pipeline sync which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader read/writes and transfers seems to be sufficient looking at the code which either launches compute kernels or copies tensors.

* Fix small typo

---------

Co-authored-by: 0cc4m <redacted>
10 months ago  feat: ref. cross entropy, add CUDA, fix grad test (ggml/929)
Johannes Gäßler [Tue, 27 Aug 2024 18:39:30 +0000 (20:39 +0200)]
feat: ref. cross entropy, add CUDA, fix grad test (ggml/929)

10 months ago  ggml: remove bad assert (ggml/928)
Johannes Gäßler [Sat, 24 Aug 2024 17:27:02 +0000 (19:27 +0200)]
ggml: remove bad assert (ggml/928)

10 months ago  examples: add MNIST training + missing ops
Johannes Gäßler [Tue, 30 Jul 2024 13:56:35 +0000 (15:56 +0200)]
examples: add MNIST training + missing ops

10 months ago  models : add support for wget2 for fedora (#2387)
Brad Murray [Wed, 28 Aug 2024 08:46:01 +0000 (04:46 -0400)]
models : add support for wget2 for fedora (#2387)

10 months ago  readme : update the path to bench.py (#2386)
Peng [Wed, 28 Aug 2024 08:45:05 +0000 (16:45 +0800)]
readme : update the path to bench.py (#2386)

10 months ago  readme : fix typo (#2383)
Ivo von Putzer Reibegg [Wed, 28 Aug 2024 08:42:18 +0000 (10:42 +0200)]
readme : fix typo (#2383)

10 months ago  readme : fix broken links in implementation details section (#2382)
stormofice [Wed, 28 Aug 2024 08:41:51 +0000 (10:41 +0200)]
readme : fix broken links in implementation details section (#2382)

10 months ago  whisper : fix compile warning for unused params
Georgi Gerganov [Wed, 28 Aug 2024 08:40:11 +0000 (11:40 +0300)]
whisper : fix compile warning for unused params

10 months ago  sync : ggml vulkan (ggml/0)
Georgi Gerganov [Tue, 20 Aug 2024 08:27:12 +0000 (11:27 +0300)]
sync : ggml vulkan (ggml/0)

ggml-ci

10 months ago  yolo : add backend support (ggml/924)
Radoslav Gerganov [Mon, 19 Aug 2024 07:09:33 +0000 (10:09 +0300)]
yolo : add backend support (ggml/924)

* yolo : add backend support

* metal : add sub and sqrt kernels

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months ago  ggml : fix typo in ggml-quants.c comment (ggml/922)
Daniel Bevenius [Thu, 15 Aug 2024 07:42:38 +0000 (09:42 +0200)]
ggml : fix typo in ggml-quants.c comment (ggml/922)

10 months ago  feat: add new `sin` and `cos` operators (ggml/919)
Ronsor [Mon, 12 Aug 2024 13:02:08 +0000 (06:02 -0700)]
feat: add new `sin` and `cos` operators (ggml/919)

* ggml : add sin/cos operators

* ggml-cuda : add sin/cos operators

* ggml : add corresponding tests for sin/cos

* ggml : add backward computation for sin/cos operators

* ggml-vulkan : add sin/cos operators

* ggml-vulkan : add sin/cos shader source

* metal : add sin, cos

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months ago  readme : fix broken links (#2358)
Eric Curtin [Tue, 20 Aug 2024 07:57:45 +0000 (03:57 -0400)]
readme : fix broken links (#2358)

For whisper.cpp and whisper.h files

10 months ago  examples : use colorblind friendly TTY color scheme (#2360)
Justine Tunney [Tue, 20 Aug 2024 07:49:10 +0000 (00:49 -0700)]
examples : use colorblind friendly TTY color scheme (#2360)

This change updates the -pc flag, so that a new xterm256 color scheme is
used. This color scheme is believed to be better for three reasons:

1. It should be friendlier to the colorblind. The scheme was designed by
   Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard uses it
   since 2017, so it's already popular in the machine learning community

2. It should appear to be the same colors as before to people who aren't
   colorblind, i.e. it's still a red-green spectrum like before but lightly modified

3. It is readable in both white and black background terminals. The neon
   colors before were probably a bit too intense for white backgrounds.

10 months ago  sync : ggml
Georgi Gerganov [Mon, 12 Aug 2024 08:59:15 +0000 (11:59 +0300)]
sync : ggml

10 months ago  ggml : support forward pass broadcasting in ggml_sub (ggml/914)
Salvatore Mesoraca [Sun, 11 Aug 2024 08:08:53 +0000 (10:08 +0200)]
ggml : support forward pass broadcasting in ggml_sub (ggml/914)

* ggml: support forward pass broadcasting in ggml_sub

Signed-off-by: Salvatore Mesoraca <redacted>
* Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32

The check is already performed in ggml_sub_impl

Signed-off-by: Salvatore Mesoraca <redacted>
---------

Signed-off-by: Salvatore Mesoraca <redacted>
10 months ago  metal : fix uninitialized abort_callback (llama/8968)
slaren [Sat, 10 Aug 2024 13:42:10 +0000 (15:42 +0200)]
metal : fix uninitialized abort_callback (llama/8968)

10 months ago  rpc : sanitize tensor data + warnings (llama/0)
Georgi Gerganov [Fri, 9 Aug 2024 20:03:21 +0000 (23:03 +0300)]
rpc : sanitize tensor data + warnings (llama/0)

Co-authored-by: slaren <redacted>
10 months ago  cann : add Ascend NPU support (#2336)
Mengqing Cao [Fri, 9 Aug 2024 12:21:56 +0000 (20:21 +0800)]
cann : add Ascend NPU support (#2336)

* enable Ascend NPU in src/whisper.cpp
  * sync test-backend-ops with llama.cpp

10 months ago  whisper : fix compile warning (#0)
Georgi Gerganov [Thu, 8 Aug 2024 19:59:59 +0000 (22:59 +0300)]
whisper : fix compile warning (#0)

10 months ago  sync : ggml
Georgi Gerganov [Thu, 8 Aug 2024 19:59:19 +0000 (22:59 +0300)]
sync : ggml

10 months ago  ggml : add CANN backend (llama/0)
hipudding [Thu, 8 Aug 2024 11:48:06 +0000 (14:48 +0300)]
ggml : add CANN backend (llama/0)

ggml-ci

10 months ago  scripts : sync cann
Georgi Gerganov [Thu, 8 Aug 2024 19:58:13 +0000 (22:58 +0300)]
scripts : sync cann

10 months ago  ci : disable ruby workflow (#0)
Georgi Gerganov [Thu, 8 Aug 2024 17:35:21 +0000 (20:35 +0300)]
ci : disable ruby workflow (#0)

10 months ago  ci : try to fix FreeBSD (#0)
Georgi Gerganov [Thu, 8 Aug 2024 17:32:19 +0000 (20:32 +0300)]
ci : try to fix FreeBSD (#0)

10 months ago  build : fix aarch64 (#0)
Georgi Gerganov [Thu, 8 Aug 2024 11:27:16 +0000 (14:27 +0300)]
build : fix aarch64 (#0)

10 months ago  talk-llama : sync llama.cpp
Georgi Gerganov [Thu, 8 Aug 2024 11:16:50 +0000 (14:16 +0300)]
talk-llama : sync llama.cpp

10 months ago  sync : ggml
Georgi Gerganov [Thu, 8 Aug 2024 11:10:06 +0000 (14:10 +0300)]
sync : ggml

10 months ago  ggml-backend : fix async copy from CPU (llama/8897)
slaren [Wed, 7 Aug 2024 11:29:02 +0000 (13:29 +0200)]
ggml-backend : fix async copy from CPU (llama/8897)

* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same

10 months ago  Updated SYCL device filtering (llama/8901)
Ouadie EL FAROUKI [Wed, 7 Aug 2024 10:25:36 +0000 (11:25 +0100)]
Updated SYCL device filtering (llama/8901)

* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme

10 months ago  CUDA/HIP: fix tests/test-backend-ops (llama/8896)
Johannes Gäßler [Wed, 7 Aug 2024 07:07:52 +0000 (09:07 +0200)]
CUDA/HIP: fix tests/test-backend-ops (llama/8896)

10 months ago  CUDA: fix padding logic for FP16/FP32 (llama/8884)
Johannes Gäßler [Tue, 6 Aug 2024 15:13:55 +0000 (17:13 +0200)]
CUDA: fix padding logic for FP16/FP32 (llama/8884)

10 months ago  ggml : add epsilon as a parameter for group_norm (llama/8818)
Molly Sophia [Tue, 6 Aug 2024 07:26:46 +0000 (15:26 +0800)]
ggml : add epsilon as a parameter for group_norm (llama/8818)

Signed-off-by: Molly Sophia <redacted>
10 months ago  ggml : fix overflows in elu function (llama/8866)
Justine Tunney [Mon, 5 Aug 2024 12:43:40 +0000 (05:43 -0700)]
ggml : fix overflows in elu function (llama/8866)

It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.

10 months ago  ggml : reading the runtime sve config of the cpu (llama/8709)
jdomke [Sat, 3 Aug 2024 16:34:41 +0000 (01:34 +0900)]
ggml : reading the runtime sve config of the cpu (llama/8709)

* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <redacted>
10 months ago  Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
Sigbjørn Skjæret [Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)]
Fix conversion of unnormalized BF16->BF16 weights (llama/7843)

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
10 months ago  Fixing wrong VDR iq4nl value (llama/8812)
Ouadie EL FAROUKI [Fri, 2 Aug 2024 00:55:17 +0000 (01:55 +0100)]
Fixing wrong VDR iq4nl value (llama/8812)

10 months ago  ggml-cuda: Adding support for unified memory (llama/8035)
matteo [Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)]
ggml-cuda: Adding support for unified memory (llama/8035)

* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code in the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
10 months ago  Build: Only include execinfo.h on linux systems that support it (llama/8783)
Alex O'Connell [Thu, 1 Aug 2024 16:53:46 +0000 (12:53 -0400)]
Build: Only include execinfo.h on linux systems that support it (llama/8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one

10 months ago  cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)
slaren [Thu, 1 Aug 2024 13:26:22 +0000 (15:26 +0200)]
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)

* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test

10 months ago  added android implementation of ggml_print_backtrace_symbols (llama/8751)
l3utterfly [Tue, 30 Jul 2024 14:40:18 +0000 (23:40 +0900)]
added android implementation of ggml_print_backtrace_symbols (llama/8751)

* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
---------

Co-authored-by: slaren <redacted>
10 months ago  cann: update cmake (llama/8765)
wangshuai09 [Tue, 30 Jul 2024 10:37:35 +0000 (18:37 +0800)]
cann: update cmake (llama/8765)

10 months ago  Add `TIMESTEP_EMBEDDING` OP (llama/8707)
zhentaoyu [Tue, 30 Jul 2024 06:56:51 +0000 (14:56 +0800)]
Add `TIMESTEP_EMBEDDING` OP (llama/8707)

Signed-off-by: zhentaoyu <redacted>
10 months ago  ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)
CarterLi999 [Mon, 29 Jul 2024 16:38:34 +0000 (00:38 +0800)]
ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)

In these codes, we want to retain the value that they previously held
when mask[i] is false. So we should use undisturbed. With the default
agnostic policy of rvv intrinsic, these values can be held or be
written with 1s.

Co-authored-by: carter.li <redacted>
10 months ago  cuda : organize vendor-specific headers into vendors directory (llama/8746)
R0CKSTAR [Mon, 29 Jul 2024 12:56:12 +0000 (20:56 +0800)]
cuda : organize vendor-specific headers into vendors directory (llama/8746)

Signed-off-by: Xiaodong Ye <redacted>
10 months ago  add conv support (llama/8688)
Meng, Hengyu [Mon, 29 Jul 2024 02:50:27 +0000 (10:50 +0800)]
add conv support (llama/8688)

10 months ago  feat: Support Moore Threads GPU (llama/8383)
R0CKSTAR [Sat, 27 Jul 2024 23:41:25 +0000 (07:41 +0800)]
feat: Support Moore Threads GPU (llama/8383)

* Update doc for MUSA

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <redacted>
* CUDA => MUSA

Signed-off-by: Xiaodong Ye <redacted>
* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <redacted>
* Fix CI build failure

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
10 months ago  ggml : ignore more msvc warnings (ggml/906)
Borislav Stanimirov [Wed, 7 Aug 2024 07:00:56 +0000 (10:00 +0300)]
ggml : ignore more msvc warnings (ggml/906)

10 months ago  metal : fix struct name (ggml/912)
Georgi Gerganov [Wed, 7 Aug 2024 06:57:00 +0000 (09:57 +0300)]
metal : fix struct name (ggml/912)

ggml-ci

10 months ago  metal : add abort callback (ggml/905)
Conrad Kramer [Wed, 7 Aug 2024 06:55:49 +0000 (02:55 -0400)]
metal : add abort callback (ggml/905)

10 months ago  vulkan : implement Stable Diffusion operators (ggml/904)
0cc4m [Sun, 4 Aug 2024 15:28:08 +0000 (17:28 +0200)]
vulkan : implement Stable Diffusion operators (ggml/904)

* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op

10 months ago  ggml : move c parameter comment to ggml_rope_ext (ggml/901)
Daniel Bevenius [Mon, 29 Jul 2024 13:06:06 +0000 (15:06 +0200)]
ggml : move c parameter comment to ggml_rope_ext (ggml/901)

This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).

Signed-off-by: Daniel Bevenius <redacted>
10 months ago  ggml : resolve sync conflicts (ggml/0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:23 +0000 (17:17 +0300)]
ggml : resolve sync conflicts (ggml/0)

ggml-ci

10 months ago  common : handle new quant types (ggml/0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:04 +0000 (17:17 +0300)]
common : handle new quant types (ggml/0)

10 months ago  ggml : add ggml-aarch64 (ggml/0)
Dibakar Gope [Sat, 27 Jul 2024 14:16:40 +0000 (17:16 +0300)]
ggml : add ggml-aarch64 (ggml/0)

10 months ago  ggml : reduce hash table reset cost (llama/8698)
slaren [Sat, 27 Jul 2024 02:41:55 +0000 (04:41 +0200)]
ggml : reduce hash table reset cost (llama/8698)

* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string

10 months ago  ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)
DavidKorczynski [Thu, 25 Jul 2024 21:23:05 +0000 (22:23 +0100)]
ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)

`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`.

This fixes it by bailing out if no context is found.

10 months ago  fix multi-gpu issue on sycl (llama/8554)
Chen Xi [Thu, 25 Jul 2024 11:45:18 +0000 (11:45 +0000)]
fix multi-gpu issue on sycl (llama/8554)

---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
10 months ago  ggml : add and use ggml_cpu_has_llamafile() (llama/8664)
Georgi Gerganov [Thu, 25 Jul 2024 09:37:42 +0000 (12:37 +0300)]
ggml : add and use ggml_cpu_has_llamafile() (llama/8664)

10 months ago  Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)
Joe Todd [Wed, 24 Jul 2024 10:55:26 +0000 (11:55 +0100)]
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)

10 months ago  sycl : Add support for non-release DPC++ & oneMKL (llama/8644)
Joe Todd [Tue, 23 Jul 2024 13:58:37 +0000 (14:58 +0100)]
sycl : Add support for non-release DPC++ & oneMKL (llama/8644)

* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <redacted>
10 months ago  Vulkan IQ4_NL Support (llama/8613)
0cc4m [Tue, 23 Jul 2024 08:56:49 +0000 (10:56 +0200)]
Vulkan IQ4_NL Support (llama/8613)

* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support

10 months ago  Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)
Jeroen Mostert [Tue, 23 Jul 2024 08:50:40 +0000 (10:50 +0200)]
Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)

The check gating the use of `__builtin_amdgc_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.

10 months ago  fix scratch size of softmax (llama/8642)
luoyu-intel [Tue, 23 Jul 2024 07:43:28 +0000 (07:43 +0000)]
fix scratch size of softmax (llama/8642)

10 months ago  ggml: fix compile error for RISC-V (llama/8623)
Mark Zhuang [Mon, 22 Jul 2024 07:56:45 +0000 (15:56 +0800)]
ggml: fix compile error for RISC-V (llama/8623)

10 months ago  CUDA: MMQ code deduplication + iquant support (llama/8495)
Johannes Gäßler [Sat, 20 Jul 2024 20:25:26 +0000 (22:25 +0200)]
CUDA: MMQ code deduplication + iquant support (llama/8495)

* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build

10 months ago  gguf : handle null name during init (llama/8587)
Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]
gguf : handle null name during init (llama/8587)

10 months ago  ggml : fix quant dot product with odd number of blocks (llama/8549)
slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]
ggml : fix quant dot product with odd number of blocks (llama/8549)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (llama/8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
10 months ago  ggml : add friendlier error message to fopen errors (llama/8575)
Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]
ggml : add friendlier error message to fopen errors (llama/8575)

* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.

10 months ago  CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)
Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)

10 months ago  cmake : install all ggml public headers (llama/8480)
65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]
cmake : install all ggml public headers (llama/8480)

Co-authored-by: 65a <redacted>
10 months ago  Add Ascend NPU backend (llama/6035)
hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]
Add Ascend NPU backend (llama/6035)

* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <redacted>
* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <redacted>
10 months ago  make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)
Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)

10 months ago  Refactor lora adapter support (llama/8332)
Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]
Refactor lora adapter support (llama/8332)

* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <redacted>
* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <redacted>
* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>
10 months ago  add concat through dim 1/2 (llama/8483)
Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]
add concat through dim 1/2 (llama/8483)

* add concat through dim 1/2

10 months ago  Vulkan MMQ Fix (llama/8479)
0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]
Vulkan MMQ Fix (llama/8479)

* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error

10 months ago  vulkan : cmake integration (llama/8119)
bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]
vulkan : cmake integration (llama/8119)

* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg

10 months ago  metal : template-ify some of the kernels (llama/8447)
Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]
metal : template-ify some of the kernels (llama/8447)

ggml-ci