git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log
10 months ago  go : add beamsize/entropythold/maxcontext to context interface (#2350)
hsinhoyeh [Wed, 28 Aug 2024 14:09:01 +0000 (22:09 +0800)]
go : add beamsize/entropythold/maxcontext to context interface (#2350)

* feat(go binding): add beamsize/entropythold/maxcontext to context interface

fixes: #2349

* fix go build

* fix dynamic link .so and header.h

* remove LD_LIBRARY_PATH

* remove ggml obj from whisper dynamic lib

* drop LIB_GGML

10 months ago  talk-llama : sync llama.cpp
Georgi Gerganov [Wed, 28 Aug 2024 08:04:02 +0000 (11:04 +0300)]
talk-llama : sync llama.cpp

10 months ago  whisper : update FA call
Georgi Gerganov [Wed, 28 Aug 2024 08:02:54 +0000 (11:02 +0300)]
whisper : update FA call

10 months ago  sync : ggml
Georgi Gerganov [Wed, 28 Aug 2024 08:02:42 +0000 (11:02 +0300)]
sync : ggml

10 months ago  sync : vulkan (skip) (llama/0)
Georgi Gerganov [Tue, 27 Aug 2024 18:48:22 +0000 (21:48 +0300)]
sync : vulkan (skip) (llama/0)

10 months ago  ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)
slaren [Mon, 26 Aug 2024 17:44:43 +0000 (19:44 +0200)]
ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)

10 months ago  metal : separate scale and mask from QKT in FA kernel (llama/9189)
Georgi Gerganov [Mon, 26 Aug 2024 15:31:02 +0000 (18:31 +0300)]
metal : separate scale and mask from QKT in FA kernel (llama/9189)

* metal : separate scale and mask from QKT in FA kernel

* metal : ne01 check no longer necessary

* metal : keep data in local memory

10 months ago  ggml : add SSM Metal kernels (llama/8546)
Georgi Gerganov [Mon, 26 Aug 2024 14:55:36 +0000 (17:55 +0300)]
ggml : add SSM Metal kernels (llama/8546)

* ggml : add ggml_ssm_conv metal impl

* ggml : add ssm_scan metal impl

ggml-ci

10 months ago  metal : gemma2 flash attention support (llama/9159)
slaren [Mon, 26 Aug 2024 09:08:59 +0000 (11:08 +0200)]
metal : gemma2 flash attention support (llama/9159)

10 months ago  CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
Johannes Gäßler [Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)]
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check

10 months ago  Add a space to suppress a cmake warning (llama/9133)
Akarshan Biswas [Thu, 22 Aug 2024 14:09:47 +0000 (19:39 +0530)]
Add a space to suppress a cmake warning (llama/9133)

10 months ago  Add oneDNN primitive support (llama/9091)
luoyu-intel [Thu, 22 Aug 2024 04:50:10 +0000 (12:50 +0800)]
Add oneDNN primitive support (llama/9091)

* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc

10 months ago  llama : simplify Mamba with advanced batch splits (llama/8526)
compilade [Wed, 21 Aug 2024 21:58:11 +0000 (17:58 -0400)]
llama : simplify Mamba with advanced batch splits (llama/8526)

* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <redacted>
* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months ago  fallback mmvq (llama/9088)
Meng, Hengyu [Tue, 20 Aug 2024 15:50:17 +0000 (23:50 +0800)]
fallback mmvq (llama/9088)

* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <redacted>
---------

Co-authored-by: Alberto Cabrera Pérez <redacted>
10 months ago  Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)
zhentaoyu [Tue, 20 Aug 2024 15:06:51 +0000 (23:06 +0800)]
Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)

* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert overflow

Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <redacted>
* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <redacted>
* sycl: refine convert

Signed-off-by: zhentaoyu <redacted>
* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <redacted>
* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <redacted>
* test: make new cases only in sycl

Signed-off-by: zhentaoyu <redacted>
* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <redacted>
---------

Signed-off-by: zhentaoyu <redacted>
10 months ago  rpc : print error message when failed to connect endpoint (llama/9042)
Radoslav Gerganov [Mon, 19 Aug 2024 07:11:45 +0000 (10:11 +0300)]
rpc : print error message when failed to connect endpoint (llama/9042)

10 months ago  rpc : prevent crashes on invalid input (llama/9040)
Radoslav Gerganov [Mon, 19 Aug 2024 07:10:21 +0000 (10:10 +0300)]
rpc : prevent crashes on invalid input (llama/9040)

Add more checks which prevent RPC server from crashing if invalid input
is received from client

10 months ago  ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)
Nico Bosshard [Fri, 16 Aug 2024 02:22:55 +0000 (04:22 +0200)]
ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)

* ggml : Dynamic ggml_sched_max_splits based on graph_size

* Fixed and readded debug code for causes

10 months ago  cmake : remove unused option GGML_CURL (llama/9011)
Georgi Gerganov [Wed, 14 Aug 2024 06:14:49 +0000 (09:14 +0300)]
cmake : remove unused option GGML_CURL (llama/9011)

10 months ago  ggml : move rope type enum to ggml.h (llama/8949)
Daniel Bevenius [Tue, 13 Aug 2024 19:13:15 +0000 (21:13 +0200)]
ggml : move rope type enum to ggml.h (llama/8949)

* ggml : move rope type enum to ggml.h

This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.

The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.

Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.

* squash! ggml : move rope type enum to ggml.h

This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and moves them back to the llama_rope_type enum.

I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.

* squash! ggml : move rope type enum to ggml.h

This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.

* squash! ggml : move rope type enum to ggml.h

This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.

* squash! ggml : move rope type enum to ggml.h

This commit fixes the editorconfig-checker warnings.

* squash! ggml : move rope type enum to ggml.h

Update comment for ggml_rope function.

* Revert "squash! ggml : move rope type enum to ggml.h"

This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.

* squash! ggml : move rope type enum to ggml.h

Add GGML_ROPE_TYPE_NEOX to rope_common.comp.

* remove extra line

---------

Co-authored-by: slaren <redacted>
10 months ago  ggml: fix div-by-zero (llama/9003)
DavidKorczynski [Mon, 12 Aug 2024 12:21:41 +0000 (13:21 +0100)]
ggml: fix div-by-zero (llama/9003)

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724
In order to access the above bug you need to login using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5

Signed-off-by: David Korczynski <redacted>
10 months ago  Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)
Markus Tavenrath [Sun, 11 Aug 2024 08:09:09 +0000 (10:09 +0200)]
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduce a full pipeline sync which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader read/writes and transfers seems to be sufficient looking at the code which either launches compute kernels or copies tensors.

* Fix small typo

---------

Co-authored-by: 0cc4m <redacted>
10 months ago  feat: ref. cross entropy, add CUDA, fix grad test (ggml/929)
Johannes Gäßler [Tue, 27 Aug 2024 18:39:30 +0000 (20:39 +0200)]
feat: ref. cross entropy, add CUDA, fix grad test (ggml/929)

10 months ago  ggml: remove bad assert (ggml/928)
Johannes Gäßler [Sat, 24 Aug 2024 17:27:02 +0000 (19:27 +0200)]
ggml: remove bad assert (ggml/928)

10 months ago  examples: add MNIST training + missing ops
Johannes Gäßler [Tue, 30 Jul 2024 13:56:35 +0000 (15:56 +0200)]
examples: add MNIST training + missing ops

10 months ago  models : add support for wget2 for fedora (#2387)
Brad Murray [Wed, 28 Aug 2024 08:46:01 +0000 (04:46 -0400)]
models : add support for wget2 for fedora (#2387)

10 months ago  readme : update the path to bench.py (#2386)
Peng [Wed, 28 Aug 2024 08:45:05 +0000 (16:45 +0800)]
readme : update the path to bench.py (#2386)

10 months ago  readme : fix typo (#2383)
Ivo von Putzer Reibegg [Wed, 28 Aug 2024 08:42:18 +0000 (10:42 +0200)]
readme : fix typo (#2383)

10 months ago  readme : fix broken links in implementation details section (#2382)
stormofice [Wed, 28 Aug 2024 08:41:51 +0000 (10:41 +0200)]
readme : fix broken links in implementation details section (#2382)

10 months ago  whisper : fix compile warning for unused params
Georgi Gerganov [Wed, 28 Aug 2024 08:40:11 +0000 (11:40 +0300)]
whisper : fix compile warning for unused params

10 months ago  sync : ggml vulkan (ggml/0)
Georgi Gerganov [Tue, 20 Aug 2024 08:27:12 +0000 (11:27 +0300)]
sync : ggml vulkan (ggml/0)

ggml-ci

10 months ago  yolo : add backend support (ggml/924)
Radoslav Gerganov [Mon, 19 Aug 2024 07:09:33 +0000 (10:09 +0300)]
yolo : add backend support (ggml/924)

* yolo : add backend support

* metal : add sub and sqrt kernels

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months ago  ggml : fix typo in ggml-quants.c comment (ggml/922)
Daniel Bevenius [Thu, 15 Aug 2024 07:42:38 +0000 (09:42 +0200)]
ggml : fix typo in ggml-quants.c comment (ggml/922)

10 months ago  feat: add new `sin` and `cos` operators (ggml/919)
Ronsor [Mon, 12 Aug 2024 13:02:08 +0000 (06:02 -0700)]
feat: add new `sin` and `cos` operators (ggml/919)

* ggml : add sin/cos operators

* ggml-cuda : add sin/cos operators

* ggml : add corresponding tests for sin/cos

* ggml : add backward computation for sin/cos operators

* ggml-vulkan : add sin/cos operators

* ggml-vulkan : add sin/cos shader source

* metal : add sin, cos

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months ago  readme : fix broken links (#2358)
Eric Curtin [Tue, 20 Aug 2024 07:57:45 +0000 (03:57 -0400)]
readme : fix broken links (#2358)

For whisper.cpp and whisper.h files

10 months ago  examples : use colorblind friendly TTY color scheme (#2360)
Justine Tunney [Tue, 20 Aug 2024 07:49:10 +0000 (00:49 -0700)]
examples : use colorblind friendly TTY color scheme (#2360)

This change updates the -pc flag, so that a new xterm256 color scheme is
used. This color scheme is believed to be better for three reasons:

1. It should be friendlier to the colorblind. The scheme was designed by
   Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard uses it
   since 2017, so it's already popular in the machine learning community

2. It should appear to be the same colors as before to people who aren't
   colorblind, i.e. it's still a red-green spectrum like before but lightly modified

3. It is readable in both white and black background terminals. The neon
   colors before were probably a bit too intense for white backgrounds.

10 months ago  sync : ggml
Georgi Gerganov [Mon, 12 Aug 2024 08:59:15 +0000 (11:59 +0300)]
sync : ggml

10 months ago  ggml : support forward pass broadcasting in ggml_sub (ggml/914)
Salvatore Mesoraca [Sun, 11 Aug 2024 08:08:53 +0000 (10:08 +0200)]
ggml : support forward pass broadcasting in ggml_sub (ggml/914)

* ggml: support forward pass broadcasting in ggml_sub

Signed-off-by: Salvatore Mesoraca <redacted>
* Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32

The check is already performed in ggml_sub_impl

Signed-off-by: Salvatore Mesoraca <redacted>
---------

Signed-off-by: Salvatore Mesoraca <redacted>
10 months ago  metal : fix uninitialized abort_callback (llama/8968)
slaren [Sat, 10 Aug 2024 13:42:10 +0000 (15:42 +0200)]
metal : fix uninitialized abort_callback (llama/8968)

10 months ago  rpc : sanitize tensor data + warnings (llama/0)
Georgi Gerganov [Fri, 9 Aug 2024 20:03:21 +0000 (23:03 +0300)]
rpc : sanitize tensor data + warnings (llama/0)

Co-authored-by: slaren <redacted>
10 months ago  cann : add Ascend NPU support (#2336)
Mengqing Cao [Fri, 9 Aug 2024 12:21:56 +0000 (20:21 +0800)]
cann : add Ascend NPU support (#2336)

* enable Ascend NPU in src/whisper.cpp
  * sync test-backend-ops with llama.cpp

10 months ago  whisper : fix compile warning (#0)
Georgi Gerganov [Thu, 8 Aug 2024 19:59:59 +0000 (22:59 +0300)]
whisper : fix compile warning (#0)

10 months ago  sync : ggml
Georgi Gerganov [Thu, 8 Aug 2024 19:59:19 +0000 (22:59 +0300)]
sync : ggml

10 months ago  ggml : add CANN backend (llama/0)
hipudding [Thu, 8 Aug 2024 11:48:06 +0000 (14:48 +0300)]
ggml : add CANN backend (llama/0)

ggml-ci

10 months ago  scripts : sync cann
Georgi Gerganov [Thu, 8 Aug 2024 19:58:13 +0000 (22:58 +0300)]
scripts : sync cann

10 months ago  ci : disable ruby workflow (#0)
Georgi Gerganov [Thu, 8 Aug 2024 17:35:21 +0000 (20:35 +0300)]
ci : disable ruby workflow (#0)

10 months ago  ci : try to fix FreeBSD (#0)
Georgi Gerganov [Thu, 8 Aug 2024 17:32:19 +0000 (20:32 +0300)]
ci : try to fix FreeBSD (#0)

10 months ago  build : fix aarch64 (#0)
Georgi Gerganov [Thu, 8 Aug 2024 11:27:16 +0000 (14:27 +0300)]
build : fix aarch64 (#0)

10 months ago  talk-llama : sync llama.cpp
Georgi Gerganov [Thu, 8 Aug 2024 11:16:50 +0000 (14:16 +0300)]
talk-llama : sync llama.cpp

10 months ago  sync : ggml
Georgi Gerganov [Thu, 8 Aug 2024 11:10:06 +0000 (14:10 +0300)]
sync : ggml

10 months ago  ggml-backend : fix async copy from CPU (llama/8897)
slaren [Wed, 7 Aug 2024 11:29:02 +0000 (13:29 +0200)]
ggml-backend : fix async copy from CPU (llama/8897)

* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same

10 months ago  Updated SYCL device filtering (llama/8901)
Ouadie EL FAROUKI [Wed, 7 Aug 2024 10:25:36 +0000 (11:25 +0100)]
Updated SYCL device filtering (llama/8901)

* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme

10 months ago  CUDA/HIP: fix tests/test-backend-ops (llama/8896)
Johannes Gäßler [Wed, 7 Aug 2024 07:07:52 +0000 (09:07 +0200)]
CUDA/HIP: fix tests/test-backend-ops (llama/8896)

10 months ago  CUDA: fix padding logic for FP16/FP32 (llama/8884)
Johannes Gäßler [Tue, 6 Aug 2024 15:13:55 +0000 (17:13 +0200)]
CUDA: fix padding logic for FP16/FP32 (llama/8884)

10 months ago  ggml : add epsilon as a parameter for group_norm (llama/8818)
Molly Sophia [Tue, 6 Aug 2024 07:26:46 +0000 (15:26 +0800)]
ggml : add epsilon as a parameter for group_norm (llama/8818)

Signed-off-by: Molly Sophia <redacted>
10 months ago  ggml : fix overflows in elu function (llama/8866)
Justine Tunney [Mon, 5 Aug 2024 12:43:40 +0000 (05:43 -0700)]
ggml : fix overflows in elu function (llama/8866)

It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.

10 months ago  ggml : reading the runtime sve config of the cpu (llama/8709)
jdomke [Sat, 3 Aug 2024 16:34:41 +0000 (01:34 +0900)]
ggml : reading the runtime sve config of the cpu (llama/8709)

* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <redacted>
10 months ago  Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
Sigbjørn Skjæret [Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)]
Fix conversion of unnormalized BF16->BF16 weights (llama/7843)

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
10 months ago  Fixing wrong VDR iq4nl value (llama/8812)
Ouadie EL FAROUKI [Fri, 2 Aug 2024 00:55:17 +0000 (01:55 +0100)]
Fixing wrong VDR iq4nl value (llama/8812)

10 months ago  ggml-cuda: Adding support for unified memory (llama/8035)
matteo [Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)]
ggml-cuda: Adding support for unified memory (llama/8035)

* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code in the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
10 months ago  Build: Only include execinfo.h on linux systems that support it (llama/8783)
Alex O'Connell [Thu, 1 Aug 2024 16:53:46 +0000 (12:53 -0400)]
Build: Only include execinfo.h on linux systems that support it (llama/8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one

10 months ago  cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)
slaren [Thu, 1 Aug 2024 13:26:22 +0000 (15:26 +0200)]
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)

* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test

10 months ago  added android implementation of ggml_print_backtrace_symbols (llama/8751)
l3utterfly [Tue, 30 Jul 2024 14:40:18 +0000 (23:40 +0900)]
added android implementation of ggml_print_backtrace_symbols (llama/8751)

* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
---------

Co-authored-by: slaren <redacted>
10 months ago  cann: update cmake (llama/8765)
wangshuai09 [Tue, 30 Jul 2024 10:37:35 +0000 (18:37 +0800)]
cann: update cmake (llama/8765)

10 months ago  Add `TIMESTEP_EMBEDDING` OP (llama/8707)
zhentaoyu [Tue, 30 Jul 2024 06:56:51 +0000 (14:56 +0800)]
Add `TIMESTEP_EMBEDDING` OP (llama/8707)

Signed-off-by: zhentaoyu <redacted>
10 months ago  ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)
CarterLi999 [Mon, 29 Jul 2024 16:38:34 +0000 (00:38 +0800)]
ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)

In these codes, we want to retain the value that they previously held
when mask[i] is false. So we should use undisturbed. With the default
agnostic policy of rvv intrinsic, these values can be held or be
written with 1s.

Co-authored-by: carter.li <redacted>
10 months ago  cuda : organize vendor-specific headers into vendors directory (llama/8746)
R0CKSTAR [Mon, 29 Jul 2024 12:56:12 +0000 (20:56 +0800)]
cuda : organize vendor-specific headers into vendors directory (llama/8746)

Signed-off-by: Xiaodong Ye <redacted>
10 months ago  add conv support (llama/8688)
Meng, Hengyu [Mon, 29 Jul 2024 02:50:27 +0000 (10:50 +0800)]
add conv support (llama/8688)

10 months ago  feat: Support Moore Threads GPU (llama/8383)
R0CKSTAR [Sat, 27 Jul 2024 23:41:25 +0000 (07:41 +0800)]
feat: Support Moore Threads GPU (llama/8383)

* Update doc for MUSA

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <redacted>
* CUDA => MUSA

Signed-off-by: Xiaodong Ye <redacted>
* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <redacted>
* Fix CI build failure

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
10 months ago  ggml : ignore more msvc warnings (ggml/906)
Borislav Stanimirov [Wed, 7 Aug 2024 07:00:56 +0000 (10:00 +0300)]
ggml : ignore more msvc warnings (ggml/906)

10 months ago  metal : fix struct name (ggml/912)
Georgi Gerganov [Wed, 7 Aug 2024 06:57:00 +0000 (09:57 +0300)]
metal : fix struct name (ggml/912)

ggml-ci

10 months ago  metal : add abort callback (ggml/905)
Conrad Kramer [Wed, 7 Aug 2024 06:55:49 +0000 (02:55 -0400)]
metal : add abort callback (ggml/905)

10 months ago  vulkan : implement Stable Diffusion operators (ggml/904)
0cc4m [Sun, 4 Aug 2024 15:28:08 +0000 (17:28 +0200)]
vulkan : implement Stable Diffusion operators (ggml/904)

* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op

10 months ago  ggml : move c parameter comment to ggml_rope_ext (ggml/901)
Daniel Bevenius [Mon, 29 Jul 2024 13:06:06 +0000 (15:06 +0200)]
ggml : move c parameter comment to ggml_rope_ext (ggml/901)

This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).

Signed-off-by: Daniel Bevenius <redacted>
10 months ago  ggml : resolve sync conflicts (ggml/0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:23 +0000 (17:17 +0300)]
ggml : resolve sync conflicts (ggml/0)

ggml-ci

10 months ago  common : handle new quant types (ggml/0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:04 +0000 (17:17 +0300)]
common : handle new quant types (ggml/0)

10 months ago  ggml : add ggml-aarch64 (ggml/0)
Dibakar Gope [Sat, 27 Jul 2024 14:16:40 +0000 (17:16 +0300)]
ggml : add ggml-aarch64 (ggml/0)

10 months ago  ggml : reduce hash table reset cost (llama/8698)
slaren [Sat, 27 Jul 2024 02:41:55 +0000 (04:41 +0200)]
ggml : reduce hash table reset cost (llama/8698)

* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string

10 months ago  ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)
DavidKorczynski [Thu, 25 Jul 2024 21:23:05 +0000 (22:23 +0100)]
ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)

`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_on_alloc`.

This fixes it by bailing out if no context is found.

10 months ago  fix multi-gpu issue on sycl (llama/8554)
Chen Xi [Thu, 25 Jul 2024 11:45:18 +0000 (11:45 +0000)]
fix multi-gpu issue on sycl (llama/8554)

---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
10 months ago  ggml : add and use ggml_cpu_has_llamafile() (llama/8664)
Georgi Gerganov [Thu, 25 Jul 2024 09:37:42 +0000 (12:37 +0300)]
ggml : add and use ggml_cpu_has_llamafile() (llama/8664)

10 months ago  Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)
Joe Todd [Wed, 24 Jul 2024 10:55:26 +0000 (11:55 +0100)]
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)

10 months ago  sycl : Add support for non-release DPC++ & oneMKL (llama/8644)
Joe Todd [Tue, 23 Jul 2024 13:58:37 +0000 (14:58 +0100)]
sycl : Add support for non-release DPC++ & oneMKL (llama/8644)

* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <redacted>
10 months ago  Vulkan IQ4_NL Support (llama/8613)
0cc4m [Tue, 23 Jul 2024 08:56:49 +0000 (10:56 +0200)]
Vulkan IQ4_NL Support (llama/8613)

* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support

10 months ago  Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)
Jeroen Mostert [Tue, 23 Jul 2024 08:50:40 +0000 (10:50 +0200)]
Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)

The check gating the use of `__builtin_amdgc_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.

10 months ago  fix scratch size of softmax (llama/8642)
luoyu-intel [Tue, 23 Jul 2024 07:43:28 +0000 (07:43 +0000)]
fix scratch size of softmax (llama/8642)

10 months ago  ggml: fix compile error for RISC-V (llama/8623)
Mark Zhuang [Mon, 22 Jul 2024 07:56:45 +0000 (15:56 +0800)]
ggml: fix compile error for RISC-V (llama/8623)

10 months ago  CUDA: MMQ code deduplication + iquant support (llama/8495)
Johannes Gäßler [Sat, 20 Jul 2024 20:25:26 +0000 (22:25 +0200)]
CUDA: MMQ code deduplication + iquant support (llama/8495)

* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build

10 months ago  gguf : handle null name during init (llama/8587)
Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]
gguf : handle null name during init (llama/8587)

10 months ago  ggml : fix quant dot product with odd number of blocks (llama/8549)
slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]
ggml : fix quant dot product with odd number of blocks (llama/8549)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (llama/8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
10 months ago  ggml : add friendlier error message to fopen errors (llama/8575)
Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]
ggml : add friendlier error message to fopen errors (llama/8575)

* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.

10 months ago  CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)
Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)

10 months ago  cmake : install all ggml public headers (llama/8480)
65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]
cmake : install all ggml public headers (llama/8480)

Co-authored-by: 65a <redacted>
10 months ago  Add Ascend NPU backend (llama/6035)
hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]
Add Ascend NPU backend (llama/6035)

* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <redacted>
* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <redacted>
10 months ago  make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)
Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)

10 months ago  Refactor lora adapter support (llama/8332)
Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]
Refactor lora adapter support (llama/8332)

* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <redacted>
* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <redacted>
* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>
10 months ago  add concat through dim 1/2 (llama/8483)
Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]
add concat through dim 1/2 (llama/8483)

* add concat through dim 1/2

10 months ago  Vulkan MMQ Fix (llama/8479)
0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]
Vulkan MMQ Fix (llama/8479)

* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error

10 months ago  vulkan : cmake integration (llama/8119)
bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]
vulkan : cmake integration (llama/8119)

* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg

10 months ago  metal : template-ify some of the kernels (llama/8447)
Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]
metal : template-ify some of the kernels (llama/8447)

ggml-ci