git.djapps.eu Git - pkg/ggml/sources/ggml/log
Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]
gguf : handle null name during init (llama/8587)
slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]
ggml : fix quant dot product with odd number of blocks (llama/8549)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix odd blocks for ARM_NEON (llama/8556)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix q4_1
* ggml : fix q5_0
* ggml : fix q5_1
* ggml : fix iq4_nl metal
ggml-ci
* ggml : fix q4_0
* ggml : fix q8_0
ggml-ci
* ggml : remove special Q4_0 code for first 2 blocks
* ggml : fix sumf redefinition
---------
Co-authored-by: slaren <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
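The common thread in the fixes above is the handling of a leftover block when a dot-product loop consumes quant blocks two at a time. A minimal, hypothetical C sketch of that pattern (illustrative only; the actual iq4_nl/q4_0 kernels operate on packed block structs, not plain floats):
```c
// Illustrative only: a dot product that walks blocks in pairs must
// finish with a scalar tail, otherwise an odd block count drops data.
static float dot_blocks_sketch(const float * x, const float * y, int nb) {
    float sumf = 0.0f;
    int ib = 0;
    for (; ib + 1 < nb; ib += 2) {  // fast path: two blocks per iteration
        sumf += x[ib + 0]*y[ib + 0] + x[ib + 1]*y[ib + 1];
    }
    for (; ib < nb; ++ib) {         // tail: the odd leftover block
        sumf += x[ib]*y[ib];
    }
    return sumf;
}
```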
Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]
ggml : add friendlier error message to fopen errors (llama/8575)
* Add additional error information when model files fail to load.
* Adding additional error information to most instances of fopen.
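A minimal sketch of the pattern this commit describes, with a hypothetical helper name and message format (the actual wording in the commit may differ):
```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

// Report why fopen failed instead of printing a bare "failed to open".
static FILE * open_model_file_sketch(const char * path) {
    FILE * f = fopen(path, "rb");
    if (f == NULL) {
        fprintf(stderr, "failed to open '%s': %s\n", path, strerror(errno));
    }
    return f;
}
```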
Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)
65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]
cmake : install all ggml public headers (llama/8480)
Co-authored-by: 65a <redacted>
hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]
Add Ascend NPU backend (llama/6035)
* [CANN] Add Ascend NPU backend
Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.
CANN (Compute Architecture for Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.
Co-authored-by: wangshuai09 <redacted>
* delete trailing whitespaces
* Modify the code based on review comment
* Rename LLAMA_CANN to GGML_CANN
* Make ggml-common.h private
* add ggml_cann prefix for acl funcs
* Add logging for CANN backend
* Delete Trailing whitespace
---------
Co-authored-by: wangshuai09 <redacted>
Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)
Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]
Refactor lora adapter support (llama/8332)
* lora: load to device buft
* add patch tensor function
* correct tensor patch
* llama_lora_adapter_apply
* correct ggml_backend_tensor_copy
* add llm_build_mm
* fix auto merge
* update based on review comments
* add convert script
* no more transpose A
* add f16 convert
* add metadata check
* add sanity check
* fix ftype
* add requirements
* fix requirements
* fix outfile
* conversion: only allow selected models
* fix types
* cuda : do not use dmmv if the tensor does not have enough cols
* llama : lora fixes
* do not disable mmap with lora
Co-authored-by: slaren <redacted>
* llm_build_lora_mm_id
* convert_lora : MoE LoRA conversion support
* convert_lora : prefer safetensors, similarly to convert_hf
* convert_hf : simplify modify_tensors for InternLM2
* convert_lora : lazy conversion
* llama : load and use alpha from LoRA adapters
* llama : use llm_build_lora_mm in most model graphs
* auto scale
* Revert "auto scale"
This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.
* remove redundant params
* Apply suggestions from code review
Co-authored-by: slaren <redacted>
* change kv metadata
* move add_type to __init__
* convert_hf : move add_type to main()
* convert_lora : use the GGUFWriter from Model instead of overwriting it
---------
Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>
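For context, a hedged sketch of the idea behind llm_build_lora_mm using the public ggml API (illustrative only; the real helper lives in llama.cpp and also handles adapter lookup and per-adapter scaling):
```c
#include "ggml.h"

// y = W*x + scale * B*(A*x), where scale is typically alpha/rank.
static struct ggml_tensor * lora_mm_sketch(
        struct ggml_context * ctx,
        struct ggml_tensor  * w,       // base weight
        struct ggml_tensor  * lora_a,  // low-rank factor A
        struct ggml_tensor  * lora_b,  // low-rank factor B
        struct ggml_tensor  * x,       // input activations
        float                 scale) {
    struct ggml_tensor * y  = ggml_mul_mat(ctx, w, x);
    struct ggml_tensor * ax = ggml_mul_mat(ctx, lora_a, x);
    struct ggml_tensor * bx = ggml_mul_mat(ctx, lora_b, ax);
    return ggml_add(ctx, y, ggml_scale(ctx, bx, scale));
}
```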
Daniel Bevenius [Mon, 15 Jul 2024 12:48:17 +0000 (14:48 +0200)]
ggml : suppress unknown pragma 'GCC' on windows (llama/8460)
This commit adds a macro guard around the GCC pragma to avoid the
following warning on Windows:
```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
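A hedged sketch of the kind of guard described above (the exact pragma wrapped in ggml-aarch64.c may differ):
```c
// Only emit GCC-specific pragmas when the compiler understands them,
// so MSVC does not warn about an unknown pragma.
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Woverlength-strings"
#endif
```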
Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]
add concat through dim 1/2 (llama/8483)
* add concat through dim 1/2
0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]
Vulkan MMQ Fix (llama/8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter
* Fix Vulkan op result checker build error
bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]
vulkan : cmake integration (llama/8119)
* Add Vulkan to CMake pkg
* Add Sycl to CMake pkg
* Add OpenMP to CMake pkg
* Split generated shader file into separate translation unit
* Add CMake target for Vulkan shaders
* Update README.md
* Add make target for Vulkan shaders
* Use pkg-config to locate vulkan library
* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow
* Clean up tabs
* Move sudo to apt-key invocation
* Forward GGML_EXTRA_LIBS to CMake config pkg
* Update vulkan obj file paths
* Add shaderc to nix pkg
* Add python3 to Vulkan nix build
* Link against ggml in cmake pkg
* Remove Python dependency from Vulkan build
* code review changes
* Remove trailing newline
* Add cflags from pkg-config to fix w64devkit build
* Update README.md
* Remove trailing whitespace
* Update README.md
* Remove trailing whitespace
* Fix doc heading
* Make glslc required Vulkan component
* remove clblast from nix pkg
Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]
metal : template-ify some of the kernels (llama/8447)
ggml-ci
Georgi Gerganov [Fri, 12 Jul 2024 07:46:02 +0000 (10:46 +0300)]
ggml : minor naming changes (llama/8433)
* ggml : minor naming changes
ggml-ci
* ggml : use PRId64 [no ci]
* ggml : revert FA K/Q names
Chen Xi [Fri, 12 Jul 2024 00:52:04 +0000 (00:52 +0000)]
fix the mul_mat_id ut issues (llama/8427)
* fix part of mul_mat_id
* skip the bfloat 16 sycl ut
Signed-off-by: Chen Xi <redacted>
---------
Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Co-authored-by: Chen Xi <redacted>
Nicholai Tukanov [Thu, 11 Jul 2024 16:49:15 +0000 (11:49 -0500)]
ggml : add NVPL BLAS support (#8329) (llama/8425)
* ggml : add NVPL BLAS support
* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`
---------
Co-authored-by: ntukanov <redacted>
Daniel Bevenius [Thu, 11 Jul 2024 15:53:42 +0000 (17:53 +0200)]
cuda : suppress 'noreturn' warn in no_device_code (llama/8414)
* cuda : suppress 'noreturn' warn in no_device_code
This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:
```console
/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
346 | }
| ^
```
The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.
Signed-off-by: Daniel Bevenius <redacted>
* squash! cuda : suppress 'noreturn' warn in no_device_code
Update __trap macro instead of using a while loop to suppress the
warning.
Signed-off-by: Daniel Bevenius <redacted>
---------
Signed-off-by: Daniel Bevenius <redacted>
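A plain-C stand-in for the two approaches mentioned above (illustrative; the real code is a CUDA/HIP device function in common.cuh and uses __trap()):
```c
#include <stdlib.h>

// A function declared noreturn must not be able to reach its closing
// brace. Ending it with a call the compiler already knows is noreturn
// (abort() here stands in for __trap()) silences -Winvalid-noreturn;
// an infinite loop, as in the first version of the fix, works too.
__attribute__((noreturn)) static void no_device_code_sketch(void) {
    abort();
}
```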
Johannes Gäßler [Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)]
CUDA: optimize and refactor MMQ (llama/8416)
* CUDA: optimize and refactor MMQ
* explicit q8_1 memory layouts, add documentation
AidanBeltonS [Wed, 10 Jul 2024 15:10:49 +0000 (16:10 +0100)]
Use multi_ptr to clean up deprecated warnings (llama/8256)
Georgi Gerganov [Wed, 10 Jul 2024 12:23:29 +0000 (15:23 +0300)]
ggml : move sgemm sources to llamafile subfolder (llama/8394)
ggml-ci
Dibakar Gope [Wed, 10 Jul 2024 12:14:51 +0000 (07:14 -0500)]
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)
* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files
* Arm AArch64: minor code refactoring for rebase
* Arm AArch64: minor code refactoring for resolving a build issue with cmake
* Arm AArch64: minor code refactoring to split the Q4_0_AARCH64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code change for resolving a build issue with server-windows
* retrigger checks
* Arm AArch64: minor code changes for rebase
* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits
* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code refactoring
* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat
* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat
* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure
* Arm AArch64: remove a redundant comment
* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off
* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels
* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type
Alberto Cabrera Pérez [Tue, 9 Jul 2024 14:03:15 +0000 (15:03 +0100)]
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372)
* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend
* Reduced verbosity of comment
Alberto Cabrera Pérez [Mon, 8 Jul 2024 13:22:41 +0000 (14:22 +0100)]
sycl : fix powf call in device code (llama/8368)
Mahesh Madhav [Thu, 25 Jul 2024 07:54:08 +0000 (00:54 -0700)]
ggml : loop tiling optimizations for scalar path (#898)
Apply a loop tiling technique to the generic path, which provides a
performance upside for ISAs with enough registers to take advantage of
it, and also helps the compiler optimize this path.
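A minimal sketch of the loop-tiling idea on a plain dot product (illustrative; the commit applies it to the generic quantized matmul path):
```c
// Independent accumulators let the compiler keep partial sums in
// registers and overlap the multiply-adds.
static float dot_tiled_sketch(const float * x, const float * y, int n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i + 0]*y[i + 0];
        s1 += x[i + 1]*y[i + 1];
        s2 += x[i + 2]*y[i + 2];
        s3 += x[i + 3]*y[i + 3];
    }
    for (; i < n; ++i) {  // scalar tail
        s0 += x[i]*y[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```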
Ivan Filipov [Mon, 22 Jul 2024 11:32:02 +0000 (14:32 +0300)]
ggml: add support for float16 input tensors in pooling operations (#895)
* Add support for float16 tensors in 1d pooling operations
* Add support for float16 input tensors in 2d pooling operations
* code cleanup
remove unnecessary casting during srow ptr initialization
---------
Co-authored-by: vanaka11 <redacted>
Brian [Mon, 22 Jul 2024 10:25:01 +0000 (20:25 +1000)]
gguf.md: naming convention synced to llama.cpp (#896)
The naming convention is now updated to this form:
`<BaseName><SizeLabel><FineTune><Version><Encoding><Type><Shard>.gguf`
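For example, a hypothetical file following this pattern might be named `Mixtral-8x7B-Instruct-v0.1-Q4_K_M.gguf` (illustrative only; see gguf.md for the authoritative field definitions).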
Brian [Sun, 21 Jul 2024 08:20:30 +0000 (18:20 +1000)]
gguf.md: kv store has new authorship metadata keys (#897)
Tony Wasserka [Sat, 20 Jul 2024 18:49:44 +0000 (20:49 +0200)]
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (#893)
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.
Co-authored-by: Tony Wasserka <redacted>
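A hedged C sketch of the pattern (the actual struct in the Vulkan backend is C++ and has more members): handles start at VK_NULL_HANDLE, so a cleanup path that runs after a partial initialization never frees a garbage handle.
```c
#include <vulkan/vulkan.h>

struct vk_buffer_sketch {
    VkBuffer       buffer;
    VkDeviceMemory device_memory;
};

// Destroying a VK_NULL_HANDLE is a no-op, so cleanup stays safe even
// if buffer creation aborted before every member was assigned.
static void vk_buffer_sketch_init(struct vk_buffer_sketch * b) {
    b->buffer        = VK_NULL_HANDLE;
    b->device_memory = VK_NULL_HANDLE;
}
```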
Georgi Gerganov [Sat, 20 Jul 2024 13:38:56 +0000 (16:38 +0300)]
py : update packages + fix yolo warning
Borislav Stanimirov [Fri, 12 Jul 2024 14:24:20 +0000 (17:24 +0300)]
cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (#885)
Georgi Gerganov [Mon, 8 Jul 2024 11:54:35 +0000 (14:54 +0300)]
sync : whisper.cpp
Georgi Gerganov [Mon, 8 Jul 2024 11:09:09 +0000 (14:09 +0300)]
examples : fix compile warnings [no ci] (whisper/0)
Daniel Bevenius [Mon, 8 Jul 2024 10:03:42 +0000 (12:03 +0200)]
ggml : remove unnecessary UNUSED macro call (#880)
This commit removes an UNUSED macro call that is not needed, since the
variable n0 is used in the code and therefore does not produce an
unused-variable warning.
Signed-off-by: Daniel Bevenius <redacted>
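For context, GGML_UNUSED is commonly defined as a void cast; a hedged illustration of when the call is (and is not) needed:
```c
#include <stdio.h>

#define GGML_UNUSED(x) (void)(x)

static void example(int n0, int n1) {
    GGML_UNUSED(n1);          // n1 is never read: silence -Wunused-parameter
    printf("n0 = %d\n", n0);  // n0 is used, so no UNUSED call is needed
}
```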
Georgi Gerganov [Mon, 8 Jul 2024 09:23:45 +0000 (12:23 +0300)]
sync : llama.cpp
ggml-ci
Georgi Gerganov [Mon, 8 Jul 2024 07:39:36 +0000 (10:39 +0300)]
tests : fix whitespace (llama/0)
Natsu [Fri, 5 Jul 2024 14:29:35 +0000 (22:29 +0800)]
cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281)
Ouadie EL FAROUKI [Fri, 5 Jul 2024 12:23:25 +0000 (13:23 +0100)]
Enabled more data types for oneMKL gemm_batch (llama/8236)
Johannes Gäßler [Fri, 5 Jul 2024 07:06:31 +0000 (09:06 +0200)]
CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)
Daniele [Fri, 5 Jul 2024 07:06:09 +0000 (07:06 +0000)]
CUDA: revert part of the RDNA1 optimizations (llama/8309)
The change to the launch_bounds was causing a small performance drop of 25 t/s during perplexity evaluation.
Johannes Gäßler [Fri, 5 Jul 2024 07:05:34 +0000 (09:05 +0200)]
CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311)
luoyu-intel [Fri, 5 Jul 2024 05:06:13 +0000 (05:06 +0000)]
Fix WARP_SIZE=16 bug of Intel GPU (llama/8266)
* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp
Neo Zhang Jianyu [Fri, 5 Jul 2024 02:32:29 +0000 (10:32 +0800)]
rm get_work_group_size() calls by caching the value locally, for performance (llama/8286)
Co-authored-by: arthw <redacted>
AidanBeltonS [Thu, 4 Jul 2024 01:07:19 +0000 (02:07 +0100)]
Remove unneeded semicolons (llama/8280)
Daniele [Wed, 3 Jul 2024 23:02:58 +0000 (23:02 +0000)]
Define and optimize RDNA1 (llama/8085)
Judd [Wed, 3 Jul 2024 12:40:16 +0000 (20:40 +0800)]
fix typo (llama/8267)
Co-authored-by: Judd <redacted>
AidanBeltonS [Wed, 3 Jul 2024 01:55:34 +0000 (02:55 +0100)]
Dequant improvements rebase (llama/8255)
* Single load for half2
* Store scales in local mem
* Vec load quantized values
Clint Herron [Tue, 2 Jul 2024 16:18:10 +0000 (12:18 -0400)]
Remove multiple newlines at the end of files that were breaking the editorconfig step of CI (llama/8258)
slaren [Tue, 2 Jul 2024 06:39:38 +0000 (08:39 +0200)]
cuda : update supports_op for matrix multiplication (llama/8245)
luoyu-intel [Tue, 2 Jul 2024 04:50:07 +0000 (04:50 +0000)]
Fix win build conflict of math library (llama/8230)
* fix win build conflict of math library
* fix the condition: !(win32 & SYCL)
* revert warp_size=16
luoyu-intel [Tue, 2 Jul 2024 02:16:00 +0000 (02:16 +0000)]
Fix the sub group size of Intel (llama/8106)
* use warp_size macro for all sycl kernels
* fix mask of permute_sub_group_by_xor
* fix rms_norm with correct warp number
* fix rms_norm_f32/group_norm_f32
* move norm to norm.cpp file
* fix quantize bug
* fix mmvq's batch size
Johannes Gäßler [Mon, 1 Jul 2024 18:39:06 +0000 (20:39 +0200)]
CUDA: refactor and optimize IQ MMVQ (llama/8215)
* CUDA: refactor and optimize IQ MMVQ
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix
zhentaoyu [Mon, 1 Jul 2024 11:39:06 +0000 (19:39 +0800)]
Update SYCL-Rope op and Refactor (llama/8157)
* align with rope.cu and move sycl-op to a single file
Johannes Gäßler [Thu, 27 Jun 2024 14:26:05 +0000 (16:26 +0200)]
CUDA: fix MMQ stream-k for --split-mode row (llama/8167)
slaren [Tue, 2 Jul 2024 17:11:52 +0000 (19:11 +0200)]
fix uses of GGML_USE_CUBLAS in tests and examples (#879)
* fix uses of GGML_USE_CUBLAS in tests and examples
* fix ci/run.sh
ggml-ci
John Balis [Tue, 2 Jul 2024 16:09:52 +0000 (11:09 -0500)]
feat: cuda implementation for `ggml_conv_transpose_1d` (#854)
* conv transpose 1d passing test for 1d input and kernel
* working for different input and output channel counts, added test for variable stride
* initial draft appears to work with stride other than 1
* working with all old and new conv1d tests
* added a test for large tensors
* removed use cuda hardcoding
* restored test-conv-transpose.c
* removed unused arguments, and fixed bug where test failure would cause subsequent tests to fail
* fixed accumulator bug
* added test to test-backend-ops
* fixed mistake
* addressed review
* fixed includes
* removed blank lines
* style and warning fixes
* return failure when test fails
* fix supports_op
---------
Co-authored-by: slaren <redacted>
Yilong Guo [Sun, 30 Jun 2024 16:05:44 +0000 (09:05 -0700)]
sycl : add build instruction (#870)
John Balis [Sun, 30 Jun 2024 15:14:31 +0000 (10:14 -0500)]
update "Using cuBLAS" to use correct update cuda compile flag (#876)
It seems like the previous `-DGGML_CUBLAS=ON` compile flag was deprecated.
Georgi Gerganov [Wed, 26 Jun 2024 20:26:16 +0000 (23:26 +0300)]
sync : whisper.cpp
Georgi Gerganov [Wed, 26 Jun 2024 17:11:38 +0000 (20:11 +0300)]
whisper : disable CUDA mel + fix FFMPEG
Georgi Gerganov [Wed, 26 Jun 2024 19:42:27 +0000 (22:42 +0300)]
sync : llama.cpp
slaren [Wed, 26 Jun 2024 19:34:14 +0000 (21:34 +0200)]
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140)
Georgi Gerganov [Wed, 26 Jun 2024 16:40:53 +0000 (19:40 +0300)]
sync : llama.cpp, whisper.cpp
Georgi Gerganov [Wed, 26 Jun 2024 16:33:53 +0000 (19:33 +0300)]
ggml : reorganize source code + improve CMake (#865)
* scripts : update sync [no ci]
* ggml : move headers one up [no ci]
* files : reorganize + update CMake
ggml-ci
* cmake : build normal ggml library
ggml-ci
* cmake : link math library to test + remove ci for code cov
ggml-ci
* files : move public headers to include
ggml-ci
Georgi Gerganov [Fri, 21 Jun 2024 07:25:14 +0000 (10:25 +0300)]
files : remove old (#0)
Georgi Gerganov [Tue, 18 Jun 2024 06:48:08 +0000 (09:48 +0300)]
sync : whisper.cpp
Georgi Gerganov [Tue, 18 Jun 2024 06:37:20 +0000 (09:37 +0300)]
whisper : use ggml_backend_sched (whisper/2239)
* whisper : use ggml_backend_sched (wip)
* use sched in whisper_allocr
* whisper : single backend in whisper_context
* whisper : remove whisper_state->backends_used
* whisper : remove whisper_context->backend
* whisper : reset scheduler after init
* whisper : fix external encoder (e.g. CoreML)
* whisper : cleanup
* whisper : handle null GPU buffer types + fix sycl
---------
Co-authored-by: slaren <redacted>
Georgi Gerganov [Sun, 16 Jun 2024 17:31:33 +0000 (20:31 +0300)]
sync : whisper.cpp
Georgi Gerganov [Tue, 11 Jun 2024 14:39:01 +0000 (17:39 +0300)]
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)
* cuda : fix bounds check for src0 rows in MMVQ kernel
* Update ggml-cuda/mmvq.cu
Co-authored-by: Johannes Gäßler <redacted>
---------
Co-authored-by: Johannes Gäßler <redacted>
Borislav Stanimirov [Fri, 31 May 2024 08:37:29 +0000 (11:37 +0300)]
whisper : remove `speed_up` and `phase_vocoder*` functions (whisper/2198)
* whisper : fix cast warning
* whisper : remove phase_vocoder functions, ref #2195
* whisper : remove speed_up from whisper_full_params, closes #2195
William Tambellini [Tue, 21 May 2024 15:31:41 +0000 (08:31 -0700)]
examples : add support for decoding input with ffmpeg (Linux) (whisper/2133)
- search for ffmpeg libs/headers at cmake time
- added ffmpeg-transcode.cpp to libcommon when ffmpeg is enabled
- hooked ffmpeg transcoding into common read_wav(...)
- passed test:
./main -m ggml-base.en.bin -f samples/jfk.mp3
Georgi Gerganov [Sun, 16 Jun 2024 16:10:54 +0000 (19:10 +0300)]
examples : remove whisper (#860)
ggml-ci
slaren [Sun, 16 Jun 2024 10:57:37 +0000 (13:57 +0300)]
move BLAS to a separate backend (cont) (llama/6210)
ggml-ci
Georgi Gerganov [Sun, 16 Jun 2024 10:56:06 +0000 (13:56 +0300)]
scripts : sync ggml-blas
0cc4m [Sun, 16 Jun 2024 05:17:31 +0000 (07:17 +0200)]
Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
* Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory
* Improve debug log code
* Add memory debug output option
* Fix flake8
* Fix unnecessarily high llama-3 VRAM use
Georgi Gerganov [Sun, 16 Jun 2024 10:42:57 +0000 (13:42 +0300)]
ggml : remove OpenCL (#0)
Georgi Gerganov [Sun, 16 Jun 2024 10:05:11 +0000 (13:05 +0300)]
cmake : fix cuda vars (#0)
Georgi Gerganov [Sun, 16 Jun 2024 09:40:38 +0000 (12:40 +0300)]
scripts : update sync
Hong Bo PENG [Sun, 16 Jun 2024 08:53:11 +0000 (16:53 +0800)]
ggml : fix and optimize ppc64le (#849)
* fix compile issues introduced by loongarch_asx
* restore quant changes to merge
* fix compile issues introduced by loongarch_asx
* further optimize by using vec_msum & vec_sum4s on ppc64le
Daniel Bevenius [Sun, 16 Jun 2024 08:51:18 +0000 (10:51 +0200)]
ggml : remove duplicate include of ggml-common.h (#853)
Signed-off-by: Daniel Bevenius <redacted>
Yilong Guo [Sun, 16 Jun 2024 07:51:38 +0000 (00:51 -0700)]
sycl : remove global variables (cont) (llama/7710)
* separate DPCT helpers outside
* replace global variables with context
* remove useless extra
* update mul_mat condition
* remove duplicate buft initialization
* remove duplicate extra and global work group size
* remove useless backend check
* remove duplicated extras
* use macro for group_size and remove cuda-related
Co-authored-by: Meng, Hengyu <redacted>
Yilong Guo [Sun, 16 Jun 2024 07:40:35 +0000 (00:40 -0700)]
scripts : add ggml-sycl to sync scripts (#857)
Georgi Gerganov [Sat, 15 Jun 2024 18:12:18 +0000 (21:12 +0300)]
ci : add GG_BUILD_NO_DOWNLOAD
ggml-ci
Georgi Gerganov [Sat, 15 Jun 2024 17:54:22 +0000 (20:54 +0300)]
ggml : remove opencl (#0)
ggml-ci
Georgi Gerganov [Sat, 15 Jun 2024 17:53:02 +0000 (20:53 +0300)]
cuda : update build (#0)
ggml-ci
Georgi Gerganov [Sat, 15 Jun 2024 17:16:55 +0000 (20:16 +0300)]
sync : llama.cpp
ggml-ci
Georgi Gerganov [Sat, 15 Jun 2024 17:16:42 +0000 (20:16 +0300)]
tests : adapt to changes (#0)
Meng, Hengyu [Sat, 15 Jun 2024 06:05:10 +0000 (14:05 +0800)]
remove global variables (llama/7710)
* separate DPCT helpers outside
* replace global variables with context
* remove useless extra
* update mul_mat condition
* remove duplicate buft initialization
* remove duplicate extra and global work group size
* remove useless backend check
* remove duplicated extras
* use macro for group_size and remove cuda-related
Johannes Gäßler [Fri, 14 Jun 2024 16:41:49 +0000 (18:41 +0200)]
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores
* try CI fix
* try CI fix
* try CI fix
* fix data race
* revert q2_K precision-related changes
Georgi Gerganov [Fri, 14 Jun 2024 14:14:09 +0000 (17:14 +0300)]
metal : utilize max shared memory for mul_mat_id (llama/7935)
Radoslav Gerganov [Thu, 13 Jun 2024 12:18:44 +0000 (15:18 +0300)]
rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)
slaren [Thu, 13 Jun 2024 01:11:35 +0000 (03:11 +0200)]
move BLAS to a separate backend (llama/6210)
* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse the same buffer when the same buffer type is used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type
This will cause the weight to be copied to a backend that supports the
op, which is very costly. The weight should have been stored in a buffer
of a backend that can run the op, but llama.cpp cannot do this
automatically at the moment.
---------
Co-authored-by: Georgi Gerganov <redacted>
Johannes Gäßler [Wed, 12 Jun 2024 15:41:51 +0000 (17:41 +0200)]
CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)
Georgi Gerganov [Wed, 12 Jun 2024 13:00:22 +0000 (16:00 +0300)]
tests : add non-cont unary tests (llama/7857)
* tests : add non-cont unary tests
* ggml : update unary asserts and "supports_op"
ggml-ci
Georgi Gerganov [Wed, 12 Jun 2024 12:24:20 +0000 (15:24 +0300)]
ggml : improve ggml_is_contiguous logic (llama/7856)
* ggml : improve ggml_is_contiguous logic
ggml-ci
* ggml : support more contiguous cases
ggml-ci
k.h.lai [Tue, 11 Jun 2024 19:26:05 +0000 (03:26 +0800)]
vulkan: select only one device for single gpu with multiple drivers (llama/7582)
0cc4m [Tue, 11 Jun 2024 19:20:29 +0000 (21:20 +0200)]
Update Vulkan RoPE implementation (llama/7818)
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>
Johannes Gäßler [Tue, 11 Jun 2024 06:26:07 +0000 (08:26 +0200)]
CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (llama/7860)
Johannes Gäßler [Mon, 10 Jun 2024 09:45:13 +0000 (11:45 +0200)]
CUDA: use tensor cores for MMQ (llama/7676)
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
Ben Ashbaugh [Mon, 10 Jun 2024 09:21:31 +0000 (02:21 -0700)]
use the correct SYCL context for host USM allocations (llama/7777)
Signed-off-by: Ben Ashbaugh <redacted>
Johannes Gäßler [Sun, 9 Jun 2024 07:42:25 +0000 (09:42 +0200)]
CUDA: revise q8_1 data layout for mul_mat_q (llama/7824)