git.djapps.eu Git - pkg/ggml/sources/ggml/log
11 months ago common : handle new quant types (#0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:04 +0000 (17:17 +0300)]
common : handle new quant types (#0)

11 months ago ggml : add ggml-aarch64 (#0)
Dibakar Gope [Sat, 27 Jul 2024 14:16:40 +0000 (17:16 +0300)]
ggml : add ggml-aarch64 (#0)

11 months ago cann: Fix Multi-NPU execution error (llama/8710)
wangshuai09 [Sat, 27 Jul 2024 08:36:44 +0000 (16:36 +0800)]
cann: Fix Multi-NPU execution error (llama/8710)

* cann: fix multi-npu exec error

* cann: update comment for ggml_backend_cann_supports_buft

11 months ago ggml : reduce hash table reset cost (llama/8698)
slaren [Sat, 27 Jul 2024 02:41:55 +0000 (04:41 +0200)]
ggml : reduce hash table reset cost (llama/8698)

* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
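
A minimal sketch of the pattern behind the last two items, using hypothetical `example_abort`/`EXAMPLE_ABORT` names rather than ggml's actual definitions: a noreturn abort helper that captures file/line and accepts a printf-style format string, so call sites compile without the unreachable-code warnings that `GGML_ASSERT(false)` used to produce.

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

[[noreturn]] static void example_abort(const char * file, int line, const char * fmt, ...) {
    va_list args;
    va_start(args, fmt);
    fprintf(stderr, "%s:%d: ", file, line);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fputc('\n', stderr);
    abort();
}

#define EXAMPLE_ABORT(...) example_abort(__FILE__, __LINE__, __VA_ARGS__)

// usage: EXAMPLE_ABORT("fatal error: unsupported type %d", type);
```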

11 months ago ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)
DavidKorczynski [Thu, 25 Jul 2024 21:23:05 +0000 (22:23 +0100)]
ggml: handle ggml_init failure to fix NULL pointer deref (llama/8692)

`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_no_alloc`.

This fixes it by bailing out if no context is found.
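
A sketch of the calling pattern the fix protects, written against the public ggml API (the buffer size is arbitrary):

```cpp
#include "ggml.h"

#include <cstdio>

bool make_context(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };

    struct ggml_context * ctx = ggml_init(params);
    if (ctx == NULL) {
        // bail out here instead of dereferencing NULL later
        fprintf(stderr, "%s: ggml_init() failed\n", __func__);
        return false;
    }

    ggml_set_no_alloc(ctx, true); // safe: ctx is known to be non-NULL
    ggml_free(ctx);
    return true;
}
```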

11 months ago ggml : fix build on Windows with Snapdragon X (llama/8531)
Andreas (Andi) Kunar [Thu, 25 Jul 2024 16:01:00 +0000 (18:01 +0200)]
ggml : fix build on Windows with Snapdragon X (llama/8531)

* Improvements for Windows with Snapdragon X

* Revert "Improvements for Windows with Snapdragon X"

This reverts commit bf21397ae5ea7c73d3494db3b91505599909227d.

* Improvements for Windows with Snapdragon X

* WOA build clarifications

* Windows on ARM build clarifications

* cmake build for Windows clarifications

* Update docs/build.md

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <redacted>
11 months ago fix multi-gpu issue on sycl (llama/8554)
Chen Xi [Thu, 25 Jul 2024 11:45:18 +0000 (11:45 +0000)]
fix multi-gpu issue on sycl (llama/8554)

---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
11 months ago ggml : add and use ggml_cpu_has_llamafile() (llama/8664)
Georgi Gerganov [Thu, 25 Jul 2024 09:37:42 +0000 (12:37 +0300)]
ggml : add and use ggml_cpu_has_llamafile() (llama/8664)

11 months ago Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)
Joe Todd [Wed, 24 Jul 2024 10:55:26 +0000 (11:55 +0100)]
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (llama/8667)

11 months ago sycl : Add support for non-release DPC++ & oneMKL (llama/8644)
Joe Todd [Tue, 23 Jul 2024 13:58:37 +0000 (14:58 +0100)]
sycl : Add support for non-release DPC++ & oneMKL (llama/8644)

* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <redacted>
11 months ago Vulkan IQ4_NL Support (llama/8613)
0cc4m [Tue, 23 Jul 2024 08:56:49 +0000 (10:56 +0200)]
Vulkan IQ4_NL Support (llama/8613)

* Fix Vulkan matmul tests compile errors

* Add Vulkan IQ4_NL support

* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support

11 months ago Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)
Jeroen Mostert [Tue, 23 Jul 2024 08:50:40 +0000 (10:50 +0200)]
Allow all RDNA2 archs to use sdot4 intrinsic (llama/8629)

The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
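
A sketch of the gating change (the `RDNA2` macro matches ggml's convention, but the surrounding code here is illustrative, not the actual kernel):

```cpp
#include <cstdint>

#if defined(RDNA2)                 // previously: #if defined(__gfx1030__)
#define USE_SDOT4
#endif

static __device__ int dot4_i8(int a, int b, int acc) {
#ifdef USE_SDOT4
    // single RDNA2 instruction: 4x int8 multiply-accumulate
    return __builtin_amdgcn_sdot4(a, b, acc, false);
#else
    // portable fallback with the same semantics
    for (int i = 0; i < 4; ++i) {
        acc += (int8_t)((uint32_t)a >> (8*i)) * (int8_t)((uint32_t)b >> (8*i));
    }
    return acc;
#endif
}
```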

11 months ago fix scratch size of softmax (llama/8642)
luoyu-intel [Tue, 23 Jul 2024 07:43:28 +0000 (07:43 +0000)]
fix scratch size of softmax (llama/8642)

11 months ago ggml: fix compile error for RISC-V (llama/8623)
Mark Zhuang [Mon, 22 Jul 2024 07:56:45 +0000 (15:56 +0800)]
ggml: fix compile error for RISC-V (llama/8623)

11 months ago CUDA: MMQ code deduplication + iquant support (llama/8495)
Johannes Gäßler [Sat, 20 Jul 2024 20:25:26 +0000 (22:25 +0200)]
CUDA: MMQ code deduplication + iquant support (llama/8495)

* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build

11 months ago gguf : handle null name during init (llama/8587)
Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]
gguf : handle null name during init (llama/8587)

11 months ago ggml : fix quant dot product with odd number of blocks (llama/8549)
slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]
ggml : fix quant dot product with odd number of blocks (llama/8549)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix odd blocks for ARM_NEON (llama/8556)

* ggml : fix iq4_nl dot product with odd number of blocks

* ggml : fix q4_1

* ggml : fix q5_0

* ggml : fix q5_1

* ggml : fix iq4_nl metal

ggml-ci

* ggml : fix q4_0

* ggml : fix q8_0

ggml-ci

* ggml : remove special Q4_0 code for first 2 blocks

* ggml : fix sumf redefinition

---------

Co-authored-by: slaren <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
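
The general bug class, illustrated with a hypothetical `block_t`/`dot_one` rather than the real quant kernels: a dot product whose main loop consumes blocks in pairs needs a scalar tail, otherwise an odd block count is mishandled.

```cpp
#include <cstdint>

struct block_t { float d; int8_t qs[32]; };

static float dot_one(const block_t & a, const block_t & b) {
    int sumi = 0;
    for (int i = 0; i < 32; ++i) sumi += a.qs[i] * b.qs[i];
    return a.d * b.d * (float) sumi;
}

float vec_dot_blocks(int nb, const block_t * x, const block_t * y) {
    float sumf = 0.0f;
    int ib = 0;
    for (; ib + 1 < nb; ib += 2) { // SIMD-friendly main loop: block pairs
        sumf += dot_one(x[ib + 0], y[ib + 0]);
        sumf += dot_one(x[ib + 1], y[ib + 1]);
    }
    for (; ib < nb; ++ib) {        // tail: handles the odd final block
        sumf += dot_one(x[ib], y[ib]);
    }
    return sumf;
}
```
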
11 months ago ggml : add friendlier error message to fopen errors (llama/8575)
Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]
ggml : add friendlier error message to fopen errors (llama/8575)

* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.
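
A sketch of the pattern (the helper name is hypothetical; the commit applies the idea at individual fopen call sites):

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>

static FILE * fopen_or_report(const char * fname, const char * mode) {
    FILE * f = std::fopen(fname, mode);
    if (f == nullptr) {
        // report which file failed and why, not just a bare failure
        std::fprintf(stderr, "failed to open '%s': %s\n", fname, std::strerror(errno));
    }
    return f;
}
```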

11 months ago CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)
Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)

11 months ago cmake : install all ggml public headers (llama/8480)
65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]
cmake : install all ggml public headers (llama/8480)

Co-authored-by: 65a <redacted>
11 months ago Add Ascend NPU backend (llama/6035)
hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]
Add Ascend NPU backend (llama/6035)

* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <redacted>
* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <redacted>
11 months ago make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)
Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)

11 months ago Refactor lora adapter support (llama/8332)
Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]
Refactor lora adapter support (llama/8332)

* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <redacted>
* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <redacted>
* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>
11 months ago ggml : suppress unknown pragma 'GCC' on windows (llama/8460)
Daniel Bevenius [Mon, 15 Jul 2024 12:48:17 +0000 (14:48 +0200)]
ggml : suppress unknown pragma 'GCC' on windows (llama/8460)

This commit adds a macro guard to pragma GCC to avoid the following
warning on windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
```
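
The guard, roughly as one would apply it (sketch; the `-Woverlength-strings` pragma is the one mentioned for ggml-aarch64.c later in this log): MSVC never sees the GCC-specific pragma, so warning C4068 disappears.

```cpp
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Woverlength-strings"
#endif
```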

11 months ago add concat through dim 1/2 (llama/8483)
Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]
add concat through dim 1/2 (llama/8483)

* add concat through dim 1/2

11 months ago Vulkan MMQ Fix (llama/8479)
0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]
Vulkan MMQ Fix (llama/8479)

* Fix incoherence by adding missing LOAD_VEC_A parameter

* Fix Vulkan op result checker build error

11 months ago vulkan : cmake integration (llama/8119)
bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]
vulkan : cmake integration (llama/8119)

* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg

11 months ago metal : template-ify some of the kernels (llama/8447)
Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]
metal : template-ify some of the kernels (llama/8447)

ggml-ci

11 months ago ggml : minor naming changes (llama/8433)
Georgi Gerganov [Fri, 12 Jul 2024 07:46:02 +0000 (10:46 +0300)]
ggml : minor naming changes (llama/8433)

* ggml : minor naming changes

ggml-ci

* ggml : use PRId64 [no ci]

* ggml : revert FA K/Q names

11 months ago fix the mul_mat_id ut issues (llama/8427)
Chen Xi [Fri, 12 Jul 2024 00:52:04 +0000 (00:52 +0000)]
fix the mul_mat_id ut issues (llama/8427)

* fix part of mul_mat_id

* skip the bfloat 16 sycl ut

Signed-off-by: Chen Xi <redacted>
---------

Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Co-authored-by: Chen Xi <redacted>
11 months ago ggml : add NVPL BLAS support (#8329) (llama/8425)
Nicholai Tukanov [Thu, 11 Jul 2024 16:49:15 +0000 (11:49 -0500)]
ggml : add NVPL BLAS support (#8329) (llama/8425)

* ggml : add NVPL BLAS support

* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`

---------

Co-authored-by: ntukanov <redacted>
11 months ago cuda : suppress 'noreturn' warn in no_device_code (llama/8414)
Daniel Bevenius [Thu, 11 Jul 2024 15:53:42 +0000 (17:53 +0200)]
cuda : suppress 'noreturn' warn in no_device_code (llama/8414)

* cuda : suppress 'noreturn' warn in no_device_code

This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:

```console
/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
```

The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.

Signed-off-by: Daniel Bevenius <redacted>
* squash! cuda : suppress 'noreturn' warn in no_device_code

Update __trap macro instead of using a while loop to suppress the
warning.

Signed-off-by: Daniel Bevenius <redacted>
---------

Signed-off-by: Daniel Bevenius <redacted>
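
An illustrative sketch of the final approach (the real change is in ggml-cuda/common.cuh and its exact guard and definition may differ): give the HIP `__trap` replacement noreturn semantics, so a function ending in a trap satisfies `-Winvalid-noreturn` without an artificial loop.

```cpp
#if defined(GGML_USE_HIPBLAS)
#define __trap() do { abort(); } while (0) // abort() is itself noreturn
#endif

[[noreturn]] static __device__ void no_device_code_example(void) {
    __trap(); // the compiler now sees a call chain that cannot return
}
```
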
11 months ago CUDA: optimize and refactor MMQ (llama/8416)
Johannes Gäßler [Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)]
CUDA: optimize and refactor MMQ (llama/8416)

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation

11 months ago Use multi_ptr to clean up deprecated warnings (llama/8256)
AidanBeltonS [Wed, 10 Jul 2024 15:10:49 +0000 (16:10 +0100)]
Use multi_ptr to clean up deprecated warnings (llama/8256)

11 months ago ggml : move sgemm sources to llamafile subfolder (llama/8394)
Georgi Gerganov [Wed, 10 Jul 2024 12:23:29 +0000 (15:23 +0300)]
ggml : move sgemm sources to llamafile subfolder (llama/8394)

ggml-ci

11 months ago ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)
Dibakar Gope [Wed, 10 Jul 2024 12:14:51 +0000 (07:14 -0500)]
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)

* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files

* Arm AArch64: minor code refactoring for rebase

* Arm AArch64: minor code refactoring for resolving a build issue with cmake

* Arm AArch64: minor code refactoring to split the Q4_0_AARCH64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code change for resolving a build issue with server-windows

* retrigger checks

* Arm AArch64: minor code changes for rebase

* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits

* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig

* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code refactoring

* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat

* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat

* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* Arm AArch64: minor code refactoring

* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure

* Arm AArch64: remove a redundant comment

* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off

* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels

* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type

11 months ago sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372)
Alberto Cabrera Pérez [Tue, 9 Jul 2024 14:03:15 +0000 (15:03 +0100)]
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372)

* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend

* Reduced verbosity of comment

11 months ago sycl : fix powf call in device code (llama/8368)
Alberto Cabrera Pérez [Mon, 8 Jul 2024 13:22:41 +0000 (14:22 +0100)]
sycl : fix powf call in device code (llama/8368)

11 months ago ggml : loop tiling optimizations for scalar path (#898)
Mahesh Madhav [Thu, 25 Jul 2024 07:54:08 +0000 (00:54 -0700)]
ggml : loop tiling optimizations for scalar path (#898)

Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
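
A generic sketch of the technique on a dot-product-style loop (not the actual ggml code): accumulating into a small fixed-size array of partial sums lets the compiler keep them in registers and unroll the inner loop cleanly.

```cpp
float dot_tiled(int n, const float * x, const float * y) {
    const int TILE = 4;                 // assumed tile width
    float acc[TILE] = {0};
    int i = 0;
    for (; i + TILE <= n; i += TILE) {  // tiled main loop
        for (int t = 0; t < TILE; ++t) {
            acc[t] += x[i + t] * y[i + t];
        }
    }
    float sum = 0.0f;
    for (; i < n; ++i) sum += x[i] * y[i]; // remainder
    for (int t = 0; t < TILE; ++t) sum += acc[t];
    return sum;
}
```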

11 months ago ggml: add support for float16 input tensors in pooling operations (#895)
Ivan Filipov [Mon, 22 Jul 2024 11:32:02 +0000 (14:32 +0300)]
ggml: add support for float16 input tensors in pooling operations (#895)

* Add support for float16 tensors in 1d pooling operations

* Add support for float16 input tensors in 2d pooling operations

* code cleanup

remove unnecessary casting during srow ptr initialization

---------

Co-authored-by: vanaka11 <redacted>
11 months ago gguf.md: naming convention synced to llama.cpp (#896)
Brian [Mon, 22 Jul 2024 10:25:01 +0000 (20:25 +1000)]
gguf.md: naming convention synced to llama.cpp (#896)

It is now updated to this form

`<BaseName><SizeLabel><FineTune><Version><Encoding><Type><Shard>.gguf`
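
As a hypothetical example of the scheme, `Mixtral-8x7B-v0.1-Q4_0.gguf` reads as BaseName `Mixtral`, SizeLabel `8x7B`, Version `v0.1`, and Encoding `Q4_0`, with the optional FineTune, Type, and Shard parts omitted.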

11 months ago gguf.md: kv store has new authorship metadata keys (#897)
Brian [Sun, 21 Jul 2024 08:20:30 +0000 (18:20 +1000)]
gguf.md: kv store has new authorship metadata keys (#897)

11 months ago vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (#893)
Tony Wasserka [Sat, 20 Jul 2024 18:49:44 +0000 (20:49 +0200)]
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (#893)

This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.
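
A sketch of the pattern (member list abbreviated; the real struct lives in the Vulkan backend): default member initializers guarantee that a partially constructed buffer can be torn down safely, because destroying or freeing a VK_NULL_HANDLE is a no-op.

```cpp
#include <vulkan/vulkan.h>

struct vk_buffer_example {
    VkBuffer       buffer = VK_NULL_HANDLE;
    VkDeviceMemory memory = VK_NULL_HANDLE;

    void destroy(VkDevice device) {
        // valid even if allocation failed halfway through: the handles are
        // either real or VK_NULL_HANDLE, never uninitialized garbage
        vkDestroyBuffer(device, buffer, nullptr);
        vkFreeMemory(device, memory, nullptr);
        buffer = VK_NULL_HANDLE;
        memory = VK_NULL_HANDLE;
    }
};
```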

Co-authored-by: Tony Wasserka <redacted>
11 months ago py : update packages + fix yolo warning
Georgi Gerganov [Sat, 20 Jul 2024 13:38:56 +0000 (16:38 +0300)]
py : update packages + fix yolo warning

11 months ago cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (#885)
Borislav Stanimirov [Fri, 12 Jul 2024 14:24:20 +0000 (17:24 +0300)]
cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (#885)

11 months ago sync : whisper.cpp
Georgi Gerganov [Mon, 8 Jul 2024 11:54:35 +0000 (14:54 +0300)]
sync : whisper.cpp

11 months ago examples : fix compile warnings [no ci] (whisper/0)
Georgi Gerganov [Mon, 8 Jul 2024 11:09:09 +0000 (14:09 +0300)]
examples : fix compile warnings [no ci] (whisper/0)

11 months ago ggml : remove unnecessary UNUSED macro call (#880)
Daniel Bevenius [Mon, 8 Jul 2024 10:03:42 +0000 (12:03 +0200)]
ggml : remove unnecessary UNUSED macro call (#880)

This commit removes an UNUSED macro call that is not needed: the
variable n0 is used in the code, so it does not produce an unused-variable warning.
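
For context, such macros are conventionally the (void)-cast shown below (ggml's own definition may differ); they exist to silence -Wunused-variable for names that are intentionally never read.

```cpp
#define UNUSED(x) (void)(x)

static void example(int n0, int n1) {
    UNUSED(n1);        // needed: n1 is otherwise unread
    int acc = n0 + 1;  // n0 is read, so UNUSED(n0) would be redundant
    UNUSED(acc);
}
```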

Signed-off-by: Daniel Bevenius <redacted>
11 months ago sync : llama.cpp
Georgi Gerganov [Mon, 8 Jul 2024 09:23:45 +0000 (12:23 +0300)]
sync : llama.cpp

ggml-ci

11 months ago tests : fix whitespace (llama/0)
Georgi Gerganov [Mon, 8 Jul 2024 07:39:36 +0000 (10:39 +0300)]
tests : fix whitespace (llama/0)

11 months ago cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281)
Natsu [Fri, 5 Jul 2024 14:29:35 +0000 (22:29 +0800)]
cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281)

11 months ago Enabled more data types for oneMKL gemm_batch (llama/8236)
Ouadie EL FAROUKI [Fri, 5 Jul 2024 12:23:25 +0000 (13:23 +0100)]
Enabled more data types for oneMKL gemm_batch (llama/8236)

11 months ago CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)
Johannes Gäßler [Fri, 5 Jul 2024 07:06:31 +0000 (09:06 +0200)]
CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)

11 months ago CUDA: revert part of the RDNA1 optimizations (llama/8309)
Daniele [Fri, 5 Jul 2024 07:06:09 +0000 (07:06 +0000)]
CUDA: revert part of the RDNA1 optimizations (llama/8309)

The change to the launch_bounds was causing a small performance drop of about 25 t/s in the perplexity benchmark

11 months ago CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311)
Johannes Gäßler [Fri, 5 Jul 2024 07:05:34 +0000 (09:05 +0200)]
CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311)

11 months ago Fix WARP_SIZE=16 bug of Intel GPU (llama/8266)
luoyu-intel [Fri, 5 Jul 2024 05:06:13 +0000 (05:06 +0000)]
Fix WARP_SIZE=16 bug of Intel GPU (llama/8266)

* fix group_norm ut

* split softmax

* fix softmax

* add concat support condition

* revert debug code

* move QK_WARP_SIZE to presets.hpp

11 months ago replace get_work_group_size() with a local cache for performance (llama/8286)
Neo Zhang Jianyu [Fri, 5 Jul 2024 02:32:29 +0000 (10:32 +0800)]
replace get_work_group_size() with a local cache for performance (llama/8286)

Co-authored-by: arthw <redacted>
11 months ago Remove unneeded semicolons (llama/8280)
AidanBeltonS [Thu, 4 Jul 2024 01:07:19 +0000 (02:07 +0100)]
Remove unneeded semicolons (llama/8280)

11 months ago Define and optimize RDNA1 (llama/8085)
Daniele [Wed, 3 Jul 2024 23:02:58 +0000 (23:02 +0000)]
Define and optimize RDNA1 (llama/8085)

11 months ago fix typo (llama/8267)
Judd [Wed, 3 Jul 2024 12:40:16 +0000 (20:40 +0800)]
fix typo (llama/8267)

Co-authored-by: Judd <redacted>
11 months ago Dequant improvements rebase (llama/8255)
AidanBeltonS [Wed, 3 Jul 2024 01:55:34 +0000 (02:55 +0100)]
Dequant improvements rebase (llama/8255)

* Single load for half2

* Store scales in local mem

* Vec load quantized values

11 months ago Removes multiple newlines at the end of files that were breaking the editorconfig step of CI. (llama/8258)
Clint Herron [Tue, 2 Jul 2024 16:18:10 +0000 (12:18 -0400)]
Removes multiple newlines at the end of files that were breaking the editorconfig step of CI. (llama/8258)

11 months ago cuda : update supports_op for matrix multiplication (llama/8245)
slaren [Tue, 2 Jul 2024 06:39:38 +0000 (08:39 +0200)]
cuda : update supports_op for matrix multiplication (llama/8245)

11 months ago Fix win build conflict of math library (llama/8230)
luoyu-intel [Tue, 2 Jul 2024 04:50:07 +0000 (04:50 +0000)]
Fix win build conflict of math library (llama/8230)

* fix win build conflict of math library

* fix the condition: !(win32 & SYCL)

* revert warp_size=16

11 months ago Fix the sub group size of Intel (llama/8106)
luoyu-intel [Tue, 2 Jul 2024 02:16:00 +0000 (02:16 +0000)]
Fix the sub group size of Intel (llama/8106)

* use warp_size macro for all sycl kernels

* fix mask of permute_sub_group_by_xor

* fix rms_norm with correct warp number

* fix rms_norm_f32/group_norm_f32

* move norm to norm.cpp file

* fix quantize bug

* fix mmvq's batch size

11 months ago CUDA: refactor and optimize IQ MMVQ (llama/8215)
Johannes Gäßler [Mon, 1 Jul 2024 18:39:06 +0000 (20:39 +0200)]
CUDA: refactor and optimize IQ MMVQ (llama/8215)

* CUDA: refactor and optimize IQ MMVQ

* uint -> uint32_t

* __dp4a -> ggml_cuda_dp4a

* remove MIN_CC_DP4A checks

* change default

* try CI fix

11 months ago Update SYCL-Rope op and Refactor (llama/8157)
zhentaoyu [Mon, 1 Jul 2024 11:39:06 +0000 (19:39 +0800)]
Update SYCL-Rope op and Refactor (llama/8157)

* align with rope.cu and move sycl-op to a single file

11 months ago CUDA: fix MMQ stream-k for --split-mode row (llama/8167)
Johannes Gäßler [Thu, 27 Jun 2024 14:26:05 +0000 (16:26 +0200)]
CUDA: fix MMQ stream-k for --split-mode row (llama/8167)

11 months ago fix uses of GGML_USE_CUBLAS in tests and examples (#879)
slaren [Tue, 2 Jul 2024 17:11:52 +0000 (19:11 +0200)]
fix uses of GGML_USE_CUBLAS in tests and examples (#879)

* fix uses of GGML_USE_CUBLAS in tests and examples

* fix ci/run.sh

ggml-ci

11 months ago feat: cuda implementation for `ggml_conv_transpose_1d` (#854)
John Balis [Tue, 2 Jul 2024 16:09:52 +0000 (11:09 -0500)]
feat: cuda implementation for `ggml_conv_transpose_1d` (#854)

* conv transpose 1d passing test for 1d input and kernel

* working for different input and output channel counts, added test for variable stride

* initial draft appears to work with stride other than 1

* working with all old and new conv1d tests

* added a test for large tensors

* removed use cuda hardcoding

* restored test-conv-transpose.c

* removed unused arguments, and fixed a bug where a test failure would cause subsequent tests to fail

* fixed accumulator bug

* added test to test-backend-ops

* fixed mistake

* addressed review

* fixed includes

* removed blank lines

* style and warning fixes

* return failure when test fails

* fix supports_op

---------

Co-authored-by: slaren <redacted>
11 months ago sycl : add build instruction (#870)
Yilong Guo [Sun, 30 Jun 2024 16:05:44 +0000 (09:05 -0700)]
sycl : add build instruction (#870)

11 months agoupdate "Using cuBLAS" to use correct update cuda compile flag (#876)
John Balis [Sun, 30 Jun 2024 15:14:31 +0000 (10:14 -0500)]
update "Using cuBLAS" to use correct update cuda compile flag (#876)

It seems like the previous `-DGGML_CUBLAS=ON` compile flag was deprecated.

12 months ago sync : whisper.cpp
Georgi Gerganov [Wed, 26 Jun 2024 20:26:16 +0000 (23:26 +0300)]
sync : whisper.cpp

12 months ago whisper : disable CUDA mel + fix FFMPEG
Georgi Gerganov [Wed, 26 Jun 2024 17:11:38 +0000 (20:11 +0300)]
whisper : disable CUDA mel + fix FFMPEG

12 months ago sync : llama.cpp
Georgi Gerganov [Wed, 26 Jun 2024 19:42:27 +0000 (22:42 +0300)]
sync : llama.cpp

12 months ago ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140)
slaren [Wed, 26 Jun 2024 19:34:14 +0000 (21:34 +0200)]
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140)

12 months ago sync : llama.cpp, whisper.cpp
Georgi Gerganov [Wed, 26 Jun 2024 16:40:53 +0000 (19:40 +0300)]
sync : llama.cpp, whisper.cpp

12 months ago ggml : reorganize source code + improve CMake (#865)
Georgi Gerganov [Wed, 26 Jun 2024 16:33:53 +0000 (19:33 +0300)]
ggml : reorganize source code + improve CMake (#865)

* scripts : update sync [no ci]

* ggml : move headers one up [no ci]

* files : reorganize + update CMake

ggml-ci

* cmake : build normal ggml library

ggml-ci

* cmake : link math library to test + remove ci for code cov

ggml-ci

* files : move public headers to include

ggml-ci

12 months ago files : remove old (#0)
Georgi Gerganov [Fri, 21 Jun 2024 07:25:14 +0000 (10:25 +0300)]
files : remove old (#0)

12 months ago sync : whisper.cpp
Georgi Gerganov [Tue, 18 Jun 2024 06:48:08 +0000 (09:48 +0300)]
sync : whisper.cpp

12 months ago whisper : use ggml_backend_sched (whisper/2239)
Georgi Gerganov [Tue, 18 Jun 2024 06:37:20 +0000 (09:37 +0300)]
whisper : use ggml_backend_sched (whisper/2239)

* whisper : use ggml_backend_sched (wip)

* use sched in whisper_allocr

* whisper : single backend in whisper_context

* whisper : remove whisper_state->backends_used

* whisper : remove whisper_context->backend

* whisper : reset scheduler after init

* whisper : fix external encoder (e.g. CoreML)

* whisper : cleanup

* whisper : handle null GPU buffer types + fix sycl

---------

Co-authored-by: slaren <redacted>
12 months ago sync : whisper.cpp
Georgi Gerganov [Sun, 16 Jun 2024 17:31:33 +0000 (20:31 +0300)]
sync : whisper.cpp

12 months ago cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)
Georgi Gerganov [Tue, 11 Jun 2024 14:39:01 +0000 (17:39 +0300)]
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)

* cuda : fix bounds check for src0 rows in MMVQ kernel

* Update ggml-cuda/mmvq.cu

Co-authored-by: Johannes Gäßler <redacted>
---------

Co-authored-by: Johannes Gäßler <redacted>
12 months ago whisper : remove `speed_up` and `phase_vocoder*` functions (whisper/2198)
Borislav Stanimirov [Fri, 31 May 2024 08:37:29 +0000 (11:37 +0300)]
whisper : remove `speed_up` and `phase_vocoder*` functions (whisper/2198)

* whisper : fix cast warning

* whisper : remove phase_vocoder functions, ref #2195

* whisper : remove speed_up from whisper_full_params, closes #2195

12 months ago examples : add support for decoding input with ffmpeg (Linux) (whisper/2133)
William Tambellini [Tue, 21 May 2024 15:31:41 +0000 (08:31 -0700)]
examples : add support for decoding input with ffmpeg (Linux) (whisper/2133)

- search for ffmpeg libs/headers at cmake time
- added ffmpeg-transcode.cpp to libcommon when ffmpeg is enabled
- hooked ffmpeg transcoding into common read_wav(...)
- passed test:
./main -m ggml-base.en.bin -f samples/jfk.mp3

12 months ago examples : remove whisper (#860)
Georgi Gerganov [Sun, 16 Jun 2024 16:10:54 +0000 (19:10 +0300)]
examples : remove whisper (#860)

ggml-ci

12 months ago move BLAS to a separate backend (cont) (llama/6210)
slaren [Sun, 16 Jun 2024 10:57:37 +0000 (13:57 +0300)]
move BLAS to a separate backend (cont) (llama/6210)

ggml-ci

12 months ago scripts : sync ggml-blas
Georgi Gerganov [Sun, 16 Jun 2024 10:56:06 +0000 (13:56 +0300)]
scripts : sync ggml-blas

12 months ago Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
0cc4m [Sun, 16 Jun 2024 05:17:31 +0000 (07:17 +0200)]
Vulkan Shader Refactor, Memory Debugging Option (llama/7947)

* Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory

* Improve debug log code

* Add memory debug output option

* Fix flake8

* Fix unnecessarily high llama-3 VRAM use

12 months ago ggml : remove OpenCL (#0)
Georgi Gerganov [Sun, 16 Jun 2024 10:42:57 +0000 (13:42 +0300)]
ggml : remove OpenCL (#0)

12 months ago cmake : fix cuda vars (#0)
Georgi Gerganov [Sun, 16 Jun 2024 10:05:11 +0000 (13:05 +0300)]
cmake : fix cuda vars (#0)

12 months ago scripts : update sync
Georgi Gerganov [Sun, 16 Jun 2024 09:40:38 +0000 (12:40 +0300)]
scripts : update sync

12 months ago ggml : fix and optimize ppc64le (#849)
Hong Bo PENG [Sun, 16 Jun 2024 08:53:11 +0000 (16:53 +0800)]
ggml : fix and optimize ppc64le (#849)

* fix compile issues introduced by loongarch_asx

* restore quant changes to merge

* fix compile issues introduced by loongarch_asx

* further optimize by using vec_msum & vec_sum4s on ppc64le

12 months ago ggml : remove duplicate include of ggml-common.h (#853)
Daniel Bevenius [Sun, 16 Jun 2024 08:51:18 +0000 (10:51 +0200)]
ggml : remove duplicate include of ggml-common.h (#853)

Signed-off-by: Daniel Bevenius <redacted>
12 months ago sycl : remove global variables (cont) (llama/7710)
Yilong Guo [Sun, 16 Jun 2024 07:51:38 +0000 (00:51 -0700)]
sycl : remove global variables (cont) (llama/7710)

* separate DPCT helpers outside

* replace global variables with context

* remove useless extra

* update mul_mat condition

* remove duplicate buft initialization

* remove duplicate extra and global work group size

* remove useless backend check

* remove duplicated extras

* use macro for group_size and remove cuda-related

Co-authored-by: Meng, Hengyu <redacted>
12 months ago scripts : add ggml-sycl to sync scripts (#857)
Yilong Guo [Sun, 16 Jun 2024 07:40:35 +0000 (00:40 -0700)]
scripts : add ggml-sycl to sync scripts (#857)

12 months ago ci : add GG_BUILD_NO_DOWNLOAD
Georgi Gerganov [Sat, 15 Jun 2024 18:12:18 +0000 (21:12 +0300)]
ci : add GG_BUILD_NO_DOWNLOAD

ggml-ci

12 months ago ggml : remove opencl (#0)
Georgi Gerganov [Sat, 15 Jun 2024 17:54:22 +0000 (20:54 +0300)]
ggml : remove opencl (#0)

ggml-ci

12 months ago cuda : update build (#0)
Georgi Gerganov [Sat, 15 Jun 2024 17:53:02 +0000 (20:53 +0300)]
cuda : update build (#0)

ggml-ci

12 months ago sync : llama.cpp
Georgi Gerganov [Sat, 15 Jun 2024 17:16:55 +0000 (20:16 +0300)]
sync : llama.cpp

ggml-ci