git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log
Johannes Gäßler [Sat, 20 Jul 2024 20:25:26 +0000 (22:25 +0200)]
CUDA: MMQ code deduplication + iquant support (llama/8495)
* CUDA: MMQ code deduplication + iquant support
* 1 less parallel job for CI build
Georgi Gerganov [Sat, 20 Jul 2024 14:15:42 +0000 (17:15 +0300)]
gguf : handle null name during init (llama/8587)
slaren [Fri, 19 Jul 2024 15:17:27 +0000 (17:17 +0200)]
ggml : fix quant dot product with odd number of blocks (llama/8549)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix odd blocks for ARM_NEON (llama/8556)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix q4_1
* ggml : fix q5_0
* ggml : fix q5_1
* ggml : fix iq4_nl metal
ggml-ci
* ggml : fix q4_0
* ggml : fix q8_0
ggml-ci
* ggml : remove special Q4_0 code for first 2 blocks
* ggml : fix sumf redefinition
---------
Co-authored-by: slaren <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
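For illustration, a minimal C++ sketch of the pattern behind these fixes, using a hypothetical block format (not an actual ggml quant type): a dot product that consumes blocks two at a time needs a scalar tail to handle the last block when the block count is odd.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical quantized block: 16 int8 weights sharing one float scale.
struct BlockQ {
    float  scale;
    int8_t q[16];
};

// Dot product between quantized blocks and float activations.
// The main loop consumes blocks in pairs (mimicking a 2-block SIMD path);
// the tail handles the last block when nblocks is odd.
static float dot_q(const BlockQ * x, const float * y, size_t nblocks) {
    float sumf = 0.0f;

    size_t ib = 0;
    for (; ib + 1 < nblocks; ib += 2) {            // paired "vector" path
        for (int pair = 0; pair < 2; ++pair) {
            const BlockQ & b = x[ib + pair];
            float s = 0.0f;
            for (int j = 0; j < 16; ++j) {
                s += b.q[j] * y[(ib + pair)*16 + j];
            }
            sumf += b.scale * s;
        }
    }
    for (; ib < nblocks; ++ib) {                   // scalar tail for the odd block
        const BlockQ & b = x[ib];
        float s = 0.0f;
        for (int j = 0; j < 16; ++j) {
            s += b.q[j] * y[ib*16 + j];
        }
        sumf += b.scale * s;
    }
    return sumf;
}
```

Without the tail loop, the final block would simply be skipped whenever the block count is odd.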
Clint Herron [Fri, 19 Jul 2024 11:05:45 +0000 (07:05 -0400)]
ggml : add friendlier error message to fopen errors (llama/8575)
* Add additional error information when model files fail to load.
* Adding additional error information to most instances of fopen.
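A rough sketch of the kind of diagnostic this adds (the exact wording and call sites in the sources may differ): reporting the path and strerror(errno) turns a bare "failed to load model" into an actionable message.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>

// Open a model file and, on failure, report which file failed and why,
// rather than just "failed to load model".
static FILE * open_model(const char * path) {
    FILE * f = std::fopen(path, "rb");
    if (f == nullptr) {
        std::fprintf(stderr, "failed to open '%s': %s\n", path, std::strerror(errno));
    }
    return f;
}

int main() {
    FILE * f = open_model("does-not-exist.gguf");
    if (f) {
        std::fclose(f);
    }
    return 0;
}
```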
Johannes Gäßler [Thu, 18 Jul 2024 21:48:47 +0000 (23:48 +0200)]
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572)
65a [Thu, 18 Jul 2024 14:47:12 +0000 (07:47 -0700)]
cmake : install all ggml public headers (llama/8480)
Co-authored-by: 65a <redacted>
hipudding [Wed, 17 Jul 2024 11:23:50 +0000 (19:23 +0800)]
Add Ascend NPU backend (llama/6035)
* [CANN] Add Ascend NPU backend
Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.
CANN (Compute Architecture for Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI.
Co-authored-by: wangshuai09 <redacted>
* delete trailing whitespaces
* Modify the code based on review comment
* Rename LLAMA_CANN to GGML_CANN
* Make ggml-common.h private
* add ggml_cann prefix for acl funcs
* Add logging for CANN backend
* Delete Trailing whitespace
---------
Co-authored-by: wangshuai09 <redacted>
Johannes Gäßler [Tue, 16 Jul 2024 19:20:59 +0000 (21:20 +0200)]
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515)
Xuan Son Nguyen [Mon, 15 Jul 2024 18:50:47 +0000 (20:50 +0200)]
Refactor lora adapter support (llama/8332)
* lora: load to device buft
* add patch tensor function
* correct tensor patch
* llama_lora_adapter_apply
* correct ggml_backend_tensor_copy
* add llm_build_mm
* fix auto merge
* update based on review comments
* add convert script
* no more transpose A
* add f16 convert
* add metadata check
* add sanity check
* fix ftype
* add requirements
* fix requirements
* fix outfile
* conversion: only allow selected models
* fix types
* cuda : do not use dmmv if the tensor does not have enough cols
* llama : lora fixes
* do not disable mmap with lora
Co-authored-by: slaren <redacted>
* llm_build_lora_mm_id
* convert_lora : MoE LoRA conversion support
* convert_lora : prefer safetensors, similarly to convert_hf
* convert_hf : simplify modify_tensors for InternLM2
* convert_lora : lazy conversion
* llama : load and use alpha from LoRA adapters
* llama : use llm_build_lora_mm in most model graphs
* auto scale
* Revert "auto scale"
This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.
* remove redundant params
* Apply suggestions from code review
Co-authored-by: slaren <redacted>
* change kv metadata
* move add_type to __init__
* convert_hf : move add_type to main()
* convert_lora : use the GGUFWriter from Model instead of overwriting it
---------
Co-authored-by: slaren <redacted>
Co-authored-by: Francis Couture-Harpin <redacted>
Meng, Hengyu [Mon, 15 Jul 2024 11:32:15 +0000 (19:32 +0800)]
add concat through dim 1/2 (llama/8483)
* add concat through dim 1/2
0cc4m [Mon, 15 Jul 2024 07:38:52 +0000 (09:38 +0200)]
Vulkan MMQ Fix (llama/8479)
* Fix incoherence by adding missing LOAD_VEC_A parameter
* Fix Vulkan op result checker build error
bandoti [Sat, 13 Jul 2024 16:12:39 +0000 (13:12 -0300)]
vulkan : cmake integration (llama/8119)
* Add Vulkan to CMake pkg
* Add Sycl to CMake pkg
* Add OpenMP to CMake pkg
* Split generated shader file into separate translation unit
* Add CMake target for Vulkan shaders
* Update README.md
* Add make target for Vulkan shaders
* Use pkg-config to locate vulkan library
* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow
* Clean up tabs
* Move sudo to apt-key invocation
* Forward GGML_EXTRA_LIBS to CMake config pkg
* Update vulkan obj file paths
* Add shaderc to nix pkg
* Add python3 to Vulkan nix build
* Link against ggml in cmake pkg
* Remove Python dependency from Vulkan build
* code review changes
* Remove trailing newline
* Add cflags from pkg-config to fix w64devkit build
* Update README.md
* Remove trailing whitespace
* Update README.md
* Remove trailing whitespace
* Fix doc heading
* Make glslc required Vulkan component
* remove clblast from nix pkg
Georgi Gerganov [Sat, 13 Jul 2024 15:32:33 +0000 (18:32 +0300)]
metal : template-ify some of the kernels (llama/8447)
ggml-ci
Georgi Gerganov [Fri, 12 Jul 2024 07:46:02 +0000 (10:46 +0300)]
ggml : minor naming changes (llama/8433)
* ggml : minor naming changes
ggml-ci
* ggml : use PRId64 [no ci]
* ggml : revert FA K/Q names
Chen Xi [Fri, 12 Jul 2024 00:52:04 +0000 (00:52 +0000)]
fix the mul_mat_id ut issues (llama/8427)
* fix part of mul_mat_id
* skip the bfloat 16 sycl ut
Signed-off-by: Chen Xi <redacted>
---------
Signed-off-by: Chen Xi <redacted>
Co-authored-by: Meng, Hengyu <redacted>
Co-authored-by: Chen Xi <redacted>
Nicholai Tukanov [Thu, 11 Jul 2024 16:49:15 +0000 (11:49 -0500)]
ggml : add NVPL BLAS support (ggml/8329) (llama/8425)
* ggml : add NVPL BLAS support
* ggml : replace `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`
---------
Co-authored-by: ntukanov <redacted>
Daniel Bevenius [Thu, 11 Jul 2024 15:53:42 +0000 (17:53 +0200)]
cuda : suppress 'noreturn' warn in no_device_code (llama/8414)
* cuda : suppress 'noreturn' warn in no_device_code
This commit adds a while(true) loop to the no_device_code function in
common.cuh. This is done to suppress the warning:
```console
/src/ggml-cuda/template-instances/../common.cuh:346:1: warning:
function declared 'noreturn' should not return [-Winvalid-noreturn]
346 | }
| ^
```
The motivation for this is to reduce the number of warnings when
compiling with GGML_HIPBLAS=ON.
Signed-off-by: Daniel Bevenius <redacted>
* squash! cuda : suppress 'noreturn' warn in no_device_code
Update __trap macro instead of using a while loop to suppress the
warning.
Signed-off-by: Daniel Bevenius <redacted>
---------
Signed-off-by: Daniel Bevenius <redacted>
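A minimal host-side C++ analog of the warning described above (the real code is device code in ggml-cuda/common.cuh, and the final fix updated the __trap macro; the function body below is illustrative only):

```cpp
#include <cstdio>

// A function declared [[noreturn]] must provably never return; if the last
// statement is an ordinary call, clang warns with -Winvalid-noreturn. Ending
// the function with an infinite loop (or a builtin trap) guarantees no path
// falls off the end and silences the warning.
[[noreturn]] static void no_device_code(const char * file, int line) {
    std::fprintf(stderr, "%s:%d: no device code available for this architecture\n", file, line);
    // Without one of the two lines below the compiler cannot prove that the
    // function never returns:
    __builtin_trap();   // GCC/Clang intrinsic, itself marked noreturn
    while (true) { }    // the loop alone would also be enough
}
```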
Johannes Gäßler [Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)]
CUDA: optimize and refactor MMQ (llama/8416)
* CUDA: optimize and refactor MMQ
* explicit q8_1 memory layouts, add documentation
AidanBeltonS [Wed, 10 Jul 2024 15:10:49 +0000 (16:10 +0100)]
Use multi_ptr to clean up deprecation warnings (llama/8256)
Georgi Gerganov [Wed, 10 Jul 2024 12:23:29 +0000 (15:23 +0300)]
ggml : move sgemm sources to llamafile subfolder (llama/8394)
ggml-ci
Dibakar Gope [Wed, 10 Jul 2024 12:14:51 +0000 (07:14 -0500)]
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780)
* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files
* Arm AArch64: minor code refactoring for rebase
* Arm AArch64: minor code refactoring for resolving a build issue with cmake
* Arm AArch64: minor code refactoring to split the Q4_0_AARCH64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code change for resolving a build issue with server-windows
* retrigger checks
* Arm AArch64: minor code changes for rebase
* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits
* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code refactoring
* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat
* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat
* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types
* Arm AArch64: minor code refactoring
* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure
* Arm AArch64: remove a redundant comment
* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off
* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels
* Arm AArch64: update docs/build.md README to include compile-time flags for building the Q4_0_4_4 quant type
Alberto Cabrera Pérez [Tue, 9 Jul 2024 14:03:15 +0000 (15:03 +0100)]
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372)
* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend
* Reduced verbosity of comment
Alberto Cabrera Pérez [Mon, 8 Jul 2024 13:22:41 +0000 (14:22 +0100)]
sycl : fix powf call in device code (llama/8368)
Mahesh Madhav [Thu, 25 Jul 2024 07:54:08 +0000 (00:54 -0700)]
ggml : loop tiling optimizations for scalar path (ggml/898)
Apply a loop tiling technique to the generic path, which provides
performance upside for ISAs with enough registers to take advantage
of it. Also helps the compiler optimize this path.
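A minimal sketch of the idea, assuming a plain float dot product rather than the actual ggml routines: splitting the loop into small fixed-size tiles with independent accumulators gives an ISA with enough registers room to keep all partial sums live and shortens the dependency chain of adds.

```cpp
#include <cstdio>

// Untiled reference: one accumulator, one long dependency chain of adds.
static float dot_plain(const float * x, const float * y, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += x[i]*y[i];
    }
    return sum;
}

// Tiled version: process 4 elements per iteration into 4 independent
// accumulators, so the compiler can keep them in registers and schedule
// the multiplies/adds in parallel; a scalar tail handles the remainder.
static float dot_tiled(const float * x, const float * y, int n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += x[i+0]*y[i+0];
        s1 += x[i+1]*y[i+1];
        s2 += x[i+2]*y[i+2];
        s3 += x[i+3]*y[i+3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) {
        sum += x[i]*y[i];
    }
    return sum;
}

int main() {
    const float x[7] = {1, 2, 3, 4, 5, 6, 7};
    const float y[7] = {1, 1, 1, 1, 1, 1, 1};
    // Both versions agree; the tiled one illustrates the scalar-path optimization.
    std::printf("%f %f\n", dot_plain(x, y, 7), dot_tiled(x, y, 7));
    return 0;
}
```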
Ivan Filipov [Mon, 22 Jul 2024 11:32:02 +0000 (14:32 +0300)]
ggml: add support for float16 input tensors in pooling operations (ggml/895)
* Add support for float16 tensors in 1d pooling operations
* Add support for float16 input tensors in 2d pooling operations
* code cleanup
remove unnecessary casting during srow ptr initialization
---------
Co-authored-by: vanaka11 <redacted>
Tony Wasserka [Sat, 20 Jul 2024 18:49:44 +0000 (20:49 +0200)]
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
This prevents invalid frees when destroying a partially initialized
vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer
when running out of device memory.
Co-authored-by: Tony Wasserka <redacted>
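A simplified C++ sketch of the pattern (the actual vk_buffer_struct and the ggml-vulkan allocation paths differ): defaulting the handles to VK_NULL_HANDLE lets a destroy routine called on a partially initialized object free only what was actually created.

```cpp
#include <vulkan/vulkan.h>

// Simplified stand-in for a buffer wrapper. Defaulting the handles to
// VK_NULL_HANDLE means a partially initialized instance (e.g. one created
// just before an out-of-device-memory failure) carries no garbage handles.
struct vk_buffer {
    VkBuffer       buffer = VK_NULL_HANDLE;
    VkDeviceMemory memory = VK_NULL_HANDLE;
    VkDevice       device = VK_NULL_HANDLE;
};

// Destroy path: only free what was actually created. With the defaults above
// this is safe to call on a partially constructed vk_buffer.
static void vk_buffer_destroy(vk_buffer & buf) {
    if (buf.device == VK_NULL_HANDLE) {
        return;
    }
    if (buf.buffer != VK_NULL_HANDLE) {
        vkDestroyBuffer(buf.device, buf.buffer, nullptr);
        buf.buffer = VK_NULL_HANDLE;
    }
    if (buf.memory != VK_NULL_HANDLE) {
        vkFreeMemory(buf.device, buf.memory, nullptr);
        buf.memory = VK_NULL_HANDLE;
    }
}
```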
Borislav Stanimirov [Fri, 12 Jul 2024 14:24:20 +0000 (17:24 +0300)]
cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885)
Georgi Gerganov [Thu, 8 Aug 2024 11:00:51 +0000 (14:00 +0300)]
scripts : sync new files (#0)
Daven Sanassy [Mon, 5 Aug 2024 06:48:26 +0000 (07:48 +0100)]
cmake : fix compile in xcode (#2311)
Georgi Gerganov [Sat, 27 Jul 2024 17:35:04 +0000 (20:35 +0300)]
whisper : handle empty mel (#2324)
Matt Stephenson [Tue, 16 Jul 2024 07:21:09 +0000 (03:21 -0400)]
whisper : use vulkan as gpu backend when available (#2302)
* ggml: use vulkan as gpu backend when available
Signed-off-by: Matt Stephenson <redacted>
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <redacted>
---------
Signed-off-by: Matt Stephenson <redacted>
arizhih [Mon, 15 Jul 2024 12:50:36 +0000 (14:50 +0200)]
whisper : fix DTW assert (#2299)
Georgi Gerganov [Tue, 9 Jul 2024 15:54:18 +0000 (18:54 +0300)]
cmake : use WHISPER_EXTRA_FLAGS (#2294)
Borislav Stanimirov [Mon, 8 Jul 2024 14:08:55 +0000 (17:08 +0300)]
cmake : allow external ggml
Georgi Gerganov [Mon, 8 Jul 2024 12:36:51 +0000 (15:36 +0300)]
cmake : try to fix openvino build (#2281)
Georgi Gerganov [Mon, 8 Jul 2024 11:21:04 +0000 (14:21 +0300)]
cmake : remove install of llama convert script [no ci] (#2266)
Georgi Gerganov [Mon, 8 Jul 2024 11:19:36 +0000 (14:19 +0300)]
make : remove llama prints [no ci] (#2265)
Georgi Gerganov [Mon, 8 Jul 2024 11:14:17 +0000 (14:14 +0300)]
talk-llama : sync llama.cpp
Georgi Gerganov [Mon, 8 Jul 2024 11:09:09 +0000 (14:09 +0300)]
examples : fix compile warnings [no ci] (#0)
Georgi Gerganov [Mon, 8 Jul 2024 10:50:28 +0000 (13:50 +0300)]
sync : ggml
Georgi Gerganov [Mon, 8 Jul 2024 10:50:14 +0000 (13:50 +0300)]
ggml : sync sycl (skip) (#0)
Georgi Gerganov [Mon, 8 Jul 2024 10:48:14 +0000 (13:48 +0300)]
scripts : fix sync scripts
Daniel Bevenius [Mon, 8 Jul 2024 10:03:42 +0000 (12:03 +0200)]
ggml : remove unnecessary UNUSED macro call (ggml/880)
This commit removes an UNUSED macro call that is not needed, since the variable n0 is already used in the code and therefore does not trigger an unused-variable warning.
Signed-off-by: Daniel Bevenius <redacted>
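For context, a sketch of the usual form of such a macro (not necessarily the exact ggml definition): it casts its argument to void purely to silence unused-variable warnings, so it becomes redundant once the variable is genuinely used.

```cpp
// Typical definition: evaluate the name and discard it, which silences
// -Wunused-variable / -Wunused-parameter.
#define UNUSED(x) (void)(x)

static int scale(int n0, int unused_flag) {
    UNUSED(unused_flag);   // needed: unused_flag is never read otherwise
    // UNUSED(n0) would be redundant here, because n0 is used below.
    return 2*n0;
}
```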
Natsu [Fri, 5 Jul 2024 14:29:35 +0000 (22:29 +0800)]
cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281)
Ouadie EL FAROUKI [Fri, 5 Jul 2024 12:23:25 +0000 (13:23 +0100)]
Enabled more data types for oneMKL gemm_batch (llama/8236)
Johannes Gäßler [Fri, 5 Jul 2024 07:06:31 +0000 (09:06 +0200)]
CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278)
Daniele [Fri, 5 Jul 2024 07:06:09 +0000 (07:06 +0000)]
CUDA: revert part of the RDNA1 optimizations (llama/8309)
The launch_bounds change was causing a small performance drop of about 25 t/s when running perplexity.
Johannes Gäßler [Fri, 5 Jul 2024 07:05:34 +0000 (09:05 +0200)]
CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311)
luoyu-intel [Fri, 5 Jul 2024 05:06:13 +0000 (05:06 +0000)]
Fix WARP_SIZE=16 bug on Intel GPUs (llama/8266)
* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp
Neo Zhang Jianyu [Fri, 5 Jul 2024 02:32:29 +0000 (10:32 +0800)]
replace get_work_group_size() with a local cache for performance (llama/8286)
Co-authored-by: arthw <redacted>
Daniele [Wed, 3 Jul 2024 23:02:58 +0000 (23:02 +0000)]
Define and optimize RDNA1 (llama/8085)
Judd [Wed, 3 Jul 2024 12:40:16 +0000 (20:40 +0800)]
fix typo (llama/8267)
Co-authored-by: Judd <redacted>
Clint Herron [Tue, 2 Jul 2024 16:18:10 +0000 (12:18 -0400)]
Remove multiple newlines at the end of files that were breaking the editorconfig step of CI (llama/8258)
slaren [Tue, 2 Jul 2024 06:39:38 +0000 (08:39 +0200)]
cuda : update supports_op for matrix multiplication (llama/8245)
luoyu-intel [Tue, 2 Jul 2024 04:50:07 +0000 (04:50 +0000)]
Fix win build conflict of math library (llama/8230)
* fix win build conflict of math library
* fix the condition: !(win32 & SYCL)
* revert warp_size=16
luoyu-intel [Tue, 2 Jul 2024 02:16:00 +0000 (02:16 +0000)]
Fix the sub group size of Intel (llama/8106)
* use warp_size macro for all sycl kernels
* fix mask of permute_sub_group_by_xor
* fix rms_norm with correct warp number
* fix rms_norm_f32/group_norm_f32
* move norm to norm.cpp file
* fix quantize bug
* fix mmvq's batch size
Johannes Gäßler [Mon, 1 Jul 2024 18:39:06 +0000 (20:39 +0200)]
CUDA: refactor and optimize IQ MMVQ (llama/8215)
* CUDA: refactor and optimize IQ MMVQ
* uint -> uint32_t
* __dp4a -> ggml_cuda_dp4a
* remove MIN_CC_DP4A checks
* change default
* try CI fix
zhentaoyu [Mon, 1 Jul 2024 11:39:06 +0000 (19:39 +0800)]
Update SYCL-Rope op and Refactor (llama/8157)
* align with rope.cu and move sycl-op to a single file
Johannes Gäßler [Thu, 27 Jun 2024 14:26:05 +0000 (16:26 +0200)]
CUDA: fix MMQ stream-k for --split-mode row (llama/8167)
John Balis [Tue, 2 Jul 2024 16:09:52 +0000 (11:09 -0500)]
feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854)
* conv transpose 1d passing test for 1d input and kernel
* working for different input and output channel counts, added test for variable stride
* initial draft appears to work with stride other than 1
* working with all old and new conv1d tests
* added a test for large tensors
* removed hardcoded CUDA usage
* restored test-conv-transpose.c
* removed unused arguments and fixed a bug where a test failure would cause subsequent tests to fail
* fixed accumulator bug
* added test to test-backend-ops
* fixed mistake
* addressed review
* fixed includes
* removed blank lines
* style and warning fixes
* return failure when test fails
* fix supports_op
---------
Co-authored-by: slaren <redacted>
Georgi Gerganov [Mon, 8 Jul 2024 11:26:59 +0000 (14:26 +0300)]
ci : disable java build
Emmanuel Schmidbauer [Mon, 8 Jul 2024 11:24:58 +0000 (07:24 -0400)]
server : add inference path to make OAI API compatible (#2270)
Georgi Gerganov [Wed, 26 Jun 2024 20:20:19 +0000 (23:20 +0300)]
sync : ggml + fix sync script
Georgi Gerganov [Wed, 26 Jun 2024 20:20:13 +0000 (23:20 +0300)]
make : disable CUDA graphs
slaren [Wed, 26 Jun 2024 19:34:14 +0000 (21:34 +0200)]
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140)
Georgi Gerganov [Wed, 26 Jun 2024 19:25:25 +0000 (22:25 +0300)]
make : disable CUDA mel build
Georgi Gerganov [Wed, 26 Jun 2024 18:42:39 +0000 (21:42 +0300)]
cmake : minor fixes
Georgi Gerganov [Wed, 26 Jun 2024 18:20:45 +0000 (21:20 +0300)]
make : fix missing -O3
same as https://github.com/ggerganov/llama.cpp/pull/8143
Georgi Gerganov [Wed, 26 Jun 2024 17:11:38 +0000 (20:11 +0300)]
whisper : disable CUDA mel + fix FFMPEG
Georgi Gerganov [Wed, 26 Jun 2024 16:40:23 +0000 (19:40 +0300)]
sync : ggml
Georgi Gerganov [Wed, 26 Jun 2024 16:34:09 +0000 (19:34 +0300)]
whisper : reorganize source code + improve CMake (#2256)
* scripts : update sync [no ci]
* files : reorganize [no ci]
* sync : llama.cpp
* cmake : link math library
* cmake : build normal ggml library
* files : move headers to include
* objc : fix path to ggml-metal.h
* ci : fix WHISPER_CUDA -> GGML_CUDA
* scripts : sync LICENSE [no ci]
mky_coder [Tue, 18 Jun 2024 15:10:33 +0000 (23:10 +0800)]
whisper : optimize fft() function (#2242)
Co-authored-by: Mike Fan <redacted>
Georgi Gerganov [Tue, 18 Jun 2024 06:45:37 +0000 (09:45 +0300)]
talk-llama : sync llama.cpp
Georgi Gerganov [Tue, 18 Jun 2024 06:37:20 +0000 (09:37 +0300)]
whisper : use ggml_backend_sched (#2239)
* whisper : use ggml_backend_sched (wip)
* use sched in whisper_allocr
* whisper : single backend in whisper_context
* whisper : remove whisper_state->backends_used
* whisper : remove whisper_context->backend
* whisper : reset scheduler after init
* whisper : fix external encoder (e.g. CoreML)
* whisper : cleanup
* whisper : handle null GPU buffer types + fix sycl
---------
Co-authored-by: slaren <redacted>
Georgi Gerganov [Sun, 16 Jun 2024 16:23:55 +0000 (19:23 +0300)]
fix : remove extra files
Georgi Gerganov [Sun, 16 Jun 2024 16:23:32 +0000 (19:23 +0300)]
scripts : sync ggml-blas
Georgi Gerganov [Sun, 16 Jun 2024 16:10:20 +0000 (19:10 +0300)]
build : update make / cmake
Georgi Gerganov [Sun, 16 Jun 2024 15:40:07 +0000 (18:40 +0300)]
sync : ggml
slaren [Sun, 16 Jun 2024 10:57:37 +0000 (13:57 +0300)]
move BLAS to a separate backend (cont) (llama/6210)
ggml-ci
0cc4m [Sun, 16 Jun 2024 05:17:31 +0000 (07:17 +0200)]
Vulkan Shader Refactor, Memory Debugging Option (llama/7947)
* Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory
* Improve debug log code
* Add memory debug output option
* Fix flake8
* Fix unnecessarily high llama-3 VRAM use
Georgi Gerganov [Sun, 16 Jun 2024 15:38:46 +0000 (18:38 +0300)]
scripts : stop sync whisper example from ggml
Georgi Gerganov [Sun, 16 Jun 2024 14:57:35 +0000 (17:57 +0300)]
cmake : fix sycl build (#0)
Georgi Gerganov [Sun, 16 Jun 2024 10:46:12 +0000 (13:46 +0300)]
ggml : remove OpenCL (#0)
Georgi Gerganov [Sun, 16 Jun 2024 10:24:17 +0000 (13:24 +0300)]
sycl : sync (#0)
Georgi Gerganov [Sun, 16 Jun 2024 10:20:19 +0000 (13:20 +0300)]
cuda : enable CUDA graphs (#0)
Georgi Gerganov [Sun, 16 Jun 2024 10:10:54 +0000 (13:10 +0300)]
talk-llama : sync llama.cpp
Georgi Gerganov [Sun, 16 Jun 2024 10:07:43 +0000 (13:07 +0300)]
cmake : fix CUDA build (#0)
Georgi Gerganov [Sun, 16 Jun 2024 09:43:14 +0000 (12:43 +0300)]
sync : ggml
ggml-ci
Hong Bo PENG [Sun, 16 Jun 2024 08:53:11 +0000 (16:53 +0800)]
ggml : fix and optimize ppc64le (ggml/849)
* fix compile issues introduced by loongarch_asx
* restore quant changes to merge
* fix compile issues introduced by loongarch_asx
* further optimize by using vec_msum & vec_sum4s on ppc64le
Daniel Bevenius [Sun, 16 Jun 2024 08:51:18 +0000 (10:51 +0200)]
ggml : remove duplicate include of ggml-common.h (ggml/853)
Signed-off-by: Daniel Bevenius <redacted>
Meng, Hengyu [Sat, 15 Jun 2024 06:05:10 +0000 (14:05 +0800)]
remove global variables (llama/7710)
* separate DPCT helpers outside
* replace global variables with context
* remove useless extra
* update mul_mat condition
* remove duplicate buft initialization
* remove duplicate extra and global work group size
* remove useless backend check
* remove duplicated extras
* use macro for group_size and remove cuda-related
Johannes Gäßler [Fri, 14 Jun 2024 16:41:49 +0000 (18:41 +0200)]
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores
* try CI fix
* try CI fix
* try CI fix
* fix data race
* revert q2_K precision-related changes
Georgi Gerganov [Fri, 14 Jun 2024 14:14:09 +0000 (17:14 +0300)]
metal : utilize max shared memory for mul_mat_id (llama/7935)
Radoslav Gerganov [Thu, 13 Jun 2024 12:18:44 +0000 (15:18 +0300)]
rpc : fix ggml_backend_rpc_supports_buft() (llama/7918)
slaren [Thu, 13 Jun 2024 01:11:35 +0000 (03:11 +0200)]
move BLAS to a separate backend (llama/6210)
* move BLAS to a separate backend
* rename GGML_USE_OPENBLAS to GGML_USE_BLAS
* alloc : reuse the same buffer when the same buffer type is used multiple times
* set number of threads automatically for openblas and blis
* sched : print assignments when GGML_SCHED_DEBUG env variable is set
* sched : allow ops with weights on an incompatible buffer type
This will cause the weight to be copied to a backend that supports the
op, which is very costly. The weight should have been stored in a buffer
of a backend that can run the op, but llama.cpp cannot do this
automatically at the moment.
---------
Co-authored-by: Georgi Gerganov <redacted>
Johannes Gäßler [Wed, 12 Jun 2024 15:41:51 +0000 (17:41 +0200)]
CUDA: fix broken oob check for FA vec f32 kernel (llama/7904)
Georgi Gerganov [Wed, 12 Jun 2024 13:00:22 +0000 (16:00 +0300)]
tests : add non-cont unary tests (llama/7857)
* tests : add non-cont unary tests
* ggml : update unary asserts and "supports_op"
ggml-ci
Georgi Gerganov [Wed, 12 Jun 2024 12:24:20 +0000 (15:24 +0300)]
ggml : improve ggml_is_contiguous logic (llama/7856)
* ggml : improve ggml_is_contiguous logic
ggml-ci
* ggml : support more contiguous cases
ggml-ci
k.h.lai [Tue, 11 Jun 2024 19:26:05 +0000 (03:26 +0800)]
vulkan: select only one device for single gpu with multiple drivers (llama/7582)
0cc4m [Tue, 11 Jun 2024 19:20:29 +0000 (21:20 +0200)]
Update Vulkan RoPE implementation (llama/7818)
* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
Minor fixes
* Fix segfault when running out of VRAM
Co-authored-by: slaren <redacted>
---------
Co-authored-by: slaren <redacted>