git.djapps.eu Git - pkg/ggml/sources/ggml/log
7 months ago  ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (llama/10029)
xctan [Wed, 30 Oct 2024 07:00:40 +0000 (15:00 +0800)]
ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (llama/10029)

* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM

7 months ago  llama : refactor model loader with backend registry (llama/10026)
Diego Devesa [Wed, 30 Oct 2024 01:01:23 +0000 (02:01 +0100)]
llama : refactor model loader with backend registry (llama/10026)

7 months ago  ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model...
Changyeon Kim [Tue, 29 Oct 2024 08:52:56 +0000 (17:52 +0900)]
ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (llama/9763)

* ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend.

- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D (pooling) shader has been added.
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.

Signed-off-by: Changyeon Kim <redacted>
* [fix] Correct the order of the parameters.

Fix casting to int.

Signed-off-by: Changyeon Kim <redacted>
---------

Signed-off-by: Changyeon Kim <redacted>
7 months ago  musa: workaround for Guilty Lockup in cleaning src0 (llama/10042)
R0CKSTAR [Mon, 28 Oct 2024 09:02:48 +0000 (17:02 +0800)]
musa: workaround for Guilty Lockup in cleaning src0 (llama/10042)

Signed-off-by: Xiaodong Ye <redacted>
7 months ago  scripts : update sync
Georgi Gerganov [Mon, 4 Nov 2024 08:37:38 +0000 (10:37 +0200)]
scripts : update sync

7 months ago  cmake : make it possible to link ggml as an external lib (#1003)
Yuri Khrustalev [Sat, 2 Nov 2024 09:09:12 +0000 (05:09 -0400)]
cmake : make it possible to link ggml as an external lib (#1003)

7 months ago  metal : fix minor string leaks (#1004)
Plamen Minev [Fri, 1 Nov 2024 14:55:10 +0000 (16:55 +0200)]
metal : fix minor string leaks (#1004)

7 months ago  sync : whisper.cpp
Georgi Gerganov [Fri, 1 Nov 2024 08:23:32 +0000 (10:23 +0200)]
sync : whisper.cpp

7 months ago  ggml : alloc ggml_contexts on the heap (whisper/2525)
Georgi Gerganov [Fri, 1 Nov 2024 08:23:05 +0000 (10:23 +0200)]
ggml : alloc ggml_contexts on the heap (whisper/2525)

8 months ago  ggml : remove sync artifacts
Georgi Gerganov [Sat, 26 Oct 2024 06:44:39 +0000 (09:44 +0300)]
ggml : remove sync artifacts

ggml-ci

8 months ago  ggml : add AMX backend (llama/8998)
Ma Mingfei [Sat, 26 Oct 2024 06:43:40 +0000 (09:43 +0300)]
ggml : add AMX backend (llama/8998)

8 months ago  sync : llama.cpp
Georgi Gerganov [Sat, 26 Oct 2024 06:40:53 +0000 (09:40 +0300)]
sync : llama.cpp

8 months ago  metal : support permuted matrix multiplications (llama/10033)
Georgi Gerganov [Fri, 25 Oct 2024 19:26:15 +0000 (22:26 +0300)]
metal : support permuted matrix multiplications (llama/10033)

* metal : support permuted matrix multiplications

ggml-ci

* cont : use nb01 directly for row steps

ggml-ci

* cont : add comments [no ci]

* metal : minor refactor

* metal : minor

8 months ago  CUDA: fix insufficient buffer clearing for MMQ (llama/10032)
Johannes Gäßler [Thu, 24 Oct 2024 12:40:23 +0000 (14:40 +0200)]
CUDA: fix insufficient buffer clearing for MMQ (llama/10032)

8 months ago  CUDA: fix MMQ for non-contiguous src0, add tests (llama/10021)
Johannes Gäßler [Thu, 24 Oct 2024 09:09:36 +0000 (11:09 +0200)]
CUDA: fix MMQ for non-contiguous src0, add tests (llama/10021)

* CUDA: fix MMQ for non-contiguous src0, add tests

* revise test code

8 months ago  scripts : fix sync scripts (amx)
Georgi Gerganov [Sat, 26 Oct 2024 06:39:48 +0000 (09:39 +0300)]
scripts : fix sync scripts (amx)

8 months ago  increase cuda_cpy block size (#996)
bssrdf [Wed, 23 Oct 2024 18:34:00 +0000 (14:34 -0400)]
increase cuda_cpy block size (#996)

Co-authored-by: bssrdf <redacted>
8 months ago  sync : llama.cpp
Georgi Gerganov [Wed, 23 Oct 2024 14:27:08 +0000 (17:27 +0300)]
sync : llama.cpp

8 months ago  metal : add POOL2D and fix IM2COL (llama/9943)
Jun Hee Yoo [Wed, 23 Oct 2024 10:33:45 +0000 (19:33 +0900)]
metal : add POOL2D and fix IM2COL (llama/9943)

* add pool_2d

Signed-off-by: Junhee Yoo <redacted>
* fix im2col and add unittest for N>=1024

Signed-off-by: Junhee Yoo <redacted>
* add tests for N % 1024 != 0

Signed-off-by: Junhee Yoo <redacted>
* remove trailing whitespaces

Signed-off-by: Junhee Yoo <redacted>
* apply suggestions

Signed-off-by: Junhee Yoo <redacted>
* apply more optimization

- original IM2COL kernel + _ext with MIN()

Signed-off-by: Junhee Yoo <redacted>
* apply review: change kernel name of pool_2d

Signed-off-by: Junhee Yoo <redacted>
* apply review

Signed-off-by: Junhee Yoo <redacted>
* fix more formatting and enhance readability

Signed-off-by: Junhee Yoo <redacted>
---------

Signed-off-by: Junhee Yoo <redacted>
8 months ago  Adapt to dynamically loadable backends mechanism (llama/9970)
leo-pony [Tue, 22 Oct 2024 08:16:01 +0000 (16:16 +0800)]
Adapt to dynamically loadable backends mechanism (llama/9970)

* [CANN] Adapt to dynamically loadable backends mechanism

* Fix bug: inference results were garbled when running in debug mode for LM models whose type is in the Q4_0 class

* Address the review comments on this pull request

8 months ago  ggml : add asserts for type conversion in fattn kernels (llama/9971)
Georgi Gerganov [Mon, 21 Oct 2024 13:20:46 +0000 (16:20 +0300)]
ggml : add asserts for type conversion in fattn kernels (llama/9971)

ggml-ci

8 months ago  rpc : pack only RPC structs (llama/9959)
Radoslav Gerganov [Mon, 21 Oct 2024 10:35:40 +0000 (13:35 +0300)]
rpc : pack only RPC structs (llama/9959)

8 months ago  fix mul_mat_vec_q and *_vec_q error (llama/9939)
Neo Zhang Jianyu [Mon, 21 Oct 2024 06:26:09 +0000 (14:26 +0800)]
fix mul_mat_vec_q and *_vec_q error (llama/9939)

Co-authored-by: arthw <redacted>
8 months ago  rpc : backend refactoring (llama/9912)
Radoslav Gerganov [Fri, 18 Oct 2024 11:33:58 +0000 (14:33 +0300)]
rpc : backend refactoring (llama/9912)

* rpc : refactor backend

Use structs for RPC request/response messages

* rpc : refactor server
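
This refactor, together with "rpc : pack only RPC structs" above, points at fixed-layout structs as the request/response framing. A minimal C sketch of that pattern with a hypothetical message type (not the actual RPC schema):

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical request message; packing removes compiler-inserted padding
// so the in-memory layout matches the bytes sent over the wire.
#pragma pack(push, 1)
typedef struct {
    uint64_t tensor_id;
    uint64_t offset;
    uint64_t size;
} rpc_msg_get_tensor_req;
#pragma pack(pop)

int main(void) {
    // exactly 24 bytes: the serialized size equals the struct size
    printf("sizeof(rpc_msg_get_tensor_req) = %zu\n",
           sizeof(rpc_msg_get_tensor_req));
    return 0;
}
```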

8 months ago  Add SYCL Backend registry, device and Event Interfaces (llama/9705)
Ouadie EL FAROUKI [Fri, 18 Oct 2024 05:46:16 +0000 (06:46 +0100)]
Add SYCL Backend registry, device and Event Interfaces (llama/9705)

* implemented missing SYCL event APIs

* sycl : Added device and backend reg interfaces

* Restructured ggml-sycl.cpp

8 months ago  add amx kernel for gemm (llama/8998)
Ma Mingfei [Fri, 18 Oct 2024 05:34:36 +0000 (13:34 +0800)]
add amx kernel for gemm (llama/8998)

add intel amx isa detection

add vnni kernel for gemv cases

add vnni and amx kernel support for block_q8_0

code cleanup

fix packing B issue

enable openmp

fine tune amx kernel

switch to aten parallel pattern

add error message for nested parallelism

code cleanup

add f16 support in ggml-amx

add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS

update CMakeList

update README

fix some compilation warnings

fix compiler warning when amx is not enabled

minor change

ggml-ci

move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp

ggml-ci

update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16

ggml-ci

add amx as a ggml-backend

update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h

minor change

update CMakeLists.txt

minor change

apply weight prepacking in set_tensor method in ggml-backend

fix compile error

ggml-ci

minor change

ggml-ci

update CMakeLists.txt

ggml-ci

add march dependency

minor change

ggml-ci

change ggml_backend_buffer_is_host to return false for amx backend

ggml-ci

fix supports_op

use device reg for AMX backend

ggml-ci

minor change

ggml-ci

minor change

fix rebase

set .buffer_from_host_ptr to be false for AMX backend

8 months ago  vulkan : add backend registry / device interfaces (llama/9721)
Diego Devesa [Thu, 17 Oct 2024 00:46:58 +0000 (02:46 +0200)]
vulkan : add backend registry / device interfaces (llama/9721)

* vulkan : add backend registry / device interfaces

* llama : print devices used on model load

8 months ago  fix: allocating CPU buffer with size `0` (llama/9917)
Gilad S [Wed, 16 Oct 2024 23:34:22 +0000 (02:34 +0300)]
fix: allocating CPU buffer with size `0` (llama/9917)

8 months ago  fix: use `vm_allocate` to allocate CPU backend buffer on macOS (llama/9875)
Gilad S [Wed, 16 Oct 2024 22:36:51 +0000 (01:36 +0300)]
fix: use `vm_allocate` to allocate CPU backend buffer on macOS (llama/9875)

* fix: use `vm_allocate` to allocate CPU backend buffer on macOS

* fix: switch to `posix_memalign` to keep existing `free()` usages working

* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS

* style: formatting

* fix: move const outside of `#ifndef`

* style: formatting

* fix: unused var

* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`

* fix: unused var

* fix: page align to `GGUF_DEFAULT_ALIGNMENT`

* fix: page align to `TENSOR_ALIGNMENT`

* fix: convert `TENSOR_ALIGNMENT` to a macro

* fix: increase page size to `32` on iOS

* fix: iOS page size

* fix: `hbw_posix_memalign` alignment
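
A minimal sketch of the allocation pattern these bullets describe (simplified; the real ggml helpers and the actual TENSOR_ALIGNMENT value may differ). The point of `posix_memalign` is that the returned pointer stays compatible with plain `free()`:

```c
#include <stdio.h>
#include <stdlib.h>

#define TENSOR_ALIGNMENT 32  // illustrative value only

// Aligned allocation that keeps existing free() call sites working.
static void * aligned_tensor_malloc(size_t size) {
    void *ptr = NULL;
    if (posix_memalign(&ptr, TENSOR_ALIGNMENT, size) != 0) {
        return NULL;
    }
    return ptr;
}

int main(void) {
    void *p = aligned_tensor_malloc(1024);
    printf("aligned pointer: %p\n", p);
    free(p); // plain free(), no special deallocator needed
    return 0;
}
```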

8 months ago  CUDA: fix 1D im2col, add tests (#993)
Johannes Gäßler [Fri, 18 Oct 2024 07:24:44 +0000 (09:24 +0200)]
CUDA: fix 1D im2col, add tests (#993)

8 months ago  ggml : remove redundant set of contexts used field (#978)
Daniel Bevenius [Wed, 16 Oct 2024 18:10:01 +0000 (20:10 +0200)]
ggml : remove redundant set of contexts used field (#978)

This commit removes the setting of the `used` field of the contexts in
the global state (g_state) in `ggml_init`.

The motivation for this change is that I believe that this additional
initialization might not be required after the changes in Commit
45fc4fed0b9fb5b1af4a8525cbebb95e11208732 ("sync : latest changes from
whisper.cpp"), which changed the initialization of the contexts field
from `{ 0 }` to `{ { 0 } }`:

```console
             g_state = (struct ggml_state) {
-                /*.contexts =*/ { 0 },
+                /*.contexts =*/ { { 0 } },
             };
```
My understanding is that the `{0}` initialization might not have
zero-initialized all the nested fields in every array element because of
compiler differences, which might have been the reason for the
explicit setting of the `used` fields to false.
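
For reference, a minimal C illustration of the two initializer forms (the types are hypothetical stand-ins, not the actual g_state definition). In standard C both forms zero-initialize every nested member; the extra braces mainly silence -Wmissing-braces on some compilers:

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical stand-ins for the real ggml structs.
struct ggml_context { bool used; };
struct ggml_state   { struct ggml_context contexts[4]; };

int main(void) {
    struct ggml_state a = { { 0 } }; // explicit inner braces
    struct ggml_state b = { 0 };     // brace elision; same zero-init

    printf("%d %d\n", a.contexts[3].used, b.contexts[3].used); // 0 0
    return 0;
}
```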

8 months ago  tests : update type traits call (#0)
Georgi Gerganov [Wed, 16 Oct 2024 08:39:35 +0000 (11:39 +0300)]
tests : update type traits call (#0)

ggml-ci

8 months ago  sync : llama.cpp
Georgi Gerganov [Wed, 16 Oct 2024 08:28:53 +0000 (11:28 +0300)]
sync : llama.cpp

8 months ago  Fix cann compilation error (llama/9891)
leo-pony [Wed, 16 Oct 2024 00:51:46 +0000 (08:51 +0800)]
Fix cann compilation error (llama/9891)

Fix CANN compilation error after merging llama.cpp's support for dynamically loadable backends.

8 months ago  Vectorize load instructions in dmmv f16 CUDA kernel (llama/9816)
agray3 [Mon, 14 Oct 2024 00:49:08 +0000 (01:49 +0100)]
Vectorize load instructions in dmmv f16 CUDA kernel (llama/9816)

* Vectorize load instructions in dmmv f16 CUDA kernel

Replaces scalar with vector load instructions, which substantially
improves performance on NVIDIA HBM GPUs, e.g. gives a 1.27X overall
speedup for Meta-Llama-3-8B-Instruct-F16 BS1 inference evaluation on
H100 SXM 80GB HBM3. On GDDR GPUs, there is a slight (1.01X) speedup.

* addressed comment

* Update ggml/src/ggml-cuda/dmmv.cu

Co-authored-by: Johannes Gäßler <redacted>
---------

Co-authored-by: Johannes Gäßler <redacted>
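
The idea behind this change, sketched in plain C rather than CUDA (illustration only, not the actual dmmv kernel): issue one wider load instead of several scalar ones. The bit extraction assumes a little-endian target:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint16_t half_bits; // stand-in for 16-bit float storage

// Scalar path: two independent 16-bit loads.
static void load_scalar(const half_bits *src, half_bits dst[2]) {
    dst[0] = src[0];
    dst[1] = src[1];
}

// Vectorized path: one 32-bit load carries both values (the CUDA kernel
// does the analogous thing with wider vector types).
static void load_vec2(const half_bits *src, half_bits dst[2]) {
    uint32_t packed;
    memcpy(&packed, src, sizeof packed); // a single 32-bit load
    dst[0] = (half_bits)(packed & 0xFFFFu); // little-endian assumed
    dst[1] = (half_bits)(packed >> 16);
}

int main(void) {
    half_bits src[2] = { 0x1111, 0x2222 }, a[2], b[2];
    load_scalar(src, a);
    load_vec2(src, b);
    printf("%04x %04x / %04x %04x\n", a[0], a[1], b[0], b[1]);
    return 0;
}
```
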
8 months ago  ggml : move more prints to the ggml log system (llama/9839)
Diego Devesa [Fri, 11 Oct 2024 13:34:45 +0000 (15:34 +0200)]
ggml : move more prints to the ggml log system (llama/9839)

* ggml : move more prints to the ggml log system

* show BLAS OpenMP warnings in all builds using debug print

8 months ago  rpc : add backend registry / device interfaces (llama/9812)
Diego Devesa [Thu, 10 Oct 2024 18:14:55 +0000 (20:14 +0200)]
rpc : add backend registry / device interfaces (llama/9812)

* rpc : add backend registry / device interfaces

* llama : add llama_supports_rpc API

* ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server

8 months ago  musa: add docker image support (llama/9685)
R0CKSTAR [Thu, 10 Oct 2024 18:10:37 +0000 (02:10 +0800)]
musa: add docker image support (llama/9685)

* mtgpu: add docker image support

Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: enable docker workflow

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
8 months ago  ggml : fix BLAS with unsupported types (llama/9775)
Diego Devesa [Tue, 8 Oct 2024 12:21:43 +0000 (14:21 +0200)]
ggml : fix BLAS with unsupported types (llama/9775)

* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
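
A small C sketch of the second bullet's rationale (simplified, hypothetical types, not the exact ggml signatures): returning a const pointer into a static table avoids copying the traits struct on every call:

```c
#include <stdio.h>

// Hypothetical, simplified stand-in for a type-traits table.
struct type_traits { const char *name; int blck_size; };

static const struct type_traits g_traits[] = {
    { "f32",  1  },
    { "q4_0", 32 },
};

// before: struct type_traits get_type_traits(int type);  // copies
// after:  a const pointer -- no copy, caller must not mutate
static const struct type_traits * get_type_traits(int type) {
    return &g_traits[type];
}

int main(void) {
    const struct type_traits *t = get_type_traits(1);
    printf("%s: block size %d\n", t->name, t->blck_size);
    return 0;
}
```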

8 months ago  ggml : add backend registry / device interfaces to BLAS backend (llama/9752)
Diego Devesa [Mon, 7 Oct 2024 19:55:08 +0000 (21:55 +0200)]
ggml : add backend registry / device interfaces to BLAS backend (llama/9752)

* ggml : add backend registry / device interfaces to BLAS backend

* fix mmap usage when using host buffers

8 months ago  Update building for Android (llama/9672)
Andrew Minh Nguyen [Mon, 7 Oct 2024 16:37:31 +0000 (09:37 -0700)]
Update building for Android (llama/9672)

* docs : clarify building Android on Termux

* docs : update building Android on Termux

* docs : add cross-compiling for Android

* cmake : link dl explicitly for Android

8 months ago  ggml : add metal backend registry / device (llama/9713)
Georgi Gerganov [Mon, 7 Oct 2024 15:27:51 +0000 (18:27 +0300)]
ggml : add metal backend registry / device (llama/9713)

* ggml : add metal backend registry / device

ggml-ci

* metal : fix names [no ci]

* metal : global registry and device instances

ggml-ci

* cont : alternative initialization of global objects

ggml-ci

* llama : adapt to backend changes

ggml-ci

* fixes

* metal : fix indent

* metal : fix build when MTLGPUFamilyApple3 is not available

ggml-ci

* fix merge

* metal : avoid unnecessary singleton accesses

ggml-ci

* metal : minor fix [no ci]

* metal : g_state -> g_ggml_ctx_dev_main [no ci]

* metal : avoid reference of device context in the backend context

ggml-ci

* metal : minor [no ci]

* metal : fix maxTransferRate check

* metal : remove transfer rate stuff

---------

Co-authored-by: slaren <redacted>
8 months ago  metal : single allocation of encode_async block (llama/9747)
Paul Tsochantaris [Mon, 7 Oct 2024 12:26:31 +0000 (13:26 +0100)]
metal : single allocation of encode_async block (llama/9747)

* Single allocation of encode_async block with non-ARC capture in ggml-metal.m

* Moving Block_release to the deallocation code

* Release encode block when re-setting encoding buffer count if needed

* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <redacted>
8 months ago  ggml-alloc : remove buffer_id from leaf_alloc (#987)
Daniel Bevenius [Wed, 9 Oct 2024 14:40:35 +0000 (16:40 +0200)]
ggml-alloc : remove buffer_id from leaf_alloc (#987)

This commit removes the buffer_id field from the leaf_alloc struct.

The motivation for this is that the field is only written to and never
read/used as far as I can tell. Each tensor_alloc has a buffer_id field
and this is what caused me to look into this more closely, to
understand what the buffer_id in leaf_alloc was used for.

8 months ago  zig : remove obsolete build script
Georgi Gerganov [Sun, 6 Oct 2024 09:52:42 +0000 (12:52 +0300)]
zig : remove obsolete build script

8 months ago  sync : whisper.cpp
Georgi Gerganov [Sun, 6 Oct 2024 09:51:58 +0000 (12:51 +0300)]
sync : whisper.cpp

8 months ago  vulkan : retry allocation with fallback flags (whisper/2451)
SRHMorris [Sun, 6 Oct 2024 07:34:20 +0000 (08:34 +0100)]
vulkan : retry allocation with fallback flags (whisper/2451)

Co-authored-by: Samuel Morris <redacted>
8 months ago  spm : update backend.c -> backend.cpp
Georgi Gerganov [Sun, 6 Oct 2024 09:51:30 +0000 (12:51 +0300)]
spm : update backend.c -> backend.cpp

8 months ago  examples: add dataset, data shuffling to MNIST (#982)
Johannes Gäßler [Sat, 5 Oct 2024 16:38:01 +0000 (18:38 +0200)]
examples: add dataset, data shuffling to MNIST (#982)

8 months ago  sync : whisper.cpp
Georgi Gerganov [Sat, 5 Oct 2024 12:52:36 +0000 (15:52 +0300)]
sync : whisper.cpp

8 months ago  metal : zero-init buffer contexts (whisper/0)
Georgi Gerganov [Sat, 5 Oct 2024 11:33:54 +0000 (14:33 +0300)]
metal : zero-init buffer contexts (whisper/0)

8 months ago  sync : llama.cpp
Georgi Gerganov [Fri, 4 Oct 2024 15:54:31 +0000 (18:54 +0300)]
sync : llama.cpp

8 months ago  ggml : fix typo in example usage ggml_gallocr_new (#984)
Daniel Bevenius [Fri, 4 Oct 2024 13:46:18 +0000 (15:46 +0200)]
ggml : fix typo in example usage ggml_gallocr_new (#984)

8 months ago  ggml : fixes after sync (#983)
Diego Devesa [Fri, 4 Oct 2024 06:41:40 +0000 (08:41 +0200)]
ggml : fixes after sync (#983)

ggml : remove test-backend-buffer

ggml : fix CUDA build warnings

8 months ago  sync : whisper.cpp
Georgi Gerganov [Thu, 3 Oct 2024 19:18:03 +0000 (22:18 +0300)]
sync : whisper.cpp

8 months ago  ggml : remove old file (skip) (#0)
Georgi Gerganov [Thu, 3 Oct 2024 19:11:21 +0000 (22:11 +0300)]
ggml : remove old file (skip) (#0)

8 months ago  cont : fixes
Georgi Gerganov [Thu, 3 Oct 2024 19:03:05 +0000 (22:03 +0300)]
cont : fixes

8 months ago  examples : adapt to new ggml backend interfaces
Georgi Gerganov [Thu, 3 Oct 2024 18:42:03 +0000 (21:42 +0300)]
examples : adapt to new ggml backend interfaces

ggml-ci

8 months ago  ggml-backend : add device and backend reg interfaces (llama/9707)
Diego Devesa [Thu, 3 Oct 2024 18:25:11 +0000 (21:25 +0300)]
ggml-backend : add device and backend reg interfaces (llama/9707)

Also:

- metal : fix compute pass descriptor autorelease crash
- ggml-backend : add device description to CPU backend
- ggml: unify backend logging mechanism

8 months ago  sync : llama.cpp
Georgi Gerganov [Thu, 3 Oct 2024 18:21:40 +0000 (21:21 +0300)]
sync : llama.cpp

8 months ago  Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711)
Ouadie EL FAROUKI [Thu, 3 Oct 2024 06:50:44 +0000 (07:50 +0100)]
Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711)

8 months ago  ggml-backend : add device and backend reg interfaces (llama/9707)
Diego Devesa [Wed, 2 Oct 2024 23:49:47 +0000 (01:49 +0200)]
ggml-backend : add device and backend reg interfaces (llama/9707)

Co-authored-by: Johannes Gäßler <redacted>
8 months ago  Initial cmake support of SYCL for AMD GPUs (llama/9658)
Alberto Cabrera Pérez [Wed, 2 Oct 2024 12:57:18 +0000 (13:57 +0100)]
Initial cmake support of SYCL for AMD GPUs (llama/9658)

sycl: initial cmake support of SYCL for AMD GPUs

8 months ago  vulkan : do not use tensor->extra (llama/9407)
Radoslav Gerganov [Wed, 2 Oct 2024 10:49:16 +0000 (13:49 +0300)]
vulkan : do not use tensor->extra (llama/9407)

* vulkan : do not use tensor->extra

This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.

Ref: #8536

* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (llama/2)

---------

Co-authored-by: 0cc4m <redacted>
8 months ago  ggml/ex: calculate accuracy in graph, adapt MNIST (#980)
Johannes Gäßler [Thu, 3 Oct 2024 15:29:59 +0000 (17:29 +0200)]
ggml/ex: calculate accuracy in graph, adapt MNIST (#980)

8 months ago  ggml: refactor cross entropy loss CPU impl. (#976)
Johannes Gäßler [Wed, 2 Oct 2024 13:32:39 +0000 (15:32 +0200)]
ggml: refactor cross entropy loss CPU impl. (#976)

8 months ago  readme : refresh
Georgi Gerganov [Tue, 1 Oct 2024 15:33:35 +0000 (18:33 +0300)]
readme : refresh

8 months ago  metal : add perf-metal tool + fix build
Georgi Gerganov [Tue, 1 Oct 2024 15:08:31 +0000 (18:08 +0300)]
metal : add perf-metal tool + fix build

8 months ago  metal : reduce command encoding overhead (llama/9698)
Georgi Gerganov [Tue, 1 Oct 2024 13:10:45 +0000 (16:10 +0300)]
metal : reduce command encoding overhead (llama/9698)

ggml-ci

8 months ago  test: fix OPT_STEP_ADAMW for test-backend-ops (#974)
Johannes Gäßler [Mon, 30 Sep 2024 07:55:23 +0000 (09:55 +0200)]
test: fix OPT_STEP_ADAMW for test-backend-ops (#974)

8 months ago  vulkan : mul_mat: fix UB with small warps (#952)
Salvatore Mesoraca [Mon, 30 Sep 2024 07:14:09 +0000 (09:14 +0200)]
vulkan : mul_mat: fix UB with small warps (#952)

When the device's warp size is less than 16,
it is possible for loadstride_a (mul_mm.comp:114)
and loadstride_b (mul_mm.comp:115) to be set to 0,
because they are calculated as the workgroup size
multiplied by LOAD_VEC_* (which can be 1) and divided by 16,
and the workgroup size is set to be the same as the
warp/subgroup size.

The loadstride_* variables are used as increments in the
loops that populate the buffers used for the multiplication.

When they are 0 they cause an infinite loop.
But infinite loops without side-effects are UB and the
values of loadstride_* are known at compile time.
So, the compiler quietly optimizes all the loops away.
As a consequence, the buffers are not populated and
the multiplication result is just a matrix with all elements
set to 0.

We prevent the UB by making sure that the workgroup size
will never be less than 16, even if our device has a
smaller warp size (e.g. 8).

Signed-off-by: Salvatore Mesoraca <redacted>
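
A plain-C illustration of the arithmetic described above (the real code is GLSL; names and values here are only examples):

```c
#include <stdio.h>

int main(void) {
    const int warp_size = 8; // e.g. a "smaller" GPU
    const int LOAD_VEC  = 1;

    int workgroup_size = warp_size;                  // old behaviour
    int loadstride = workgroup_size * LOAD_VEC / 16; // 8/16 == 0
    printf("unclamped loadstride: %d\n", loadstride);

    workgroup_size = warp_size < 16 ? 16 : warp_size; // the fix
    loadstride = workgroup_size * LOAD_VEC / 16;      // now >= 1
    printf("clamped loadstride:   %d\n", loadstride);
    return 0;
}
```
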
8 months ago  ggml : fix ggml_cast (#973)
Borislav Stanimirov [Mon, 30 Sep 2024 07:11:41 +0000 (10:11 +0300)]
ggml : fix ggml_cast (#973)

8 months ago  ggml: fix gradient allocation logic (#966)
Johannes Gäßler [Sun, 29 Sep 2024 21:18:02 +0000 (23:18 +0200)]
ggml: fix gradient allocation logic (#966)

* ggml: fix gradient allocation logic

* gradient allocation in ggml_build_backward_expand

* fixup

* fix test-backend-ops grad

* suggestions by slaren

* fix test1.c

* fix legacy opt API

* fix test-grad0

* remove keep arg

8 months ago  sync : llama.cpp
Georgi Gerganov [Sun, 29 Sep 2024 18:53:33 +0000 (21:53 +0300)]
sync : llama.cpp

8 months ago  ggml : define missing HWCAP flags (llama/9684)
Georgi Gerganov [Sun, 29 Sep 2024 18:18:23 +0000 (21:18 +0300)]
ggml : define missing HWCAP flags (llama/9684)

ggml-ci

Co-authored-by: Willy Tarreau <redacted>
8 months ago  test-backend-ops : use flops for some performance tests (llama/9657)
slaren [Sat, 28 Sep 2024 12:32:46 +0000 (14:32 +0200)]
test-backend-ops : use flops for some performance tests (llama/9657)

* test-backend-ops : use flops for some performance tests

- parallelize tensor quantization

- use a different set of cases for performance and correctness tests

- run each test for at least one second

8 months ago  ggml : add run-time detection of neon, i8mm and sve (llama/9331)
Dan Johansson [Sat, 28 Sep 2024 12:06:16 +0000 (14:06 +0200)]
ggml : add run-time detection of neon, i8mm and sve (llama/9331)

* ggml: Added run-time detection of neon, i8mm and sve

Adds run-time detection of the Arm instruction set features
neon, i8mm and sve for Linux and Apple build targets.

* ggml: Extend feature detection to include non aarch64 Arm arch

* ggml: Move definition of ggml_arm_arch_features to the global data section

8 months ago  Enable use of the rebar feature to upload buffers to the device. (llama/9251)
Markus Tavenrath [Sat, 28 Sep 2024 10:05:05 +0000 (12:05 +0200)]
Enable use of the rebar feature to upload buffers to the device. (llama/9251)

8 months ago  mtgpu: enable VMM (llama/9597)
R0CKSTAR [Thu, 26 Sep 2024 01:27:40 +0000 (09:27 +0800)]
mtgpu: enable VMM (llama/9597)

Signed-off-by: Xiaodong Ye <redacted>
8 months ago  ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (llama/9217)
Charles Xu [Wed, 25 Sep 2024 13:12:20 +0000 (15:12 +0200)]
ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (llama/9217)

* ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels

* added a fallback mechanism for when the offline re-quantized model is not
optimized for the underlying target.

* fix for build errors

* remove prints from the low-level code

* Rebase to the latest upstream

8 months ago  cann: fix crash when llama-bench is running on multiple cann devices (llama/9627)
Dou Xinpeng [Wed, 25 Sep 2024 03:30:38 +0000 (11:30 +0800)]
cann: fix crash when llama-bench is running on multiple cann devices (llama/9627)

8 months ago  CUDA: remove bad assert (#972)
Johannes Gäßler [Sun, 29 Sep 2024 17:56:17 +0000 (19:56 +0200)]
CUDA: remove bad assert (#972)

8 months ago  vulkan : multithread pipeline creation (#963)
Jeff Bolz [Sun, 29 Sep 2024 16:50:17 +0000 (11:50 -0500)]
vulkan : multithread pipeline creation (#963)

9 months ago  vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (#961)
Jeff Bolz [Fri, 27 Sep 2024 07:58:01 +0000 (02:58 -0500)]
vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (#961)

9 months ago  vulkan : argsort barriers must be under uniform control flow (#951)
Salvatore Mesoraca [Thu, 26 Sep 2024 06:59:42 +0000 (08:59 +0200)]
vulkan : argsort barriers must be under uniform control flow (#951)

A return before a barrier that is reached by only some threads in
a workgroup leads to UB.
While the old code actually works on some devices,
it fails on others (i.e. "smaller" GPUs).

BTW, I think it would be better to set specialization constants
when the graph is built, in that way the local workgroup
could be sized appropriately.
But it would take a lot of work.

Signed-off-by: Salvatore Mesoraca <redacted>
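
The same rule holds for CPU barriers, which allows a runnable analogy in plain C (this is not the shader code): every participating thread must reach the barrier, so the work is predicated instead of returning early:

```c
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4
static pthread_barrier_t barrier;

static void * worker(void *arg) {
    int id = *(int *)arg;

    // WRONG (shader equivalent): `if (id >= 2) return NULL;` here would
    // leave the remaining threads waiting at the barrier forever.
    if (id < 2) {
        printf("thread %d does the extra work\n", id); // predicated work
    }
    pthread_barrier_wait(&barrier); // reached by all threads, uniformly
    return NULL;
}

int main(void) {
    pthread_t th[N_THREADS];
    int ids[N_THREADS];
    pthread_barrier_init(&barrier, NULL, N_THREADS);
    for (int i = 0; i < N_THREADS; i++) {
        ids[i] = i;
        pthread_create(&th[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < N_THREADS; i++) pthread_join(th[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```
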
9 months ago  ggml : fix GGML_MAX_N_THREADS + improve formatting (#969)
Georgi Gerganov [Tue, 24 Sep 2024 10:23:59 +0000 (13:23 +0300)]
ggml : fix GGML_MAX_N_THREADS + improve formatting (#969)

9 months ago  sync : llama.cpp
Georgi Gerganov [Tue, 24 Sep 2024 08:04:31 +0000 (11:04 +0300)]
sync : llama.cpp

ggml-ci

9 months ago  ggml : add AVX512DQ requirement for AVX512 builds (llama/9622)
Eric Zhang [Tue, 24 Sep 2024 08:03:21 +0000 (16:03 +0800)]
ggml : add AVX512DQ requirement for AVX512 builds (llama/9622)

9 months ago  log : add CONT level for continuing previous log entry (llama/9610)
Georgi Gerganov [Tue, 24 Sep 2024 07:15:35 +0000 (10:15 +0300)]
log : add CONT level for continuing previous log entry (llama/9610)

9 months ago  threads: fix msvc build without openmp (llama/9615)
Max Krasnyansky [Tue, 24 Sep 2024 04:18:48 +0000 (21:18 -0700)]
threads: fix msvc build without openmp (llama/9615)

We're missing atomic_thread_fence() in MSVC builds when openmp is disabled.
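
One possible shape of such a fallback, sketched in C (an assumption, not necessarily the actual patch): use a Win32 full fence where C11 atomic_thread_fence is unavailable:

```c
#include <stdio.h>

#if defined(_MSC_VER)
  #include <windows.h>
  #define full_fence() MemoryBarrier() // Win32 full memory barrier
#else
  #include <stdatomic.h>
  #define full_fence() atomic_thread_fence(memory_order_seq_cst)
#endif

int main(void) {
    full_fence(); // sequentially consistent fence on either toolchain
    printf("fence issued\n");
    return 0;
}
```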

9 months ago  cuda: add q8_0->f32 cpy operation (llama/9571)
Ivan [Tue, 24 Sep 2024 00:14:24 +0000 (03:14 +0300)]
cuda: add q8_0->f32 cpy operation (llama/9571)

llama: enable K-shift for quantized KV cache
It will fail on unsupported backends or quant types.

9 months ago  threads: improve ggml_barrier scaling with large number of threads (llama/9598)
Max Krasnyansky [Mon, 23 Sep 2024 18:42:43 +0000 (11:42 -0700)]
threads: improve ggml_barrier scaling with large number of threads (llama/9598)

Make sure n_barrier and n_barrier_passed do not share the cache line to avoid cache line bouncing.
This optimization shows performance improvements even for n_threads <= 8 cases.

Resurrect the TSAN (Thread Sanitizer) check so that we can avoid doing an expensive read-modify-write
in the normal case and just use a thread fence as originally intended.
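
A hedged sketch of the cache-line separation described above (the layout and names are illustrative, not the actual ggml internals), assuming 64-byte cache lines:

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

// Each counter gets its own cache line, so updates to one do not
// invalidate the line holding the other (no false sharing).
struct barrier_state {
    alignas(64) atomic_int n_barrier;
    alignas(64) atomic_int n_barrier_passed;
};

int main(void) {
    printf("n_barrier at %zu, n_barrier_passed at %zu\n",
           offsetof(struct barrier_state, n_barrier),
           offsetof(struct barrier_state, n_barrier_passed));
    return 0;
}
```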

9 months ago  ggml : AVX512 gemm for Q4_0_8_8 (llama/9532)
Srihari-mcw [Mon, 23 Sep 2024 14:06:38 +0000 (19:36 +0530)]
ggml : AVX512 gemm for Q4_0_8_8 (llama/9532)

* AVX512 version of ggml_gemm_q4_0_8x8_q8_0

* Remove zero vector parameter passing

* Rename functions and rearrange order of macros

* Edit comments

* style : minor adjustments

* Update x to start from 0

---------

Co-authored-by: Georgi Gerganov <redacted>
9 months ago  metal : use F32 prec for K*Q in vec FA (llama/9595)
Georgi Gerganov [Mon, 23 Sep 2024 08:27:47 +0000 (11:27 +0300)]
metal : use F32 prec for K*Q in vec FA (llama/9595)

ggml-ci

9 months ago  Revert "[SYCL] fallback mmvq (#9088)" (llama/9579)
Akarshan Biswas [Mon, 23 Sep 2024 03:28:06 +0000 (08:58 +0530)]
Revert "[SYCL] fallback mmvq (#9088)" (llama/9579)

This reverts commit 50addec9a532a6518146ab837a85504850627316.

9 months ago  musa: enable building fat binaries, enable unified memory, and disable Flash Attentio...
R0CKSTAR [Sun, 22 Sep 2024 14:55:49 +0000 (22:55 +0800)]
musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526)

* mtgpu: add mp_21 support

Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas

Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: enable unified memory

Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest)

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
9 months ago  Fix merge error in #9454 (llama/9589)
Molly Sophia [Sun, 22 Sep 2024 13:26:50 +0000 (21:26 +0800)]
Fix merge error in #9454 (llama/9589)

Signed-off-by: Molly Sophia <redacted>
9 months ago  CUDA: enable Gemma FA for HIP/Pascal (llama/9581)
Johannes Gäßler [Sun, 22 Sep 2024 07:34:52 +0000 (09:34 +0200)]
CUDA: enable Gemma FA for HIP/Pascal (llama/9581)

9 months ago  RWKV v6: RWKV_WKV op CUDA implementation (llama/9454)
Molly Sophia [Sun, 22 Sep 2024 02:29:12 +0000 (10:29 +0800)]
RWKV v6: RWKV_WKV op CUDA implementation (llama/9454)

* ggml: CUDA unary op EXP

Signed-off-by: Molly Sophia <redacted>
* ggml: rwkv_wkv op CUDA impl

Signed-off-by: Molly Sophia <redacted>
---------

Signed-off-by: Molly Sophia <redacted>
9 months ago  ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (llama/9573)
slaren [Sat, 21 Sep 2024 12:24:23 +0000 (14:24 +0200)]
ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (llama/9573)