git.djapps.eu Git - pkg/ggml/sources/ggml/log
8 months ago
Andrew Minh Nguyen [Mon, 7 Oct 2024 16:37:31 +0000 (09:37 -0700)]
Update building for Android (llama/9672)

* docs : clarify building Android on Termux

* docs : update building Android on Termux

* docs : add cross-compiling for Android

* cmake : link dl explicitly for Android

8 months ago
Georgi Gerganov [Mon, 7 Oct 2024 15:27:51 +0000 (18:27 +0300)]
ggml : add metal backend registry / device (llama/9713)

* ggml : add metal backend registry / device

ggml-ci

* metal : fix names [no ci]

* metal : global registry and device instances

ggml-ci

* cont : alternative initialization of global objects

ggml-ci

* llama : adapt to backend changes

ggml-ci

* fixes

* metal : fix indent

* metal : fix build when MTLGPUFamilyApple3 is not available

ggml-ci

* fix merge

* metal : avoid unnecessary singleton accesses

ggml-ci

* metal : minor fix [no ci]

* metal : g_state -> g_ggml_ctx_dev_main [no ci]

* metal : avoid reference of device context in the backend context

ggml-ci

* metal : minor [no ci]

* metal : fix maxTransferRate check

* metal : remove transfer rate stuff

---------

Co-authored-by: slaren <redacted>
8 months ago
Paul Tsochantaris [Mon, 7 Oct 2024 12:26:31 +0000 (13:26 +0100)]
metal : single allocation of encode_async block (llama/9747)

* Single allocation of encode_async block with non-ARC capture in ggml-metal.m

* Moving Block_release to the deallocation code

* Release encode block when re-setting encoding buffer count if needed

* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <redacted>
8 months ago
Daniel Bevenius [Wed, 9 Oct 2024 14:40:35 +0000 (16:40 +0200)]
ggml-alloc : remove buffer_id from leaf_alloc (#987)

This commit removes the buffer_id field from the leaf_alloc struct.

The motivation for this is that the field is only written to and never
read/used, as far as I can tell. Each tensor_alloc has a buffer_id field,
and this is what caused me to look into this more closely, to understand
what the buffer_id in leaf_alloc was used for. (A sketch of the struct
change follows below.)
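
A minimal sketch of the change, assuming the struct layout in ggml-alloc.c at the time (field names from the commit message; details simplified for illustration):

    #include <stddef.h>

    struct tensor_alloc {
        int    buffer_id;
        size_t offset;
        size_t size_max;              // assumed layout, for illustration only
    };

    // before: leaf_alloc carried its own buffer_id next to the one
    // already inside tensor_alloc
    struct leaf_alloc_before {
        int buffer_id;                // written but never read -> removed
        struct tensor_alloc leaf;
    };

    // after: the embedded tensor_alloc already carries the buffer_id
    struct leaf_alloc {
        struct tensor_alloc leaf;
    };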

8 months ago
Georgi Gerganov [Sun, 6 Oct 2024 09:52:42 +0000 (12:52 +0300)]
zig : remove obsolete build script

8 months ago
Georgi Gerganov [Sun, 6 Oct 2024 09:51:58 +0000 (12:51 +0300)]
sync : whisper.cpp

8 months ago
SRHMorris [Sun, 6 Oct 2024 07:34:20 +0000 (08:34 +0100)]
vulkan : retry allocation with fallback flags (whisper/2451)

Co-authored-by: Samuel Morris <redacted>
8 months ago
Georgi Gerganov [Sun, 6 Oct 2024 09:51:30 +0000 (12:51 +0300)]
spm : update backend.c -> backend.cpp

8 months ago
Johannes Gäßler [Sat, 5 Oct 2024 16:38:01 +0000 (18:38 +0200)]
examples: add dataset, data shuffling to MNIST (#982)

8 months ago
Georgi Gerganov [Sat, 5 Oct 2024 12:52:36 +0000 (15:52 +0300)]
sync : whisper.cpp

8 months ago
Georgi Gerganov [Sat, 5 Oct 2024 11:33:54 +0000 (14:33 +0300)]
metal : zero-init buffer contexts (whisper/0)

8 months ago
Georgi Gerganov [Fri, 4 Oct 2024 15:54:31 +0000 (18:54 +0300)]
sync : llama.cpp

8 months ago
Daniel Bevenius [Fri, 4 Oct 2024 13:46:18 +0000 (15:46 +0200)]
ggml : fix typo in example usage ggml_gallocr_new (#984)

8 months ago
Diego Devesa [Fri, 4 Oct 2024 06:41:40 +0000 (08:41 +0200)]
ggml : fixes after sync (#983)

ggml : remove test-backend-buffer

ggml : fix CUDA build warnings

8 months ago
Georgi Gerganov [Thu, 3 Oct 2024 19:18:03 +0000 (22:18 +0300)]
sync : whisper.cpp

8 months ago
Georgi Gerganov [Thu, 3 Oct 2024 19:11:21 +0000 (22:11 +0300)]
ggml : remove old file (skip) (#0)

8 months ago
Georgi Gerganov [Thu, 3 Oct 2024 19:03:05 +0000 (22:03 +0300)]
cont : fixes

8 months ago
Georgi Gerganov [Thu, 3 Oct 2024 18:42:03 +0000 (21:42 +0300)]
examples : adapt to new ggml backend interfaces

ggml-ci

8 months ago
Diego Devesa [Thu, 3 Oct 2024 18:25:11 +0000 (21:25 +0300)]
ggml-backend : add device and backend reg interfaces (llama/9707)

Also:

- metal : fix compute pass descriptor autorelease crash
- ggml-backend : add device description to CPU backend
- ggml: unify backend logging mechanism

8 months ago
Georgi Gerganov [Thu, 3 Oct 2024 18:21:40 +0000 (21:21 +0300)]
sync : llama.cpp

8 months ago
Ouadie EL FAROUKI [Thu, 3 Oct 2024 06:50:44 +0000 (07:50 +0100)]
Fixed dequant precision issues in Q4_1 and Q5_1 (llama/9711)

8 months ago
Diego Devesa [Wed, 2 Oct 2024 23:49:47 +0000 (01:49 +0200)]
ggml-backend : add device and backend reg interfaces (llama/9707)

Co-authored-by: Johannes Gäßler <redacted>
8 months ago
Alberto Cabrera Pérez [Wed, 2 Oct 2024 12:57:18 +0000 (13:57 +0100)]
Initial cmake support of SYCL for AMD GPUs (llama/9658)

sycl: initial cmake support of SYCL for AMD GPUs

8 months ago
Radoslav Gerganov [Wed, 2 Oct 2024 10:49:16 +0000 (13:49 +0300)]
vulkan : do not use tensor->extra (llama/9407)

* vulkan : do not use tensor->extra

This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.

Ref: #8536

* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (llama/2)

---------

Co-authored-by: 0cc4m <redacted>
8 months ago
Johannes Gäßler [Thu, 3 Oct 2024 15:29:59 +0000 (17:29 +0200)]
ggml/ex: calculate accuracy in graph, adapt MNIST (#980)

8 months ago
Johannes Gäßler [Wed, 2 Oct 2024 13:32:39 +0000 (15:32 +0200)]
ggml: refactor cross entropy loss CPU impl. (#976)

8 months ago
Georgi Gerganov [Tue, 1 Oct 2024 15:33:35 +0000 (18:33 +0300)]
readme : refresh

8 months ago
Georgi Gerganov [Tue, 1 Oct 2024 15:08:31 +0000 (18:08 +0300)]
metal : add perf-metal tool + fix build

8 months ago
Georgi Gerganov [Tue, 1 Oct 2024 13:10:45 +0000 (16:10 +0300)]
metal : reduce command encoding overhead (llama/9698)

ggml-ci

8 months ago
Johannes Gäßler [Mon, 30 Sep 2024 07:55:23 +0000 (09:55 +0200)]
test: fix OPT_STEP_ADAMW for test-backend-ops (#974)

8 months ago
Salvatore Mesoraca [Mon, 30 Sep 2024 07:14:09 +0000 (09:14 +0200)]
vulkan : mul_mat: fix UB with small warps (#952)

When the device's warp size is less than 16, it is possible for
loadstride_a (mul_mm.comp:114) and loadstride_b (mul_mm.comp:115) to be
set to 0, because they are calculated as the workgroup size multiplied
by LOAD_VEC_* (which can be 1) and divided by 16, and the workgroup size
is set to be the same as the warp/subgroup size.

The loadstride_* variables are used as increments in the
loops that populate the buffers used for the multiplication.

When they are 0 they cause an infinite loop. But infinite loops without
side effects are UB, and since the values of loadstride_* are known at
compile time, the compiler quietly optimizes all the loops away. As a
consequence, the buffers are not populated and the multiplication result
is just a matrix with all elements set to 0. (A small C sketch of this
arithmetic follows below.)

We prevent the UB by making sure that the workgroup size
will never be less than 16, even if our device has a
smaller warp size (e.g. 8).

Signed-off-by: Salvatore Mesoraca <redacted>
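
To make the failure mode concrete, here is a minimal C sketch of the same integer arithmetic (names abbreviated; the real code is GLSL in mul_mm.comp):

    #include <stdio.h>

    int main(void) {
        const int LOAD_VEC = 1;   // LOAD_VEC_* can be as small as 1

        for (int warp_size = 8; warp_size <= 32; warp_size *= 2) {
            // old: workgroup size == warp size, so 8 * 1 / 16 == 0
            int old_stride = warp_size * LOAD_VEC / 16;
            // fix: clamp the workgroup size to at least 16
            int wg         = warp_size < 16 ? 16 : warp_size;
            int new_stride = wg * LOAD_VEC / 16;
            printf("warp=%2d old=%d new=%d\n", warp_size, old_stride, new_stride);
            // with old_stride == 0, a loop like `for (i = 0; i < n; i += old_stride)`
            // never advances: a side-effect-free infinite loop, which is UB, so the
            // compiler is allowed to remove it entirely
        }
        return 0;
    }
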
8 months ago
Borislav Stanimirov [Mon, 30 Sep 2024 07:11:41 +0000 (10:11 +0300)]
ggml : fix ggml_cast (#973)

8 months ago
Johannes Gäßler [Sun, 29 Sep 2024 21:18:02 +0000 (23:18 +0200)]
ggml: fix gradient allocation logic (#966)

* ggml: fix gradient allocation logic

* gradient allocation in ggml_build_backward_expand

* fixup

* fix test-backend-ops grad

* suggestions by slaren

* fix test1.c

* fix legacy opt API

* fix test-grad0

* remove keep arg

8 months ago
Georgi Gerganov [Sun, 29 Sep 2024 18:53:33 +0000 (21:53 +0300)]
sync : llama.cpp

8 months ago
Georgi Gerganov [Sun, 29 Sep 2024 18:18:23 +0000 (21:18 +0300)]
ggml : define missing HWCAP flags (llama/9684)

ggml-ci

Co-authored-by: Willy Tarreau <redacted>
8 months ago
slaren [Sat, 28 Sep 2024 12:32:46 +0000 (14:32 +0200)]
test-backend-ops : use flops for some performance tests (llama/9657)

* test-backend-ops : use flops for some performance tests

- parallelize tensor quantization

- use a different set of cases for performance and correctness tests

- run each test for at least one second

8 months ago
Dan Johansson [Sat, 28 Sep 2024 12:06:16 +0000 (14:06 +0200)]
ggml : add run-time detection of neon, i8mm and sve (llama/9331)

* ggml: Added run-time detection of neon, i8mm and sve

Adds run-time detection of the Arm instruction set features neon, i8mm
and sve for Linux and Apple build targets. (A sketch of the Linux probe
follows these notes.)

* ggml: Extend feature detection to include non-aarch64 Arm arch

* ggml: Move definition of ggml_arm_arch_features to the global data section
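
A hedged sketch of what the Linux side of such a probe can look like (the Apple path would use sysctlbyname instead; identifiers here are illustrative, not the exact ggml symbols):

    // Run-time Arm feature probe via the auxiliary vector (Linux, aarch64).
    #if defined(__aarch64__) && defined(__linux__)
    #include <sys/auxv.h>
    #include <asm/hwcap.h>

    #ifndef HWCAP2_I8MM
    #define HWCAP2_I8MM (1 << 13)   // per the arm64 uapi header; for older toolchains
    #endif

    struct arm_arch_features { int has_neon, has_i8mm, has_sve; };

    static void detect_arm_features(struct arm_arch_features * f) {
        unsigned long hwcap  = getauxval(AT_HWCAP);
        unsigned long hwcap2 = getauxval(AT_HWCAP2);
        f->has_neon = (hwcap  & HWCAP_ASIMD) != 0;  // NEON is reported as "ASIMD"
        f->has_sve  = (hwcap  & HWCAP_SVE)   != 0;
        f->has_i8mm = (hwcap2 & HWCAP2_I8MM) != 0;
    }
    #endif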

8 months ago
Markus Tavenrath [Sat, 28 Sep 2024 10:05:05 +0000 (12:05 +0200)]
Enable use of the rebar feature to upload buffers to the device. (llama/9251)

8 months ago
R0CKSTAR [Thu, 26 Sep 2024 01:27:40 +0000 (09:27 +0800)]
mtgpu: enable VMM (llama/9597)

Signed-off-by: Xiaodong Ye <redacted>
8 months ago
Charles Xu [Wed, 25 Sep 2024 13:12:20 +0000 (15:12 +0200)]
ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (llama/9217)

* ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels

* added fallback mechanism when the offline re-quantized model is not
optimized for the underlying target.

* fix for build errors

* remove prints from the low-level code

* Rebase to the latest upstream

8 months ago
Dou Xinpeng [Wed, 25 Sep 2024 03:30:38 +0000 (11:30 +0800)]
cann: fix crash when llama-bench is running on multiple cann devices (llama/9627)

8 months ago
Johannes Gäßler [Sun, 29 Sep 2024 17:56:17 +0000 (19:56 +0200)]
CUDA: remove bad assert (#972)

8 months ago
Jeff Bolz [Sun, 29 Sep 2024 16:50:17 +0000 (11:50 -0500)]
vulkan : multithread pipeline creation (#963)

9 months ago
Jeff Bolz [Fri, 27 Sep 2024 07:58:01 +0000 (02:58 -0500)]
vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (#961)

9 months ago
Salvatore Mesoraca [Thu, 26 Sep 2024 06:59:42 +0000 (08:59 +0200)]
vulkan : argsort barriers must be under uniform control flow (#951)

A return before a barrier (one that happens only in some threads in a
workgroup) leads to UB. While the old code actually works on some
devices, it fails on some others (e.g. "smaller" GPUs). (A CPU-side
analogy follows after these notes.)

BTW, I think it would be better to set specialization constants when the
graph is built; that way the local workgroup could be sized
appropriately. But it would take a lot of work.

Signed-off-by: Salvatore Mesoraca <redacted>
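
The same rule is easy to demonstrate on the CPU: every thread that participates in a barrier must reach it. A minimal pthreads analogy (on the GPU the divergent version is UB; with pthreads it "merely" deadlocks, which makes the bug visible):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_barrier_t bar;

    static void * worker(void * arg) {
        int id = *(int *) arg;

        // BAD (divergent): an early return taken only by some threads
        // if (id >= 2) return NULL;     // threads 0 and 1 would wait forever

        // GOOD (uniform): only the work is conditional, not the barrier
        if (id < 2) {
            printf("thread %d does its share of the work\n", id);
        }
        pthread_barrier_wait(&bar);      // every thread arrives here
        return NULL;
    }

    int main(void) {
        enum { N = 4 };
        pthread_t t[N];
        int ids[N];
        pthread_barrier_init(&bar, NULL, N);
        for (int i = 0; i < N; i++) {
            ids[i] = i;
            pthread_create(&t[i], NULL, worker, &ids[i]);
        }
        for (int i = 0; i < N; i++) {
            pthread_join(t[i], NULL);
        }
        pthread_barrier_destroy(&bar);
        return 0;
    }
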
9 months ago
Georgi Gerganov [Tue, 24 Sep 2024 10:23:59 +0000 (13:23 +0300)]
ggml : fix GGML_MAX_N_THREADS + improve formatting (#969)

9 months ago
Georgi Gerganov [Tue, 24 Sep 2024 08:04:31 +0000 (11:04 +0300)]
sync : llama.cpp

ggml-ci

9 months ago
Eric Zhang [Tue, 24 Sep 2024 08:03:21 +0000 (16:03 +0800)]
ggml : add AVX512DQ requirement for AVX512 builds (llama/9622)

9 months ago
Georgi Gerganov [Tue, 24 Sep 2024 07:15:35 +0000 (10:15 +0300)]
log : add CONT level for continuing previous log entry (llama/9610)

9 months ago
Max Krasnyansky [Tue, 24 Sep 2024 04:18:48 +0000 (21:18 -0700)]
threads: fix msvc build without openmp (llama/9615)

We're missing atomic_thread_fence() in MSVC builds when OpenMP is disabled.

9 months ago
Ivan [Tue, 24 Sep 2024 00:14:24 +0000 (03:14 +0300)]
cuda: add q8_0->f32 cpy operation (llama/9571)

llama: enable K-shift for quantized KV cache
It will fail on unsupported backends or quant types.

9 months ago
Max Krasnyansky [Mon, 23 Sep 2024 18:42:43 +0000 (11:42 -0700)]
threads: improve ggml_barrier scaling with large number of threads (llama/9598)

Make sure n_barrier and n_barrier_passed do not share a cache line, to avoid cache-line
bouncing. This optimization shows performance improvements even for n_threads <= 8 cases.
(A sketch of the layout follows these notes.)

Resurrect the TSAN (Thread Sanitizer) check so that we can avoid doing an expensive
read-modify-write in the normal case and just use a thread fence as originally intended.
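
A sketch of the cache-line separation described above (counter names from the commit message; the padding scheme itself is an assumption):

    #include <stdalign.h>
    #include <stdatomic.h>

    #define CACHE_LINE_SIZE 64

    // Put each counter on its own cache line: threads spinning on
    // n_barrier_passed no longer invalidate the line holding n_barrier,
    // so the two hot atomics stop bouncing between cores.
    struct barrier_counters {
        alignas(CACHE_LINE_SIZE) atomic_int n_barrier;
        alignas(CACHE_LINE_SIZE) atomic_int n_barrier_passed;
    };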

9 months ago
Srihari-mcw [Mon, 23 Sep 2024 14:06:38 +0000 (19:36 +0530)]
ggml : AVX512 gemm for Q4_0_8_8 (llama/9532)

* AVX512 version of ggml_gemm_q4_0_8x8_q8_0

* Remove zero vector parameter passing

* Rename functions and rearrange order of macros

* Edit comments

* style : minor adjustments

* Update x to start from 0

---------

Co-authored-by: Georgi Gerganov <redacted>
9 months ago
Georgi Gerganov [Mon, 23 Sep 2024 08:27:47 +0000 (11:27 +0300)]
metal : use F32 prec for K*Q in vec FA (llama/9595)

ggml-ci

9 months ago
Akarshan Biswas [Mon, 23 Sep 2024 03:28:06 +0000 (08:58 +0530)]
Revert "[SYCL] fallback mmvq (#9088)" (llama/9579)

This reverts commit 50addec9a532a6518146ab837a85504850627316.

9 months ago
R0CKSTAR [Sun, 22 Sep 2024 14:55:49 +0000 (22:55 +0800)]
musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526)

* mtgpu: add mp_21 support

Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas

Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: enable unified memory

Signed-off-by: Xiaodong Ye <redacted>
* mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest)

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
9 months ago
Molly Sophia [Sun, 22 Sep 2024 13:26:50 +0000 (21:26 +0800)]
Fix merge error in #9454 (llama/9589)

Signed-off-by: Molly Sophia <redacted>
9 months ago
Johannes Gäßler [Sun, 22 Sep 2024 07:34:52 +0000 (09:34 +0200)]
CUDA: enable Gemma FA for HIP/Pascal (llama/9581)

9 months ago
Molly Sophia [Sun, 22 Sep 2024 02:29:12 +0000 (10:29 +0800)]
RWKV v6: RWKV_WKV op CUDA implementation (llama/9454)

* ggml: CUDA unary op EXP

Signed-off-by: Molly Sophia <redacted>
* ggml: rwkv_wkv op CUDA impl

Signed-off-by: Molly Sophia <redacted>
---------

Signed-off-by: Molly Sophia <redacted>
9 months ago
slaren [Sat, 21 Sep 2024 12:24:23 +0000 (14:24 +0200)]
ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (llama/9573)

9 months ago
agray3 [Sat, 21 Sep 2024 00:41:07 +0000 (01:41 +0100)]
Update CUDA graph on scale change plus clear nodes/params (llama/9550)

* Avoid using saved CUDA graph if scale changes and reset nodes/params on update

Fixes https://github.com/ggerganov/llama.cpp/issues/9451

* clear before resize

9 months ago
Georgi Gerganov [Fri, 20 Sep 2024 18:50:16 +0000 (21:50 +0300)]
examples : adapt to ggml.h changes (#0)

ggml-ci

9 months ago
Georgi Gerganov [Fri, 20 Sep 2024 18:22:05 +0000 (21:22 +0300)]
sync : llama.cpp

9 months ago
Georgi Gerganov [Fri, 20 Sep 2024 18:24:06 +0000 (21:24 +0300)]
ggml : refactoring (llama/#0)

d6a04f87
23e0d70b

9 months ago
Georgi Gerganov [Fri, 20 Sep 2024 17:12:52 +0000 (20:12 +0300)]
ggml : fix builds (llama/0)

ggml-ci

9 months ago
Georgi Gerganov [Fri, 20 Sep 2024 16:13:02 +0000 (19:13 +0300)]
ggml : fix trailing whitespace (llama/0)

ggml-ci

9 months ago
Johannes Gäßler [Fri, 20 Sep 2024 16:35:35 +0000 (18:35 +0200)]
CUDA: fix sum.cu compilation for CUDA < 11.7 (llama/9562)

9 months ago
slaren [Wed, 18 Sep 2024 17:13:08 +0000 (19:13 +0200)]
ggml : fix n_threads_cur initialization with one thread (llama/9538)

* ggml : fix n_threads_cur initialization with one thread

* Update ggml/src/ggml.c

---------

Co-authored-by: Max Krasnyansky <redacted>
9 months ago
Max Krasnyansky [Tue, 17 Sep 2024 08:19:46 +0000 (01:19 -0700)]
threadpool : skip polling for unused threads (llama/9461)

* threadpool: skip polling for unused threads

Currently all threads do N polling rounds even if only 1 thread is active (n_threads_cur == 1).
This commit adds a check to skip the polling for unused threads (ith >= n_threads_cur).

n_threads_cur is now an atomic_int to explicitly tell the thread sanitizer that it is written
from one thread and read from other threads (not a race condition).

* threadpool: further simplify and improve ggml_barrier

Avoid using strict memory order while polling, yet make sure that all threads go through a
full memory barrier (memory fence) on ggml_barrier entrance and exit. (A condensed sketch of
the polling pattern follows at the end of these notes.)

* threads: add simple barrier test

This test does lots of small, parallel matmul ops where the barriers in between dominate the overhead.

* threadpool: improve thread sync for new-graphs

Using the same tricks as ggml_barrier. All the polling is done with relaxed memory order to
keep it efficient; once the new graph is detected we do a full fence using read-modify-write
with strict memory order.

* threadpool: improve abort handling

Do not use threadpool->ec (exit code) to decide whether to exit the compute loop.
threadpool->ec is not atomic, which makes the thread sanitizer rightfully unhappy about it.

Instead, introduce an atomic threadpool->abort flag used for this. This is consistent with
how we handle threadpool->stop or pause.

While at it add an explicit atomic_load for n_threads_cur for consistency.

* test-barrier: release threadpool before releasing the context

fixes a use-after-free detected by the gcc thread sanitizer on x86-64;
for some reason the llvm sanitizer is not detecting this issue.
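
A condensed sketch of the polling pattern described above: relaxed loads while spinning, one full fence once the awaited change is seen. Identifiers are illustrative, not the actual ggml ones:

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int n_graph;        // bumped by thread 0 when a new graph is ready
    static atomic_int n_threads_cur;  // number of threads used for the current graph

    static bool wait_for_new_graph(int ith, int last_graph) {
        if (ith >= atomic_load_explicit(&n_threads_cur, memory_order_relaxed)) {
            return false;  // unused thread: skip polling (would fall back to a condvar wait)
        }
        // spin with relaxed loads: no fence traffic while polling
        while (atomic_load_explicit(&n_graph, memory_order_relaxed) == last_graph) {
            /* spin / yield */
        }
        // one full barrier: makes thread 0's graph setup visible to this thread
        atomic_thread_fence(memory_order_seq_cst);
        return true;
    }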

9 months ago
Michael Podvitskiy [Mon, 16 Sep 2024 11:06:50 +0000 (13:06 +0200)]
ggml : link MATH_LIBRARY not by its full path (llama/9339)

9 months ago
Georgi Gerganov [Mon, 16 Sep 2024 07:27:50 +0000 (10:27 +0300)]
cmake : do not hide GGML options + rename option (llama/9465)

* cmake : do not hide GGML options

ggml-ci

* build : rename flag GGML_CUDA_USE_GRAPHS -> GGML_CUDA_GRAPHS

for consistency

ggml-ci

9 months ago
Eve [Mon, 16 Sep 2024 06:48:24 +0000 (06:48 +0000)]
ggml : IQ4_NL sgemm + Q4_0 AVX optimization (llama/9422)

* squashed

re-add my iq4_nl sgemm PR https://github.com/ggerganov/llama.cpp/pull/8049

have ggml_vec_dot_q4_0 do two blocks per loop for avx

try out f16c ggml_vec_dot_iq4_nl, but it's not really faster; as per
https://github.com/ggerganov/llama.cpp/pull/8549 we can calculate several blocks at a time
with no issue

* shuffle

* remove f16c iq4_nl as I can't make it faster than before

9 months ago
Georgi Gerganov [Mon, 16 Sep 2024 06:05:56 +0000 (09:05 +0300)]
metal : handle zero-sized allocs (llama/9466)

9 months ago
Georgi Gerganov [Sun, 15 Sep 2024 17:46:12 +0000 (20:46 +0300)]
common : reimplement logging (llama/9418)

https://github.com/ggerganov/llama.cpp/pull/9418

9 months ago
Michael Podvitskiy [Sun, 15 Sep 2024 16:55:52 +0000 (18:55 +0200)]
cmake : correct order of sycl flags (llama/9497)

9 months ago
Michael Podvitskiy [Sun, 15 Sep 2024 07:06:38 +0000 (09:06 +0200)]
cmake : try to fix sycl+intel build (llama/9487)

9 months ago
Yuri Khrustalev [Sat, 14 Sep 2024 09:54:37 +0000 (05:54 -0400)]
ggml : ggml_type_name return "NONE" for invalid values (llama/9458)

When running on Windows, the quantization utility attempts to print the types that are not set, which leads to a crash. (A sketch of the guard follows.)
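
A minimal sketch of the kind of guard this implies (table contents abridged; the real table and names live in ggml.c):

    enum ggml_type { GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_COUNT };

    static const char * type_names[GGML_TYPE_COUNT] = { "f32", "f16" };

    // Return a printable placeholder for out-of-range or unset type values,
    // so that printf("%s", ggml_type_name(t)) cannot crash.
    const char * ggml_type_name(enum ggml_type type) {
        return (unsigned) type < GGML_TYPE_COUNT ? type_names[type] : "NONE";
    }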

9 months ago
Georgi Gerganov [Sat, 14 Sep 2024 07:55:05 +0000 (10:55 +0300)]
cmake : use list(APPEND ...) instead of set() + dedup linker (llama/9463)

* cmake : use list(APPEND ...) instead of set() + dedup linker

ggml-ci

* cmake : try fix sycl

* cmake : try to fix sycl 2

* cmake : fix sycl build (llama/9469)

* try fix sycl build

* use CMAKE_CXX_FLAGS as a string variable

---------

Co-authored-by: Georgi Gerganov <redacted>
* one more CMAKE_CXX_FLAGS fix (llama/9471)

---------

Co-authored-by: Michael Podvitskiy <redacted>
9 months ago
Dou Xinpeng [Thu, 12 Sep 2024 11:46:43 +0000 (19:46 +0800)]
cann: Add host buffer type for Ascend NPU (llama/9406)

* feat: Add host buffer type for Ascend NPU (CANN backend)

* fix some checking errors

* Add a few comments

9 months ago
Ahmad Tameem [Thu, 12 Sep 2024 11:24:31 +0000 (16:24 +0500)]
riscv : modify Makefile and add a RISCV_VECT to print log info (llama/9442)

- Added ggml_cpu_has_riscv_v() in GGML to print system info in the log
- Modified the Makefile to only use the flag when cross-compiling for RISC-V
  (a sketch of the probe follows below)
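
A sketch of what such a compile-time probe typically looks like (the guard macro is the one compilers define when the RISC-V vector intrinsics are available; treat the details as an assumption):

    // Reported at run time in the system-info string.
    int ggml_cpu_has_riscv_v(void) {
    #if defined(__riscv_v_intrinsic)
        return 1;
    #else
        return 0;
    #endif
    }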

9 months ago
Xinpeng Dou [Thu, 12 Sep 2024 01:02:35 +0000 (09:02 +0800)]
cann: Fix error when running a non-existent op (llama/9424)

9 months ago
Johannes Gäßler [Wed, 11 Sep 2024 08:22:40 +0000 (10:22 +0200)]
CUDA: fix --split-mode row race condition (llama/9413)

9 months ago
R0CKSTAR [Wed, 11 Sep 2024 01:46:55 +0000 (09:46 +0800)]
musa: remove Clang builtins mapping (llama/9421)

Signed-off-by: Xiaodong Ye <redacted>
9 months ago
Alberto Cabrera Pérez [Wed, 11 Sep 2024 00:53:42 +0000 (01:53 +0100)]
sycl : update support conditions (llama/9394)

* sycl : update support condition to im2col

Signed-off-by: Alberto Cabrera <redacted>
* Added a TODO as a reminder to support FP32 im2col

---------

Signed-off-by: Alberto Cabrera <redacted>
9 months ago
Georgi Gerganov [Tue, 10 Sep 2024 07:17:03 +0000 (10:17 +0300)]
metal : fix compile warning with GGML_METAL_NDEBUG (llama/0)

9 months ago
Radoslav Gerganov [Mon, 9 Sep 2024 15:40:10 +0000 (18:40 +0300)]
rpc : fix segfault with nkvo (llama/9389)

* rpc : fix nkvo

* rpc : buf_size must not be static

ref: #9337

---------

Co-authored-by: slaren <redacted>
9 months ago
Prashant Vithule [Mon, 9 Sep 2024 15:37:18 +0000 (21:07 +0530)]
ggml : vector length agnostic SVE support (llama/9290)

* Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit
  vector lengths (see the sketch after these notes)

* Removed whitespace

* ggml : style changes + fix 512-bit nb loop check

- fix local scope in switch cases
- consistent predicate names
- empty lines when necessary
- opening braces, spaces
- const-correctness
- add asserts

* Update ggml/src/ggml-quants.c

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
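
A sketch of the vector-length-agnostic dispatch described above (placeholder bodies, not the actual ggml kernels):

    #include <arm_sve.h>   // requires an SVE-enabled toolchain

    // Pick a kernel variant from the SVE register width reported at run
    // time: svcntb() returns the vector length in bytes.
    static void vec_dot_sve(int n, float * s, const void * vx, const void * vy) {
        switch (svcntb()) {
            case 64: /* 512-bit kernel */ break;
            case 32: /* 256-bit kernel */ break;
            case 16: /* 128-bit kernel */ break;
            default: /* scalar fallback */ break;
        }
        (void) n; (void) s; (void) vx; (void) vy;
    }
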
9 months ago
Johannes Gäßler [Mon, 9 Sep 2024 12:22:53 +0000 (14:22 +0200)]
CUDA: fix variable name conflict for Windows build (llama/9382)

9 months ago
Markus Tavenrath [Sun, 8 Sep 2024 19:43:48 +0000 (21:43 +0200)]
Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (llama/9118)

* Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early.

* fix compile issues

* Fix issues where the last submit wasn't executed or handled properly.

* remove trailing whitespace

* Repair GGML_VULKAN_CHECK_RESULTS

* Increase submit counter only if actual work has been submitted and increase submit count to 100.

* Fix: some nodes were not checked with GGML_VULKAN_CHECK_RESULTS enabled.

9 months ago
Georgi Gerganov [Sun, 8 Sep 2024 19:01:02 +0000 (22:01 +0300)]
cuda : fix FA Q src index (1 -> 0) (llama/9374)

9 months ago
Neo Zhang Jianyu [Sun, 8 Sep 2024 11:05:29 +0000 (19:05 +0800)]
add check malloc result on device (llama/9346)

* add check malloc result on device

* update for review comments, check all malloc_device() results

---------

Co-authored-by: arthw <redacted>
9 months ago
Georgi Gerganov [Fri, 20 Sep 2024 18:19:55 +0000 (21:19 +0300)]
scripts : add context to sync-llama-am.sh

9 months ago
Johannes Gäßler [Fri, 20 Sep 2024 12:36:38 +0000 (14:36 +0200)]
ggml/examples: add backend support for numerical optimization (#949)

* CUDA eval works

* stochastic gradient descent op

* Adam except decay

* CUDA CROSS_ENTROPY_LOSS_BACK

* CUDA mnist-fc training works

* backend CLI arg

* refactor gguf load

* remove sched from opt_step_adam

* implement l1 regularization (weight decay)

* extra call to add optimizer

* initialize gradients with ggml_graph_reset

* gradient accumulation

* increment iter per eval instead of epoch

* adjust backend interfaces

* fix ggml_graph_reset without backend

* fix ggml graph export/import

* fixup

* rename

* revert ggml_opt changes

* more general CUDA repeat_back

* update documentation, fix CNN

* validation split

* add clarifying comment

* optimize PyTorch training

* adjust buffer size, thread count

* fix 0.0f validation split

* Update examples/mnist/mnist-common.cpp

Co-authored-by: Georgi Gerganov <redacted>
* fix gradient accumulation

* tensor flag for accumulators -> tensor hash set

* Update include/ggml.h

Co-authored-by: slaren <redacted>
* Update tests/test-backend-ops.cpp

Co-authored-by: slaren <redacted>
* Update tests/test-backend-ops.cpp

Co-authored-by: slaren <redacted>
* fix test prints

* Update src/ggml-backend.c

Co-authored-by: Georgi Gerganov <redacted>
* better CUDA support for noncontiguous out_prod

* add comment

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: slaren <redacted>
9 months ago
Jeff Bolz [Wed, 18 Sep 2024 12:26:44 +0000 (07:26 -0500)]
build: fix msvc build (#960)

9 months ago
Georgi Gerganov [Sun, 8 Sep 2024 08:10:43 +0000 (11:10 +0300)]
examples : add null threadpool args where needed (#0)

ggml-ci

9 months ago
Georgi Gerganov [Sun, 8 Sep 2024 08:09:57 +0000 (11:09 +0300)]
sync : llama.cpp

9 months ago
Georgi Gerganov [Sun, 8 Sep 2024 06:57:57 +0000 (09:57 +0300)]
metal : update support condition for im2col + fix warning (llama/0)

9 months ago
slaren [Sat, 7 Sep 2024 18:23:07 +0000 (20:23 +0200)]
ggml : always check bounds on get_rows operations (llama/9354)

9 months ago
Xuan Son Nguyen [Sat, 7 Sep 2024 10:01:34 +0000 (12:01 +0200)]
ggml : fix missing `cpu_set_t` on emscripten (llama/9336)

* ggml : fix missing cpu_set_t on emscripten

* better version

* bring back android part

9 months ago
Markus Tavenrath [Fri, 6 Sep 2024 06:56:17 +0000 (08:56 +0200)]
Improve Vulkan shader build system (llama/9239)

* Improve Vulkan shader build system

- Add dependency to vulkan-shaders-gen to rebuild shaders when changing the shader compilation utility.
- Add option to generate debug info for Vulkan shaders to provide shader source to Vulkan shader profiling tools

* remove unnecessary self-dependency