]> git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log
pkg/ggml/sources/whisper.cpp
9 months agoThreadpool: take 2 (llama/8672)
Faisal Zaghloul [Thu, 29 Aug 2024 23:20:53 +0000 (19:20 -0400)]
Threadpool: take 2 (llama/8672)

* Introduce ggml_compute_threadpool

- OpenMP functional: check
- Vanilla ggml functional: Check
- ggml w/threadpool functional: Check
- OpenMP no regression: No glaring problems
- Vanilla ggml no regression: No glaring problems
- ggml w/threadpool no regression: No glaring problems

* Minor fixes

* fixed use after release bug

* fixed a harmless race condition

* Fix Android bulid issue

* fix more race conditions

* fix deadlock for cases where cgraph.n_nodes == 1

and fix --poll case

* threadpool: use cpu_get_num_math to set the default number of threadpool threads

This way we avoid using E-Cores and Hyperthreaded siblings.

* bench: create fresh threadpool for each test

For benchmarking it's better to start a fresh pool for each test with the exact number of threads
needed for that test. Having larger pools is suboptimal (causes more load, etc).

* atomics: always use stdatomics with clang and use relaxed memory order when polling in ggml_barrier

This also removes sched_yield() calls from ggml_barrier() to match OpenMP behavior.

* threadpool: make polling the default to match openmp behavior

All command line args now allow for setting poll to 0 (false).

* threadpool: do not wakeup threads in already paused threadpool

* fix potential race condition in check_for_work

* threadpool: do not create two threadpools if their params are identical

* threadpool: reduce pause/resume/wakeup overhead in common cases

We now start threadpool in paused state only if we have two.
The resume is now implicit (ie new work) which allows for reduced locking and context-switch overhead.

* threadpool: add support for hybrid polling

poll params (--poll, ...) now specify "polling level", i.e. how aggresively we poll before waiting on cond.var.
poll=0 means no polling, 1 means poll for 128K rounds then wait, 2 for 256K rounds, ...

The default value of 50 (ie 50x128K rounds) seems like a decent default across modern platforms.
We can tune this further as things evolve.

* threadpool: reduce the number of barrier required

New work is now indicated with an atomic counter that is incremented for
each new graph that needs to be computed.
This removes the need for extra barrier for clearing the "new_work" and
removes the special case for trivial graphs.

* threadpool: remove special-casing for disposable threadpools

With the efficient hybrid polling there is no need to make disposable pools any different.
This simplifies the overall logic and reduces branching.

Include n_threads in debug print for disposable threadpool.

Declare pause and stop flags as atomic_bool
This doesn't actually generate any memory barriers and simply informs
the thread sanitizer that these flags can be written & read by different
threads without locking.

* threadpool: do not clear barrier counters between graphs computes (fixes race with small graphs)

This fixes the race condition with very small graphs where the main thread happens to
start a new graph while the workers are just about to exit from barriers.

* threadpool: use relaxed order for chunk sync

Full memory barrier is an overkill for this since each thread works on different chunk

* threadpool: remove abort_callback from threadpool state

* threadpool: better naming for thread/cpumask releated functions

* threadpool: consistent use of int type for n_threads params

* threadpool: add support for ggml_threadpool_params_default/init

Also removes the need for explicit mask_specified param.
all-zero cpumask means use default (usually inherited) cpu affinity mask.

* threadpool: move typedef into ggml.h

* threadpool: fix apply_priority() function name

* threadpool: fix swift wrapper errors due to n_threads int type cleanup

* threadpool: enable --cpu-mask and other threadpool related options only if threadpool is enabled

* threadpool: replace checks for compute_thread ret code with proper status check

* threadpool: simplify threadpool init logic and fix main thread affinity application

Most of the init code is now exactly the same between threadpool and openmp.

* threadpool: update threadpool resume/pause function names

* threadpool: enable openmp by default for now

* threadpool: don't forget to free workers state when omp is enabled

* threadpool: avoid updating process priority on the platforms that do not require it

On Windows we need to change overall process priority class in order to set thread priorities,
but on Linux, Mac, etc we do not need to touch the overall process settings.

* threadpool: update calling thread prio and affinity only at start/resume

This avoids extra syscalls for each graph_compute()

* llama-bench: turn threadpool params into vectors, add output headers, etc

* llama-bench: add support for cool off between tests --delay

This helps for long running tests on platforms that are thermally limited (phones, laptops, etc).
--delay (disabled by default) introduces the sleep for N seconds before starting each test.

* threadpool: move process priority setting into the apps (bench and cli)

This avoids changing the overall process priority on Windows for the apps
that use ggml/llama.cpp directy.

* threadpool: move all pause/resume logic into ggml

* threadpool: futher api cleanup and prep for future refactoring

All threadpool related functions and structs use ggml_threadpool prefix.

* threadpool: minor indent fixes

* threadpool: improve setprioty error message

* Update examples/llama-bench/llama-bench.cpp

Co-authored-by: slaren <redacted>
* threadpool: fix indent in set_threadpool call

* use int32_t for n_thread type in public llama.cpp API

* threadpool: use _new and _free instead of _create and _release

* fix two more public APIs to use int32_t for n_threads

* build: set _GNU_SOURCE for Adroid

---------

Co-authored-by: Max Krasnyansky <redacted>
Co-authored-by: fmz <redacted>
Co-authored-by: Max Krasnyansky <redacted>
Co-authored-by: slaren <redacted>
9 months agovulkan: fix compilation with GGML_VULKAN_DEBUG=ON (ggml/948)
Salvatore Mesoraca [Fri, 6 Sep 2024 12:34:33 +0000 (14:34 +0200)]
vulkan: fix compilation with GGML_VULKAN_DEBUG=ON (ggml/948)

the old code was trying to print a non-existent field (size)
and the struct as a whole (which doesn't have a operator<<
override defined).
Probably a typo happened during refactoring.

Signed-off-by: Salvatore Mesoraca <redacted>
9 months agovulkan: add dryrun support to sin and cos ops (ggml/947)
Salvatore Mesoraca [Fri, 6 Sep 2024 12:34:25 +0000 (14:34 +0200)]
vulkan: add dryrun support to sin and cos ops (ggml/947)

sin and cos failed test-backend-ops because they
tried to dereference a context pointer that is null
on dry runs.

This commit prevents that segfault.

Signed-off-by: Salvatore Mesoraca <redacted>
9 months agovulkan: correctly report support for OP_CONT (ggml/946)
Salvatore Mesoraca [Fri, 6 Sep 2024 12:34:07 +0000 (14:34 +0200)]
vulkan: correctly report support for OP_CONT (ggml/946)

test-backend-ops fails because ggml_cont aborts
when invoked passing an unsupported type.

This commit makes ggml_cont tests pass

Signed-off-by: Salvatore Mesoraca <redacted>
9 months agotests: add gradient tests for all backends (ggml/932)
Johannes Gäßler [Tue, 3 Sep 2024 15:21:46 +0000 (17:21 +0200)]
tests: add gradient tests for all backends (ggml/932)

* tests: add gradient checking to test-backend-ops

* remove old comment

* reorder includes

* adjust SIN/COS parameters

* add documentation, use supports_op if possible

9 months agogo : add temperature options (#2417)
Binozo [Fri, 20 Sep 2024 12:45:36 +0000 (14:45 +0200)]
go : add temperature options (#2417)

* Fixed go cuda bindings building

* Added note to go bindings Readme to build using cuda support

* Added temperature bindings for Go

---------

Co-authored-by: Binozo <redacted>
9 months agodocker : add libsdl2-dev for container builds (#2424)
JohnnyB [Fri, 20 Sep 2024 12:36:43 +0000 (13:36 +0100)]
docker : add libsdl2-dev for container builds (#2424)

* Added libsdl2-dev for SDL samples

Building talk-llama seems to fail here as there is no sdl.h.

* Adding libsdl2-dev for sdl.h

* Adding libsdl2-dev for sdl.h

9 months agogo : add tests and update bindings (#2425)
Stavros Panakakis [Fri, 20 Sep 2024 12:36:12 +0000 (15:36 +0300)]
go : add tests and update bindings (#2425)

Update Go version to ^1.23, the actions/setup-go
to v5, actions/checkout to v4 and
github.com/stretchr/testify to v1.9.0.

Add test cases for the following model
struct methods:
 - New
 - Close
 - NewContext
 - IsMultilingual
 - Languages

Add test cases for the following context
struct methods:
 - SetLanguage
 - IsMultilingual
 - Language
 - Process

9 months agoserver : use OS-generated temp file name for converted files (#2419)
Toliver [Tue, 17 Sep 2024 12:56:32 +0000 (21:56 +0900)]
server : use OS-generated temp file name for converted files (#2419)

9 months agogo : fix CUDA build (#2416)
Binozo [Sun, 15 Sep 2024 09:23:56 +0000 (11:23 +0200)]
go : fix CUDA build (#2416)

* Fixed go cuda bindings building

* Added note to go bindings Readme to build using cuda support

---------

Co-authored-by: Binozo <redacted>
9 months agocann : add Ascend NPU instructions (#2410)
Mengqing Cao [Wed, 11 Sep 2024 12:59:24 +0000 (20:59 +0800)]
cann : add Ascend NPU instructions (#2410)

9 months agocmake: Fix libdir value in pkgconfig file (#2407)
Philippe Normand [Sat, 7 Sep 2024 08:18:17 +0000 (09:18 +0100)]
cmake: Fix libdir value in pkgconfig file (#2407)

Depending on the OS the lib dir can vary, on Fedora for instance it is
"${prefix}/lib64". Instead of hard-coding the directory name, let CMake fill
this variable for us.

9 months agorevert : cmake : set MSVC to use UTF-8 on source files (#2346)
Georgi Gerganov [Mon, 2 Sep 2024 08:16:30 +0000 (11:16 +0300)]
revert : cmake : set MSVC to use UTF-8 on source files (#2346)

This reverts commit c96906d84dd6a1c40ea797ad542df3a0c47307a3.

9 months agosync : ggml
Georgi Gerganov [Mon, 2 Sep 2024 07:39:59 +0000 (10:39 +0300)]
sync : ggml

9 months agoggml: fix ggml_graph_cpy undefined behavior (ggml/943)
Johannes Gäßler [Sat, 31 Aug 2024 12:35:42 +0000 (14:35 +0200)]
ggml: fix ggml_graph_cpy undefined behavior (ggml/943)

9 months agocann : fix doxy (ggml/0)
Georgi Gerganov [Wed, 28 Aug 2024 15:45:01 +0000 (18:45 +0300)]
cann : fix doxy (ggml/0)

9 months agovulkan : fix build (llama/0)
Georgi Gerganov [Tue, 27 Aug 2024 19:10:58 +0000 (22:10 +0300)]
vulkan : fix build (llama/0)

ggml-ci

9 months agocuda : mark BF16 CONT as unsupported
Georgi Gerganov [Wed, 28 Aug 2024 14:08:03 +0000 (17:08 +0300)]
cuda : mark BF16 CONT as unsupported

9 months agoggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)
Salvatore Mesoraca [Wed, 28 Aug 2024 08:23:02 +0000 (10:23 +0200)]
ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934)

* ggml_cont: fix issue with transposed tensors when one dimension is 1

when using multiple threads, it is not enough
to check for the tensors to be contiguous for
ggml_compute_forward_dup_same_cont to work correctly.
The tensors strides also need to match.

Signed-off-by: Salvatore Mesoraca <redacted>
* Add ggml_cont tests

Signed-off-by: Salvatore Mesoraca <redacted>
* Remove dead code

it isn't possible to reach this code because
all these functions are invoked by ggml_compute_forward_dup
if and only if src0->type != dst->type

Signed-off-by: Salvatore Mesoraca <redacted>
* Make ggml_compute_forward_dup_same_cont work with contiguous tensors

Co-authored-by: Georgi Gerganov <redacted>
Signed-off-by: Salvatore Mesoraca <redacted>
---------

Signed-off-by: Salvatore Mesoraca <redacted>
Co-authored-by: Georgi Gerganov <redacted>
9 months agocmake : set MSVC to use UTF-8 on source files (#2346)
Tim Miller [Fri, 30 Aug 2024 11:04:04 +0000 (20:04 +0900)]
cmake : set MSVC to use UTF-8 on source files (#2346)

9 months agoreadme : remove invalid flag from Python example (#2396)
UsernamesLame [Fri, 30 Aug 2024 11:00:38 +0000 (07:00 -0400)]
readme : remove invalid flag from Python example (#2396)

* Update README.md

Fix broken C-style API link

* Update whisper_processor.py

Update examples/python/whisper_processor.py to remove nonexistent flag "-np" from subprocess.Popen call.

* Add pywhispercpp to the Pybind11 Python wrapper list

abdeladim-s/pywhispercpp wasn't added to the list / was removed at some point (?)

It was referenced in issue #9, so I feel like it's worthy of being added as it's the first if not one of the first Python wrappers for whisper.cpp

9 months agoreadme : fix link (#2394)
Georgi Gerganov [Fri, 30 Aug 2024 10:58:22 +0000 (13:58 +0300)]
readme : fix link (#2394)

10 months agogo : add beamsize/entropythold/maxcontext to context interface (#2350)
hsinhoyeh [Wed, 28 Aug 2024 14:09:01 +0000 (22:09 +0800)]
go : add beamsize/entropythold/maxcontext to context interface (#2350)

* feat(go binding): add beamsize/entropythold/maxcontext to context interface

fixes: #2349

* fix go building build

* fix dynamic link .so and header.h

* remove LD_LIBRARY_PATH

* remove ggml obj from whisper dynamic lib

* drop LIB_GGML

10 months agotalk-llama : sync llama.cpp
Georgi Gerganov [Wed, 28 Aug 2024 08:04:02 +0000 (11:04 +0300)]
talk-llama : sync llama.cpp

10 months agowhisper : update FA call
Georgi Gerganov [Wed, 28 Aug 2024 08:02:54 +0000 (11:02 +0300)]
whisper : update FA call

10 months agosync : ggml
Georgi Gerganov [Wed, 28 Aug 2024 08:02:42 +0000 (11:02 +0300)]
sync : ggml

10 months agosync : vulkan (skip) (llama/0)
Georgi Gerganov [Tue, 27 Aug 2024 18:48:22 +0000 (21:48 +0300)]
sync : vulkan (skip) (llama/0)

10 months agoggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)
slaren [Mon, 26 Aug 2024 17:44:43 +0000 (19:44 +0200)]
ggml : do not crash when quantizing q4_x_x with an imatrix (llama/9192)

10 months agometal : separate scale and mask from QKT in FA kernel (llama/9189)
Georgi Gerganov [Mon, 26 Aug 2024 15:31:02 +0000 (18:31 +0300)]
metal : separate scale and mask from QKT in FA kernel (llama/9189)

* metal : separate scale and mask from QKT in FA kernel

* metal : ne01 check no longer necessary

* metal : keep data in local memory

10 months agoggml : add SSM Metal kernels (llama/8546)
Georgi Gerganov [Mon, 26 Aug 2024 14:55:36 +0000 (17:55 +0300)]
ggml : add SSM Metal kernels (llama/8546)

* ggml : add ggml_ssm_conv metal impl

* ggml : add ssm_scan metal impl

ggml-ci

10 months agometal : gemma2 flash attention support (llama/9159)
slaren [Mon, 26 Aug 2024 09:08:59 +0000 (11:08 +0200)]
metal : gemma2 flash attention support (llama/9159)

10 months agoCPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
Johannes Gäßler [Sat, 24 Aug 2024 19:34:59 +0000 (21:34 +0200)]
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)

* CPU/CUDA: Gemma 2 FlashAttention support

* apply logit_softcap to scale in kernel

* disable logit softcapping tests on Metal

* remove metal check

10 months agoAdd a space to supress a cmake warning (llama/9133)
Akarshan Biswas [Thu, 22 Aug 2024 14:09:47 +0000 (19:39 +0530)]
Add a space to supress a cmake warning (llama/9133)

10 months agoAdd oneDNN primitive support (llama/9091)
luoyu-intel [Thu, 22 Aug 2024 04:50:10 +0000 (12:50 +0800)]
Add oneDNN primitive support (llama/9091)

* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc

10 months agollama : simplify Mamba with advanced batch splits (llama/8526)
compilade [Wed, 21 Aug 2024 21:58:11 +0000 (17:58 -0400)]
llama : simplify Mamba with advanced batch splits (llama/8526)

* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <redacted>
* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months agofallback mmvq (llama/9088)
Meng, Hengyu [Tue, 20 Aug 2024 15:50:17 +0000 (23:50 +0800)]
fallback mmvq (llama/9088)

* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <redacted>
---------

Co-authored-by: Alberto Cabrera Pérez <redacted>
10 months agoFix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)
zhentaoyu [Tue, 20 Aug 2024 15:06:51 +0000 (23:06 +0800)]
Fix SYCL `im2col` and `convert` Overflow with Large Dims (llama/9052)

* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert overflow

Signed-off-by: zhentaoyu <redacted>
* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <redacted>
* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <redacted>
* sycl:refine convert

Signed-off-by: zhentaoyu <redacted>
* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <redacted>
* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <redacted>
* test: make new cases only in sycl

Signed-off-by: zhentaoyu <redacted>
* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <redacted>
---------

Signed-off-by: zhentaoyu <redacted>
10 months agorpc : print error message when failed to connect endpoint (llama/9042)
Radoslav Gerganov [Mon, 19 Aug 2024 07:11:45 +0000 (10:11 +0300)]
rpc : print error message when failed to connect endpoint (llama/9042)

10 months agorpc : prevent crashes on invalid input (llama/9040)
Radoslav Gerganov [Mon, 19 Aug 2024 07:10:21 +0000 (10:10 +0300)]
rpc : prevent crashes on invalid input (llama/9040)

Add more checks which prevent RPC server from crashing if invalid input
is received from client

10 months agoggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)
Nico Bosshard [Fri, 16 Aug 2024 02:22:55 +0000 (04:22 +0200)]
ggml : dynamic ggml_sched_max_splits based on graph_size (llama/9047)

* ggml : Dynamic ggml_sched_max_splits based on graph_size

* Fixed and readded debug code for causes

10 months agocmake : remove unused option GGML_CURL (llama/9011)
Georgi Gerganov [Wed, 14 Aug 2024 06:14:49 +0000 (09:14 +0300)]
cmake : remove unused option GGML_CURL (llama/9011)

10 months agoggml : move rope type enum to ggml.h (llama/8949)
Daniel Bevenius [Tue, 13 Aug 2024 19:13:15 +0000 (21:13 +0200)]
ggml : move rope type enum to ggml.h (llama/8949)

* ggml : move rope type enum to ggml.h

This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.

The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.

Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.

* squash! ggml : move rope type enum to ggml.h

This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and back the llama_rope_type enum.

I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.

* squash! ggml : move rope type enum to ggml.h

This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.

* squash! ggml : move rope type enum to ggml.h

This commit contains a suggestion enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.

* squash! ggml : move rope type enum to ggml.h

This commit fixes the editorconfig-checker warnings.

* squash! ggml : move rope type enum to ggml.h

Update comment for ggml_rope function.

* Revert "squash! ggml : move rope type enum to ggml.h"

This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.

* squash! ggml : move rope type enum to ggml.h

Add GGML_ROPE_TYPE_NEOX to rope_common.comp.

* remove extra line

---------

Co-authored-by: slaren <redacted>
10 months agoggml: fix div-by-zero (llama/9003)
DavidKorczynski [Mon, 12 Aug 2024 12:21:41 +0000 (13:21 +0100)]
ggml: fix div-by-zero (llama/9003)

Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724
In order to access the above bug you need to login using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5

Signed-off-by: David Korczynski <redacted>
10 months agoOptimize Vulkan backend for better CPU performance and less GPU synchronization overh...
Markus Tavenrath [Sun, 11 Aug 2024 08:09:09 +0000 (10:09 +0200)]
Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (llama/8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduce a full pipeline sync which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader read/writes and transfers seems to be sufficient looking at the code which either launches compute kernels or copies tensors.

* Fix small typo

---------

Co-authored-by: 0cc4m <redacted>
10 months agofeat: ref. cross entropy, add CUDA, fix grad test (ggml/929)
Johannes Gäßler [Tue, 27 Aug 2024 18:39:30 +0000 (20:39 +0200)]
feat: ref. cross entropy, add CUDA, fix grad test (ggml/929)

10 months agoggml: remove bad assert (ggml/928)
Johannes Gäßler [Sat, 24 Aug 2024 17:27:02 +0000 (19:27 +0200)]
ggml: remove bad assert (ggml/928)

10 months agoexamples: add MNIST training + missing ops
Johannes Gäßler [Tue, 30 Jul 2024 13:56:35 +0000 (15:56 +0200)]
examples: add MNIST training + missing ops

10 months agomodels : add support for wget2 for fedora (#2387)
Brad Murray [Wed, 28 Aug 2024 08:46:01 +0000 (04:46 -0400)]
models : add support for wget2 for fedora (#2387)

10 months agoreadme : update the path to bench.py (#2386)
Peng [Wed, 28 Aug 2024 08:45:05 +0000 (16:45 +0800)]
readme : update the path to bench.py (#2386)

10 months agoreadme : fix typo (#2383)
Ivo von Putzer Reibegg [Wed, 28 Aug 2024 08:42:18 +0000 (10:42 +0200)]
readme : fix typo (#2383)

10 months agoreadme : fix broken links in implementation details section (#2382)
stormofice [Wed, 28 Aug 2024 08:41:51 +0000 (10:41 +0200)]
readme : fix broken links in implementation details section (#2382)

10 months agowhisper : fix compile warning for unused params
Georgi Gerganov [Wed, 28 Aug 2024 08:40:11 +0000 (11:40 +0300)]
whisper : fix compile warning for unused params

10 months agosync : ggml vulkan (ggml/0)
Georgi Gerganov [Tue, 20 Aug 2024 08:27:12 +0000 (11:27 +0300)]
sync : ggml vulkan (ggml/0)

ggml-ci

10 months agoyolo : add backend support (ggml/924)
Radoslav Gerganov [Mon, 19 Aug 2024 07:09:33 +0000 (10:09 +0300)]
yolo : add backend support (ggml/924)

* yolo : add backend support

* metal : add sub and sqrt kernels

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months agoggml : fix typo in ggml-quants.c comment (ggml/922)
Daniel Bevenius [Thu, 15 Aug 2024 07:42:38 +0000 (09:42 +0200)]
ggml : fix typo in ggml-quants.c comment (ggml/922)

10 months agofeat: add new `sin` and `cos` operators (ggml/919)
Ronsor [Mon, 12 Aug 2024 13:02:08 +0000 (06:02 -0700)]
feat: add new `sin` and `cos` operators (ggml/919)

* ggml : add sin/cos operators

* ggml-cuda : add sin/cos operators

* ggml : add corresponding tests for sin/cos

* ggml : add backward computation for sin/cos operators

* ggml-vulkan : add sin/cos operators

* ggml-vulkan : add sin/cos shader source

* metal : add sin, cos

---------

Co-authored-by: Georgi Gerganov <redacted>
10 months agoreadme : fix broken links (#2358)
Eric Curtin [Tue, 20 Aug 2024 07:57:45 +0000 (03:57 -0400)]
readme : fix broken links (#2358)

For whisper.cpp and whisper.h files

10 months agoexamples : use colorblind friendly TTY color scheme (#2360)
Justine Tunney [Tue, 20 Aug 2024 07:49:10 +0000 (00:49 -0700)]
examples : use colorblind friendly TTY color scheme (#2360)

This change updates the -pc flag, so that a new xterm256 color scheme is
used. This color scheme is believed to be better for three reasons:

1. It should be friendlier to the colorblind. The scheme was designed by
   Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard uses it
   since 2017, so it's already popular in the machine learning community

2. It should appear to be the same colors as before to people who aren't
   i.e. it's still a red-green spectrum like before but lightly modified

3. It is readable in both white and black background terminals. The neon
   colors before were probably a bit too intense for white backgrounds.

10 months agosync : ggml
Georgi Gerganov [Mon, 12 Aug 2024 08:59:15 +0000 (11:59 +0300)]
sync : ggml

10 months agoggml : support forward pass broadcasting in ggml_sub (ggml/914)
Salvatore Mesoraca [Sun, 11 Aug 2024 08:08:53 +0000 (10:08 +0200)]
ggml : support forward pass broadcasting in ggml_sub (ggml/914)

* ggml: support forward pass broadcasting in ggml_sub

Signed-off-by: Salvatore Mesoraca <redacted>
* Use assert instead of GGML_ASSERT in ggml_compute_forward_sub_f32

The check is already performed in ggml_sub_impl

Signed-off-by: Salvatore Mesoraca <redacted>
---------

Signed-off-by: Salvatore Mesoraca <redacted>
10 months agometal : fix uninitialized abort_callback (llama/8968)
slaren [Sat, 10 Aug 2024 13:42:10 +0000 (15:42 +0200)]
metal : fix uninitialized abort_callback (llama/8968)

10 months agorpc : sanitize tensor data + warnings (llama/0)
Georgi Gerganov [Fri, 9 Aug 2024 20:03:21 +0000 (23:03 +0300)]
rpc : sanitize tensor data + warnings (llama/0)

Co-authored-by: slaren <redacted>
10 months agocann : add Ascend NPU support (#2336)
Mengqing Cao [Fri, 9 Aug 2024 12:21:56 +0000 (20:21 +0800)]
cann : add Ascend NPU support (#2336)

* enable Ascend NPU in src/whisper.cpp
  * sync test-backend-ops with llama.cpp

10 months agowhisper : fix compile warning (#0)
Georgi Gerganov [Thu, 8 Aug 2024 19:59:59 +0000 (22:59 +0300)]
whisper : fix compile warning (#0)

10 months agosync : ggml
Georgi Gerganov [Thu, 8 Aug 2024 19:59:19 +0000 (22:59 +0300)]
sync : ggml

10 months agoggml : add CANN backend (llama/0)
hipudding [Thu, 8 Aug 2024 11:48:06 +0000 (14:48 +0300)]
ggml : add CANN backend (llama/0)

ggml-ci

10 months agoscripts : sync cann
Georgi Gerganov [Thu, 8 Aug 2024 19:58:13 +0000 (22:58 +0300)]
scripts : sync cann

10 months agoci : disable ruby workflow (#0)
Georgi Gerganov [Thu, 8 Aug 2024 17:35:21 +0000 (20:35 +0300)]
ci : disable ruby workflow (#0)

10 months agoci : try to fix FreeBSD (#0)
Georgi Gerganov [Thu, 8 Aug 2024 17:32:19 +0000 (20:32 +0300)]
ci : try to fix FreeBSD (#0)

10 months agobuild : fix aarch64 (#0)
Georgi Gerganov [Thu, 8 Aug 2024 11:27:16 +0000 (14:27 +0300)]
build : fix aarch64 (#0)

10 months agotalk-llama : sync llama.cpp
Georgi Gerganov [Thu, 8 Aug 2024 11:16:50 +0000 (14:16 +0300)]
talk-llama : sync llama.cpp

10 months agosync : ggml
Georgi Gerganov [Thu, 8 Aug 2024 11:10:06 +0000 (14:10 +0300)]
sync : ggml

10 months agoggml-backend : fix async copy from CPU (llama/8897)
slaren [Wed, 7 Aug 2024 11:29:02 +0000 (13:29 +0200)]
ggml-backend : fix async copy from CPU (llama/8897)

* ggml-backend : fix async copy from CPU

* cuda : more reliable async copy, fix stream used when the devices are the same

10 months agoUpdated SYCL device filtering (llama/8901)
Ouadie EL FAROUKI [Wed, 7 Aug 2024 10:25:36 +0000 (11:25 +0100)]
Updated SYCL device filtering (llama/8901)

* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme

10 months agoCUDA/HIP: fix tests/test-backend-ops (llama/8896)
Johannes Gäßler [Wed, 7 Aug 2024 07:07:52 +0000 (09:07 +0200)]
CUDA/HIP: fix tests/test-backend-ops (llama/8896)

10 months agoCUDA: fix padding logic for FP16/FP32 (llama/8884)
Johannes Gäßler [Tue, 6 Aug 2024 15:13:55 +0000 (17:13 +0200)]
CUDA: fix padding logic for FP16/FP32 (llama/8884)

10 months agoggml : add epsilon as a parameter for group_norm (llama/8818)
Molly Sophia [Tue, 6 Aug 2024 07:26:46 +0000 (15:26 +0800)]
ggml : add epsilon as a parameter for group_norm (llama/8818)

Signed-off-by: Molly Sophia <redacted>
10 months agoggml : fix overflows in elu function (llama/8866)
Justine Tunney [Mon, 5 Aug 2024 12:43:40 +0000 (05:43 -0700)]
ggml : fix overflows in elu function (llama/8866)

It's helpful to use expm1f(x), because expf(x)-1 will result in overflow
for 25% of single-precision floating point numbers.

10 months agoggml : reading the runtime sve config of the cpu (llama/8709)
jdomke [Sat, 3 Aug 2024 16:34:41 +0000 (01:34 +0900)]
ggml : reading the runtime sve config of the cpu (llama/8709)

* ggml : reading the runtime sve config of the cpu

* change to one time init to prevent performance drop

* prefix variable to avoid possible conflicts

* revert xxhash fix and add brackets

---------

Co-authored-by: domke <redacted>
10 months agoFix conversion of unnormalized BF16->BF16 weights (llama/7843)
Sigbjørn Skjæret [Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)]
Fix conversion of unnormalized BF16->BF16 weights (llama/7843)

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
10 months agoFixing wrong VDR iq4nl value (llama/8812)
Ouadie EL FAROUKI [Fri, 2 Aug 2024 00:55:17 +0000 (01:55 +0100)]
Fixing wrong VDR iq4nl value (llama/8812)

10 months agoggml-cuda: Adding support for unified memory (llama/8035)
matteo [Thu, 1 Aug 2024 21:28:28 +0000 (23:28 +0200)]
ggml-cuda: Adding support for unified memory (llama/8035)

* Adding support for unified memory

* adding again the documentation about unified memory

* refactoring: Moved the unified memory code in the correct location.

* Fixed compilation error when using hipblas

* cleaning up the documentation

* Updating the documentation

Co-authored-by: Johannes Gäßler <redacted>
* adding one more case where the PR should not be enabled

---------

Co-authored-by: matteo serva <redacted>
Co-authored-by: Johannes Gäßler <redacted>
10 months agoBuild: Only include execinfo.h on linux systems that support it (llama/8783)
Alex O'Connell [Thu, 1 Aug 2024 16:53:46 +0000 (12:53 -0400)]
Build: Only include execinfo.h on linux systems that support it (llama/8783)

* Only enable backtrace on GLIBC linux systems

* fix missing file from copy

* use glibc macro instead of defining a custom one

10 months agocuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)
slaren [Thu, 1 Aug 2024 13:26:22 +0000 (15:26 +0200)]
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (llama/8800)

* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X

* update asserts

* only use dmmv for supported types

* add test

10 months agoadded android implementation of ggml_print_backtrace_symbols (llama/8751)
l3utterfly [Tue, 30 Jul 2024 14:40:18 +0000 (23:40 +0900)]
added android implementation of ggml_print_backtrace_symbols (llama/8751)

* added android implementation of ggml_print_backtrace_symbols

* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
* Update ggml/src/ggml.c

Co-authored-by: slaren <redacted>
---------

Co-authored-by: slaren <redacted>
10 months agocann: update cmake (llama/8765)
wangshuai09 [Tue, 30 Jul 2024 10:37:35 +0000 (18:37 +0800)]
cann: update cmake (llama/8765)

10 months agoAdd `TIMESTEP_EMBEDDING` OP (llama/8707)
zhentaoyu [Tue, 30 Jul 2024 06:56:51 +0000 (14:56 +0800)]
Add `TIMESTEP_EMBEDDING` OP (llama/8707)

Signed-off-by: zhentaoyu <redacted>
10 months agoggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)
CarterLi999 [Mon, 29 Jul 2024 16:38:34 +0000 (00:38 +0800)]
ggml: bugfix: fix the inactive elements is agnostic for risc-v vector (llama/8748)

In these codes, we want to retain the value that they previously held
when mask[i] is false. So we should use undisturbed. With the default
agnostic policy of rvv intrinsic, these values can be held or be
written with 1s.

Co-authored-by: carter.li <redacted>
10 months agocuda : organize vendor-specific headers into vendors directory (llama/8746)
R0CKSTAR [Mon, 29 Jul 2024 12:56:12 +0000 (20:56 +0800)]
cuda : organize vendor-specific headers into vendors directory (llama/8746)

Signed-off-by: Xiaodong Ye <redacted>
10 months agoadd conv support (llama/8688)
Meng, Hengyu [Mon, 29 Jul 2024 02:50:27 +0000 (10:50 +0800)]
add conv support (llama/8688)

10 months agofeat: Support Moore Threads GPU (llama/8383)
R0CKSTAR [Sat, 27 Jul 2024 23:41:25 +0000 (07:41 +0800)]
feat: Support Moore Threads GPU (llama/8383)

* Update doc for MUSA

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <redacted>
* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <redacted>
* CUDA => MUSA

Signed-off-by: Xiaodong Ye <redacted>
* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <redacted>
* Fix CI build failure

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
10 months agoggml : ignore more msvc warnings (ggml/906)
Borislav Stanimirov [Wed, 7 Aug 2024 07:00:56 +0000 (10:00 +0300)]
ggml : ignore more msvc warnings (ggml/906)

10 months agometal : fix struct name (ggml/912)
Georgi Gerganov [Wed, 7 Aug 2024 06:57:00 +0000 (09:57 +0300)]
metal : fix struct name (ggml/912)

ggml-ci

10 months agometal : add abort callback (ggml/905)
Conrad Kramer [Wed, 7 Aug 2024 06:55:49 +0000 (02:55 -0400)]
metal : add abort callback (ggml/905)

10 months agovulkan : implement Stable Diffusion operators (ggml/904)
0cc4m [Sun, 4 Aug 2024 15:28:08 +0000 (17:28 +0200)]
vulkan : implement Stable Diffusion operators (ggml/904)

* Fix Vulkan repeat op

* Implement Vulkan concat op

* Delete old Vulkan shader generator

* Implement Vulkan im2col op

* Implement Vulkan unary gelu_quick op

* Implement Vulkan group_norm op

* Implement Vulkan timestep_embedding op

* Implement Vulkan upscale op

* Fix Vulkan vk_context tensor extra index issue

* Fix Vulkan matmul shader parameter bug

* Properly fix Vulkan matmul shader parameter bug

* Add Vulkan ADD f16 + f32 -> f16 operator support

* Implement Vulkan tanh op

* Fix Vulkan group count too large Validation error on non-Nvidia GPUs

* Throw error when too much memory is requested

* Fix another Vulkan group count too large Validation error on non-Nvidia GPUs

* Fix matmul MMQ condition

* Implement Vulkan pad op

* Fix Vulkan crash when tensor is used multiple times in a compute graph

* Add Vulkan CONCAT f16 + f16 -> f16 op

* Add Vulkan LEAKY_RELU op

10 months agoggml : move c parameter comment to ggml_rope_ext (ggml/901)
Daniel Bevenius [Mon, 29 Jul 2024 13:06:06 +0000 (15:06 +0200)]
ggml : move c parameter comment to ggml_rope_ext (ggml/901)

This commit moves the comment for the c parameter from ggml_rope to
ggml_rope_ext. The comment is currently incorrect as ggml_rope does not
have a c parameter (freq_factors tensor).

Signed-off-by: Daniel Bevenius <redacted>
10 months agoggml : resolve sync conflicst (ggml/0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:23 +0000 (17:17 +0300)]
ggml : resolve sync conflicst (ggml/0)

ggml-ci

10 months agocommon : handle new quant types (ggml/0)
Georgi Gerganov [Sat, 27 Jul 2024 14:17:04 +0000 (17:17 +0300)]
common : handle new quant types (ggml/0)

10 months agoggml : add ggml-aarch64 (ggml/0)
Dibakar Gope [Sat, 27 Jul 2024 14:16:40 +0000 (17:16 +0300)]
ggml : add ggml-aarch64 (ggml/0)

10 months agoggml : reduce hash table reset cost (llama/8698)
slaren [Sat, 27 Jul 2024 02:41:55 +0000 (04:41 +0200)]
ggml : reduce hash table reset cost (llama/8698)

* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string