git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/log

Georgi Gerganov [Wed, 17 Jan 2024 16:46:30 +0000 (18:46 +0200)]

imatrix : offload to GPU support (llama/4957)

* backend : add eval callback

ggml-ci

* backend : group nodes in a single compute when the user doesn't need them

* backend : clean-up the implementation

ggml-ci

* simple : do not perform tensor data copy if not needed

* simple : fix

* imatrix : offload to GPU support

* imatrix : fix ggml_mul_mat_id handling

ggml-ci

* ci : add imatrix test

ggml-ci

* ci : rearrange output

ggml-ci

Georgi Gerganov [Wed, 17 Jan 2024 16:39:41 +0000 (18:39 +0200)]

backend : add eval callback (llama/4935)

* backend : add eval callback

ggml-ci

* backend : group nodes in a single compute when the user doesn't need them

* backend : clean-up the implementation

ggml-ci

* simple : do not perform tensor data copy if not needed

* simple : fix

* simple : no need for ggml_is_contiguous + fix bool parse

* llama : fix callback placement in llama_context_params

* backend : avoid double-ask callback calls

* simple : restore examples, imatrix will serve as a demo

Georgi Gerganov [Wed, 17 Jan 2024 16:38:39 +0000 (18:38 +0200)]

metal : create autorelease pool during library build (llama/4970)

* metal : create autorelease pool during library build

ggml-ci

* test : simplify

ggml-ci

Kawrakow [Tue, 16 Jan 2024 17:51:26 +0000 (19:51 +0200)]

ggml : importance matrix support for legacy quants (llama/4969)

* imatrix: adding support for legacy quants

* imatrix: guard Q4_0/Q5_0 against ffn_down craziness

---------

Co-authored-by: Iwan Kawrakow <redacted>

Alex Azarov [Tue, 16 Jan 2024 13:33:02 +0000 (14:33 +0100)]

metal : log `recommendedMaxWorkingSetSize` on iOS 16+ (llama/4936)

* metal: Log `recommendedMaxWorkingSetSize` on iOS 16+

* Only log on iOS and macOS, ignoring tvOS and other platforms

* Check for Xcode version before using recommendedMaxWorkingSetSize

---------

Co-authored-by: Georgi Gerganov <redacted>

Justine Tunney [Tue, 16 Jan 2024 11:16:33 +0000 (03:16 -0800)]

ggml : introduce GGML_CALL function annotation (llama/4850)

This change makes it possible to build ggml-cuda.cu and ggml-metal.m as
independent dynamic shared objects that may be conditionally linked at
runtime in a multiplatform binary. It introduces a GGML_CALL annotation
that documents which functions have a cyclic call relationship between
the application code and the GPU modules.

This change does nothing unless the build defines -DGGML_MULTIPLATFORM,
which causes back-references and function pointers to conform to the MS ABI,
which is supported by NVCC, ROCm, Xcode, GCC and Clang across platforms.
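
As a rough illustration of the annotation idea, the C sketch below applies a calling-convention macro to a callback type and to the functions on both sides of a module boundary. The names DEMO_MULTIPLATFORM, DEMO_CALL, demo_unary_fn, demo_add_one and demo_apply_twice are hypothetical and not taken from ggml, and the use of `__attribute__((__ms_abi__))` assumes GCC or Clang targeting x86-64.

```c
/* Hypothetical sketch: a call annotation that expands to nothing unless
 * the build opts in, similar in spirit to what the commit describes. */
#if defined(DEMO_MULTIPLATFORM) && !defined(_WIN32) && defined(__x86_64__)
#  define DEMO_CALL __attribute__((__ms_abi__))  /* force the MS x64 ABI */
#else
#  define DEMO_CALL                              /* default: no change */
#endif

/* A function pointer that crosses the application/module boundary carries
 * the annotation, so both sides agree on the calling convention. */
typedef int (DEMO_CALL *demo_unary_fn)(int);

/* Application-side callback handed to the module. */
static DEMO_CALL int demo_add_one(int x) { return x + 1; }

/* Module-side entry point that calls back into the application. */
static DEMO_CALL int demo_apply_twice(demo_unary_fn fn, int x) {
    return fn(fn(x));
}
```

Without -DDEMO_MULTIPLATFORM the macro is empty and the code compiles exactly as if the annotation were absent, which matches the "does nothing unless" behaviour described above.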

Georgi Gerganov [Mon, 15 Jan 2024 11:27:00 +0000 (13:27 +0200)]

cuda : fix dequantize kernel names (llama/4938)

Kawrakow [Mon, 15 Jan 2024 05:48:06 +0000 (07:48 +0200)]

CUDA: faster dequantize kernels for Q4_0 and Q4_1 (llama/4938)

Co-authored-by: Iwan Kawrakow <redacted>

Kawrakow [Sun, 14 Jan 2024 14:21:12 +0000 (16:21 +0200)]

Add ability to use importance matrix for all k-quants (llama/4930)

Co-authored-by: Iwan Kawrakow <redacted>

Benjamin Heiniger [Tue, 16 Jan 2024 13:52:01 +0000 (14:52 +0100)]

talk-llama : optional wake-up command and audio confirmation (#1765)

* talk-llama: add optional wake-word detection from command

* talk-llama: add optional audio confirmation before generating answer

* talk-llama: fix small formatting issue in output

* talk-llama.cpp: fix Windows build

Przemysław Pawełczyk [Mon, 15 Jan 2024 13:48:13 +0000 (14:48 +0100)]

server : fix building and simplify lib deps on Windows (#1772)

* make : fix server example building on MSYS2 environments (Windows)

It was not working since commit eff3570f78742dfd56024328ed93d4f442434280
when server was introduced.

* cmake : simplify server example lib deps on Windows

server uses httplib::Server, not httplib::SSLServer, so there is no need
to mention cryptographic libraries in target_link_libraries.
Winsock (ws2_32) suffices here.

Also use plain library names like we use in other places.

Georgi Gerganov [Sun, 14 Jan 2024 16:08:20 +0000 (18:08 +0200)]

talk-llama : sync llama.cpp

Georgi Gerganov [Sun, 14 Jan 2024 09:06:28 +0000 (11:06 +0200)]

talk-llama : llama.cpp

Georgi Gerganov [Sun, 14 Jan 2024 08:55:18 +0000 (10:55 +0200)]

sync : ggml

Alex Azarov [Sun, 14 Jan 2024 08:44:39 +0000 (09:44 +0100)]

metal : correctly set SIMD support flags on iOS (llama/4923)

* Correctly set support_simdgroup_reduction and support_simdgroup_mm on iPhone/iPad

* log a little bit more info on iOS

Kawrakow [Sun, 14 Jan 2024 07:45:56 +0000 (09:45 +0200)]

2-bit quantizations (llama/4897)

* imatrix: load

* imatrix: WIP

* imatrix: Add Q2_K quantization

* imatrix: also guard against Q2_K_S quantization without importance matrix

* imatrix: guard even more against low-bit quantization misuse

---------

Co-authored-by: Iwan Kawrakow <redacted>

Georgi Gerganov [Sun, 14 Jan 2024 08:53:19 +0000 (10:53 +0200)]

scripts : sync-ggml-am.sh add option to skip commits

Georgi Gerganov [Sat, 13 Jan 2024 22:13:17 +0000 (00:13 +0200)]

talk-llama : sync llama.cpp

Georgi Gerganov [Sat, 13 Jan 2024 22:12:17 +0000 (00:12 +0200)]

sync : ggml

Georgi Gerganov [Sat, 13 Jan 2024 22:09:26 +0000 (00:09 +0200)]

examples : adapt to metal API

Johannes Gäßler [Sat, 13 Jan 2024 20:41:37 +0000 (21:41 +0100)]

ggml: cache sin/cos for RoPE (llama/4908)

Georgi Gerganov [Sat, 13 Jan 2024 18:45:45 +0000 (20:45 +0200)]

metal : remove old API (llama/4919)

ggml-ci

Georgi Gerganov [Sat, 13 Jan 2024 16:46:37 +0000 (18:46 +0200)]

metal : disable log for loaded kernels (llama/4794)

texmex76 [Sat, 13 Jan 2024 16:06:20 +0000 (17:06 +0100)]

gguf : fix potential infinite for-loop (llama/4600)

Co-authored-by: Bernhard Gstrein <redacted>

Georgi Gerganov [Sat, 13 Jan 2024 16:03:45 +0000 (18:03 +0200)]

metal : refactor kernel loading code (llama/4794)

* metal : detect more GPU families

* metal : refactor kernel loading

* metal : set kernel family requirements

* metal : fix kernel init + fix compile options

* metal : take into account simdgroup reduction support

* metal : print only skipped kernels

* metal : fix check for simdgroup reduction support

* metal : check for Metal 3

* metal : free allocations

* metal : normalize encoder:setComputePipelineStatus calls

ggml-ci

* metal : fix Metal3 family check

ggml-ci

* metal : check for simdgroup matrix mul. feature

ggml-ci

Johannes Gäßler [Fri, 12 Jan 2024 19:38:54 +0000 (20:38 +0100)]

CUDA: faster q8_0 -> f16 dequantization (llama/4895)

RhinoDevel [Sat, 13 Jan 2024 18:51:35 +0000 (19:51 +0100)]

talk-llama : add optional CLI arg to set the bot name (#1764)

james wolf [Sat, 13 Jan 2024 17:37:18 +0000 (12:37 -0500)]

examples : add python example for transcription (#1744)

* rebase and add simple python interface

* moved python files to examples/python

Georgi Gerganov [Sat, 13 Jan 2024 15:47:40 +0000 (17:47 +0200)]

whisper : load the model into multiple buffers of max size 1GB (#1763)

Georgi Gerganov [Fri, 12 Jan 2024 20:04:51 +0000 (22:04 +0200)]

talk-llama : sync llama.cpp

Georgi Gerganov [Fri, 12 Jan 2024 19:56:50 +0000 (21:56 +0200)]

sync : ggml

slaren [Fri, 12 Jan 2024 19:38:34 +0000 (20:38 +0100)]

backend_sched : fix assignments

ggml-ci

slaren [Fri, 12 Jan 2024 19:07:38 +0000 (20:07 +0100)]

llama : ggml-backend integration (llama/4766)

* llama : ggml-backend integration

* ggml-backend : add names to buffers

* fix unmap after loading

* batched-bench : add tensor_split param

* llama : check for null tensor_split

* ggml-backend : increase GGML_MAX_BACKENDS

* improve graph splitting, partial fix for --no-kv-offload

* cuda : add ggml-backend split buffer support

* cuda : do not create buffer types for devices that don't exist (fixes usage without CUDA devices available)

* ggml : fix null backend dereference (llama/4807)

* ggml : fix null backend dereference

* ggml : also check ggml_backend_is_cpu

* test-backend-ops : check buffer allocation failures

* llama : add cparam (split_mode) and command line argument (--split-mode, -sm) to configure the split mode (none, layer or row)

* ggml : fix mul_mat_id work size

* llama : rewrite session kv load/set without graphs

* minor

* llama : only initialize used backends, free backends on context free

* llama : abort ctx if cuda backend init fails

* llama : rewrite lora with ggml-backend and compute on CPU

ggml-ci

* llama : only map to a backend buffer the region of the file mapping containing the tensors used in the buffer

* opencl : add ggml-backend buffer type

* cuda : only use batched_cublas with batched mat muls (fixes fp16 tg perf)

* llama : on Metal, by default offload the full model

ggml-ci

* metal : page align the data ptr (llama/4854)

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>

* cuda : fix split buffer free

* address review comments

* llama-bench : add split-mode parameter

* fix whitespace

* opencl : fix double initialization

* server : add --split-mode parameter

* use async copy and compute to improve multi-gpu performance

ggml-ci

* use async memcpys to copy the graph outputs to the CPU

* fix opencl

* use a host buffer for the cpu compute buffer for faster copies to the gpu

---------

Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Johannes Gäßler <redacted>

Johannes Gäßler [Fri, 12 Jan 2024 11:30:41 +0000 (12:30 +0100)]

CUDA: fix softmax compile for old CUDA versions (llama/4862)

Kawrakow [Fri, 12 Jan 2024 05:59:57 +0000 (06:59 +0100)]

Importance Matrix calculation (llama/4861)

* imatrix: 1st version

* imatrix: WIP

* Cleanup

* Update examples/imatrix/imatrix.cpp

Co-authored-by: Georgi Gerganov <redacted>

---------

Co-authored-by: Iwan Kawrakow <redacted>
Co-authored-by: Georgi Gerganov <redacted>

Sơn Phan Trung [Fri, 12 Jan 2024 12:11:04 +0000 (19:11 +0700)]

models : make all scripts POSIX-compliant (#1725)

* download-coreml-model: make it POSIX-compliant

* download-ggml-model: posix compliant (2nd)

* minor edit

* forgot to add newline

* generate-coreml-interface: far more straightforward

* generate-coreml-model: done with the posix thingy

* typo

* Update download-ggml-model.sh

* fix

* fix typo

* another fix

* Update download-coreml-model.sh

* Update download-ggml-model.sh

* Update download-coreml-model.sh

Georgi Gerganov [Fri, 12 Jan 2024 12:02:30 +0000 (14:02 +0200)]

ggml : fix 32-bit ARM compat for IQ2_XS (#1758)

* ggml : fix 32-bit ARM compat

* ggml : fix fix

* ggml : fix fix fix

Boris Bliznioukov [Fri, 12 Jan 2024 11:44:50 +0000 (14:44 +0300)]

go : add SetInitialPrompt method to bindings (#1753)

George Hindle [Fri, 12 Jan 2024 11:42:52 +0000 (11:42 +0000)]

server : add more parameters to server api (#1754)

* feat(server): add more parameters to server api

* fix(server): reset params to original parsed values for each request

Georgi Gerganov [Fri, 12 Jan 2024 11:37:38 +0000 (13:37 +0200)]

whisper : fix segment length with params.no_timestamps == true

George Hindle [Fri, 12 Jan 2024 11:24:38 +0000 (11:24 +0000)]

params : don't compute timestamps when not printing them (#1755)

Georgi Gerganov [Thu, 11 Jan 2024 20:10:10 +0000 (22:10 +0200)]

talk-llama : sync llama.cpp

Georgi Gerganov [Thu, 11 Jan 2024 20:00:12 +0000 (22:00 +0200)]

swift : remove local ggml.h reference

Georgi Gerganov [Thu, 11 Jan 2024 19:57:40 +0000 (21:57 +0200)]

swift : track ggml release branch

Georgi Gerganov [Thu, 11 Jan 2024 19:54:17 +0000 (21:54 +0200)]

sync : ggml

Georgi Gerganov [Thu, 11 Jan 2024 19:49:13 +0000 (21:49 +0200)]

sync : llama.cpp

Kawrakow [Thu, 11 Jan 2024 19:39:39 +0000 (20:39 +0100)]

ggml : SOTA 2-bit quants (add IQ2_XS) (llama/4856)

* iq2_xs: basics

* iq2_xs: this should have been in the basics

* iq2_xs: CUDA and scalar CPU works

* iq2_xs: WIP Metal

* iq2_xs: Metal now works

* iq2_xs: working, but dog slow, ARM_NEON dot product

* iq2_xs: better ARM_NEON dot product

We are now at 19.5 t/s for TG-128 and 61 t/s for PP-512 when
running on the CPU.

* iq2_xs: AVX2 dot product - 19.5 t/s

* iq2_xs: faster AVX2 dot product

21.4 t/s for TG-128, 59.2 t/s for PP-512.
The latter is 2x compared to the previous version.

* iq2_xs: had forgotten to delete iq2-data.h

* Add llama enum for IQ2_XS

---------

Co-authored-by: Iwan Kawrakow <redacted>

Paul Tsochantaris [Thu, 11 Jan 2024 14:31:52 +0000 (14:31 +0000)]

metal : put encoder debug group behind a define (llama/4873)

Georgi Gerganov [Tue, 9 Jan 2024 17:37:08 +0000 (19:37 +0200)]

metal : improve dequantize precision to match CPU (llama/4836)

ggml-ci

Georgi Gerganov [Tue, 9 Jan 2024 08:42:06 +0000 (10:42 +0200)]

ggml : fix vld1q_s8_x4 32-bit compat (llama/4828)

* ggml : fix vld1q_s8_x4 32-bit compat

ggml-ci

* ggml : fix 32-bit ARM compat (cont)

ggml-ci

Johannes Gäßler [Tue, 9 Jan 2024 07:58:55 +0000 (08:58 +0100)]

CUDA: faster softmax via shared memory + fp16 math (llama/4742)

Georgi Gerganov [Thu, 11 Jan 2024 07:34:59 +0000 (09:34 +0200)]

metal : fix deprecation warning (ggml/690)

Timothy Cronin [Thu, 11 Jan 2024 07:27:48 +0000 (02:27 -0500)]

ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693)

Jack Mousseau [Wed, 10 Jan 2024 14:19:19 +0000 (06:19 -0800)]

metal : wrap each operation in debug group (ggml/690)

leejet [Wed, 10 Jan 2024 13:13:42 +0000 (21:13 +0800)]

ggml : change GGML_MAX_NAME at compile time (ggml/682)

* change GGML_MAX_NAME to 128

* allow controlling the value of GGML_MAX_NAME through external macro definitions
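
The "external macro definition" pattern mentioned above can be sketched in C as follows. This is only an illustration: DEMO_MAX_NAME, demo_tensor and demo_set_name are hypothetical names, and the default of 64 is an arbitrary assumption rather than the value used by ggml.

```c
#include <stdio.h>

/* Hypothetical stand-in for a name-length limit: keep the default unless
 * the build already supplied a value, e.g. with -DDEMO_MAX_NAME=128. */
#ifndef DEMO_MAX_NAME
#define DEMO_MAX_NAME 64
#endif

struct demo_tensor {
    char name[DEMO_MAX_NAME];  /* fixed-size buffer, NUL-terminated after demo_set_name */
};

/* Copies `name` into the tensor, truncating to DEMO_MAX_NAME - 1 bytes
 * and always leaving a terminating NUL. */
static void demo_set_name(struct demo_tensor *t, const char *name) {
    snprintf(t->name, sizeof(t->name), "%s", name);
}
```

Because the `#ifndef` guard only supplies a fallback, a project that needs longer names can raise the limit from its build flags without patching the header.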

Halalaluyafail3 [Tue, 9 Jan 2024 16:16:37 +0000 (11:16 -0500)]

Fix execlp call (ggml/689)

NULL can be an integer constant expression with the value zero; in that case the behavior would be undefined, because an argument of the wrong type is passed through the variable arguments.
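
The issue can be sketched in C without actually spawning a process. In the example below, demo_count_args and demo_count_example are hypothetical helpers (not part of ggml) that read a variadic list the way execlp-style functions do, as `char *` values up to a null-pointer terminator; the call site casts the terminator explicitly.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: counts the strings in a variadic list that is
 * terminated by a null `char *`, reading every variadic argument as
 * `char *`, as execlp-style functions do. */
static int demo_count_args(const char *first, ...) {
    va_list ap;
    int n = 0;
    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, char *)) {
        n++;
    }
    va_end(ap);
    return n;
}

/* Correct call: the terminator is explicitly a null `char *`.
 * A bare NULL may expand to the integer constant 0, which would be passed
 * as an int and then read back as a pointer: undefined behaviour. */
static int demo_count_example(void) {
    return demo_count_args("ls", "-l", (char *) NULL);
}
```

The same cast applies to real calls such as `execlp("ls", "ls", "-l", (char *) NULL)`, where POSIX requires the argument list to end with a null pointer of type `char *`.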

Kawrakow [Mon, 8 Jan 2024 15:02:32 +0000 (16:02 +0100)]

SOTA 2-bit quants (llama/4773)

* iq2_xxs: basics

* iq2_xxs: scalar and AVX2 dot products

Needed to change Q8_K to have quants in the -127...127 range,
else the IQ2_XXS AVX implementation becomes very awkward.
The alternative would have been to use Q8_0 instead. Perhaps
I'll change later, for now this is what we have.

* iq2_xxs: ARM_NEON dot product

Somehow strangely slow (112 ms/token).

* iq2_xxs: WIP Metal

Dequantize works, something is still wrong with the
dot product.

* iq2_xxs: Metal dot product now works

We have
PP-512 = 475 t/s
TG-128 = 47.3 t/s

Not the greatest performance, but not complete garbage either.

* iq2_xxs: slightly faster dot product

TG-128 is now 48.4 t/s

* iq2_xxs: slightly faster dot product

TG-128 is now 50.9 t/s

* iq2_xxs: even faster Metal dot product

TG-128 is now 54.1 t/s.

Strangely enough, putting the signs lookup table
into shared memory has a bigger impact than the
grid values being in shared memory.

* iq2_xxs: dequantize CUDA kernel - fix conflict with master

* iq2_xxs: quantized CUDA dot product (MMVQ)

We get TG-128 = 153.1 t/s

* iq2_xxs: slightly faster CUDA dot product

TG-128 is now at 155.1 t/s.

* iq2_xxs: add to llama ftype enum

* iq2_xxs: fix MoE on Metal

* Fix missing MMQ ops when on hipBLAS

I had put the ggml_supports_mmq call at the wrong place.

* Fix bug in qequantize_row_iq2_xxs

The 0.25f factor was missing.
Great detective work by @ggerganov!

* Fixing tests

* PR suggestion

---------

Co-authored-by: Iwan Kawrakow <redacted>

Johannes Gäßler [Sun, 7 Jan 2024 16:24:08 +0000 (17:24 +0100)]

CUDA: fixed redundant value dequantization (llama/4809)

Konstantin Zhuravlyov [Sun, 7 Jan 2024 06:52:42 +0000 (01:52 -0500)]

ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (llama/4787)

Georgi Gerganov [Fri, 5 Jan 2024 13:18:21 +0000 (15:18 +0200)]

ggml : do not sched_yield when calling BLAS (llama/4761)

* ggml : do not sched_yield when calling BLAS

ggml-ci

* ggml : fix do_yield logic

ggml-ci

* ggml : simplify do_yield logic

ggml-ci

Georgi Gerganov [Thu, 4 Jan 2024 08:12:26 +0000 (10:12 +0200)]

ggml : include stdlib.h before intrin.h (llama/4736)

Alexandru Mariuti [Wed, 10 Jan 2024 16:12:06 +0000 (17:12 +0100)]

swift : checkout ggml commit instead of branch (#1750)

RhinoDevel [Wed, 10 Jan 2024 14:15:28 +0000 (15:15 +0100)]

talk-llama : add optional Piper TTS support (#1749)

Add a commented-out command to optionally use Piper (https://github.com/rhasspy/piper) as the text-to-speech solution for the talk-llama example. Piper voices sound almost like real people, which is a big improvement over something like espeak.

Emmanuel Schmidbauer [Mon, 8 Jan 2024 22:39:51 +0000 (17:39 -0500)]

server : add request path option (#1741)

Georgi Gerganov [Mon, 8 Jan 2024 14:41:28 +0000 (16:41 +0200)]

main : add cli option to disable system prints (#1740)

Georgi Gerganov [Sun, 7 Jan 2024 11:35:14 +0000 (13:35 +0200)]

server : fix server temperature + add temperature_inc (#1729)

* server : fix server temperature + add temperature_inc

* server : change dashes to underscores in parameter names

Georgi Gerganov [Sat, 6 Jan 2024 15:22:57 +0000 (17:22 +0200)]

talk-llama : sync latest llama.cpp

Georgi Gerganov [Fri, 5 Jan 2024 15:11:27 +0000 (17:11 +0200)]

release : v1.5.4

Erik Scholz [Fri, 5 Jan 2024 15:00:00 +0000 (16:00 +0100)]

fix : cuda order of synchronization when setting a buffer (ggml/679)

* fix : cuda order of synchronization when setting a buffer

* also sync before memcpy

---------

Co-authored-by: slaren <redacted>

Georgi Gerganov [Fri, 5 Jan 2024 14:30:52 +0000 (16:30 +0200)]

metal : switch back to default.metallib (ggml/681)

ggml-ci

Georgi Gerganov [Fri, 5 Jan 2024 13:36:04 +0000 (15:36 +0200)]

ggml : fix q2_k bpw in comments (ggml/680)

Yajing Tang [Thu, 4 Jan 2024 14:28:30 +0000 (06:28 -0800)]

coreml : fix ANE optimized encoder (#1716)

Georgi Gerganov [Thu, 4 Jan 2024 12:47:42 +0000 (14:47 +0200)]

whisper.swiftui : add .gitignore

Georgi Gerganov [Thu, 4 Jan 2024 11:37:25 +0000 (13:37 +0200)]

whisper : reset the "batched" timings (#1721)

Georgi Gerganov [Wed, 3 Jan 2024 17:36:33 +0000 (19:36 +0200)]

release : v1.5.3

Ashraful Islam [Wed, 3 Jan 2024 17:30:26 +0000 (11:30 -0600)]

swift : update Package.swift to use ggml as package dependency (#1701)

* updates Package.swift to use ggml as dependency

* cleans up the Package.swift file by removing redundant source files

* updates ggml url src to ggerganov

Finn Voorhees [Wed, 3 Jan 2024 13:39:43 +0000 (08:39 -0500)]

ggml : add error handling to graph_compute (#1714)

Georgi Gerganov [Wed, 3 Jan 2024 12:18:46 +0000 (14:18 +0200)]

cuda : simplify expression

Co-authored-by: slaren <redacted>

Georgi Gerganov [Wed, 3 Jan 2024 11:01:44 +0000 (13:01 +0200)]

cuda : mark I16 and I32 ops as unsupported

ggml-ci

Georgi Gerganov [Wed, 3 Jan 2024 09:35:46 +0000 (11:35 +0200)]

metal : add kernel_get_rows_i32

ggml-ci

Georgi Gerganov [Tue, 2 Jan 2024 19:07:47 +0000 (21:07 +0200)]

metal : optimize ggml_mul_mat_id (faster Mixtral PP) (llama/4725)

* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (llama/4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

* metal : optimizing ggml_mul_mat_id (wip)

* metal : minor fix

* metal : opt mul_mm_id

Georgi Gerganov [Tue, 2 Jan 2024 08:57:44 +0000 (10:57 +0200)]

metal : enable shader debugging (cmake option) (llama/4705)

* ggml : disable fast-math for Metal (cmake build only)

ggml-ci

* metal : fix Metal API debug warnings

* cmake : add -fno-inline for Metal build (llama/4545)

* metal : fix API debug warnings

* metal : fix compile warnings

* metal : use uint64_t for strides

* cmake : rename option to LLAMA_METAL_SHADER_DEBUG

* metal : fix mat-vec Q8_0 kernel for BS > 1

* metal : normalize mat-vec kernel signatures

* cmake : respect LLAMA_QKK_64 option

* metal : fix mat-vec Q4_K kernel for QK_K == 64

ggml-ci

Georgi Gerganov [Sun, 31 Dec 2023 09:43:31 +0000 (11:43 +0200)]

ggml : add ggml_vdotq_s32 alias (llama/4715)

ggml-ci

Johannes Gäßler [Sat, 30 Dec 2023 12:52:01 +0000 (13:52 +0100)]

CUDA: fixed tensor cores not being used on RDNA3 (llama/4697)

automaticcat [Sat, 30 Dec 2023 08:07:48 +0000 (15:07 +0700)]

ggml : add ggml_cpu_has_avx_vnni() (llama/4589)

* feat: add avx_vnni based on intel documents

* ggml: add avx vnni based on intel document

* llama: add avx vnni information display

* docs: add more details about using oneMKL and oneAPI for intel processors

* docs: add more details about using oneMKL and oneAPI for intel processors

* docs: add more details about using oneMKL and oneAPI for intel processors

* docs: add more details about using oneMKL and oneAPI for intel processors

* docs: add more details about using oneMKL and oneAPI for intel processors

* Update ggml.c

Fix indentation update

Co-authored-by: Georgi Gerganov <redacted>

---------

Co-authored-by: Georgi Gerganov <redacted>

Johannes Gäßler [Fri, 29 Dec 2023 22:12:53 +0000 (23:12 +0100)]

CUDA: fix tensor core logic for Pascal and HIP (llama/4682)

hydai [Fri, 29 Dec 2023 16:31:19 +0000 (00:31 +0800)]

cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)

Signed-off-by: hydai <redacted>

Guillaume Wenzek [Fri, 29 Dec 2023 17:07:03 +0000 (18:07 +0100)]

ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)

* add more int ops

* ggml_compute_forward_dup_bytes

* add tests

* PR comments

* tests : minor indentations

---------

Co-authored-by: Georgi Gerganov <redacted>

Georgi Gerganov [Wed, 3 Jan 2024 09:42:42 +0000 (11:42 +0200)]

scripts : fix sync order + metal sed

Andreu Huguet [Tue, 2 Jan 2024 16:50:04 +0000 (17:50 +0100)]

examples : fix WASM Stack Overflow (#1713)

Fix for problem:

"""
RuntimeError: Aborted(Stack overflow! Stack cookie has been overwritten at 0x12be2b10, expected hex dwords 0x89BACDFE and 0x2135467, but received 0x00000000 0x00000000)
"""

That appears when executing the WASM example with the newer versions.

bobqianic [Sat, 30 Dec 2023 21:12:31 +0000 (21:12 +0000)]

docker : fix the publishing of the CUDA Docker image (#1704)

Georgi Gerganov [Fri, 29 Dec 2023 13:00:46 +0000 (15:00 +0200)]

scripts : do not sync commits from this repo

Tamotsu Takahashi [Fri, 29 Dec 2023 10:23:27 +0000 (19:23 +0900)]

ci : build with CLBlast + ggml-opencl use GGML_API (#1576)

* Build with CLBlast

* Declare GGML_API

After rebasing, examples/talk-llama failed:

"D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) ->
"D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) ->
(Link target) ->
  llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
  llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context *,void (__cdecl*)(float,void *),void *,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
  D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]

bobqianic [Fri, 29 Dec 2023 09:38:35 +0000 (09:38 +0000)]

whisper : replace `tensor->n_dims` with `ggml_n_dims(tensor)` (#1694)

Georgi Gerganov [Fri, 29 Dec 2023 09:30:47 +0000 (11:30 +0200)]

sync : ggml (VMM, sync-ggml-am, dotprod ARM fixes, CUDA fixes) (#1691)

* scripts : add sync-ggml-am.sh

* sync : ggml (VMM, ARM dot prod fix, etc.)

* build : fix CUDA build

* ggml : fix some mul mat cases + add tests for src1 F16

https://github.com/ggerganov/ggml/commit/dbd02958fa4f46898f68ca29c27ddcdc58a06f98

Dimo [Fri, 29 Dec 2023 09:14:32 +0000 (10:14 +0100)]

download : fix large q5 model name (#1695)

fixed typo in large-v3-q5-0 model name to match HF link

bobqianic [Sat, 23 Dec 2023 12:02:58 +0000 (12:02 +0000)]

whisper : Replace WHISPER_PRINT_DEBUG with WHISPER_LOG_DEBUG (#1681)

Georgi Gerganov [Fri, 22 Dec 2023 15:53:39 +0000 (17:53 +0200)]

sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677)

* sync : ggml

* sync : llama.cpp

* talk-llama : fix obsolete param

* ggml-alloc : fix ggml_tallocr_is_own

* talk.wasm : update to new ggml

* ggml : fix type punning in ggml_scale

* ggml : cuda jetson + arm quants warnings

Chaoqun [Fri, 22 Dec 2023 11:16:02 +0000 (19:16 +0800)]

docker : Dockerize whisper.cpp (#1674)

* build: add dockerfile for ci

* ci: add action to build/push docker image

* fix: lowercase repository to fix ci

* ci: update cuBLAS flag

* build: install curl and ffmpeg in image

* docs: add docker section

* fix: improve args check when download model

bobqianic [Thu, 21 Dec 2023 22:39:46 +0000 (22:39 +0000)]

CI : Add coverage for talk-llama when WHISPER_CUBLAS=1 (#1672)

Packaging of ggerganov/whisper.cpp