git.djapps.eu Git - pkg/ggml/sources/ggml/log
Georgi Gerganov [Sun, 23 Apr 2023 13:38:00 +0000 (16:38 +0300)]
ggml : sync llama.cpp (AVX improvements)

Georgi Gerganov [Sat, 22 Apr 2023 13:34:39 +0000 (16:34 +0300)]
ggml : fix Q4_3 cuBLAS + fix quantize_row_q4_2()

Georgi Gerganov [Sat, 22 Apr 2023 12:49:15 +0000 (15:49 +0300)]
examples : refactor quantization tools

Georgi Gerganov [Sat, 22 Apr 2023 11:59:42 +0000 (14:59 +0300)]
examples : utils -> common

Georgi Gerganov [Sat, 22 Apr 2023 10:59:49 +0000 (13:59 +0300)]
ggml : fix ARM build

Georgi Gerganov [Sat, 22 Apr 2023 10:23:20 +0000 (13:23 +0300)]
cmake : add CMake support for cuBLAS (#101)

* cmake : add cuBLAS support

* cmake : fix cuBLAS build

Georgi Gerganov [Sat, 22 Apr 2023 09:52:25 +0000 (12:52 +0300)]
examples : add Q4_2 and Q4_3 quantization support

Georgi Gerganov [Sat, 22 Apr 2023 09:36:42 +0000 (12:36 +0300)]
ggml : sync llama.cpp (Q4_3 + CUDA)

Bart Pelle [Thu, 20 Apr 2023 21:15:45 +0000 (23:15 +0200)]
mnist : add missing header (#95)

Georgi Gerganov [Thu, 20 Apr 2023 20:35:52 +0000 (23:35 +0300)]
stablelm : update README.md

Georgi Gerganov [Thu, 20 Apr 2023 20:23:07 +0000 (23:23 +0300)]
minor : fix GPT-NeoX name

Georgi Gerganov [Thu, 20 Apr 2023 20:21:38 +0000 (23:21 +0300)]
readme : add StableLM reference

Georgi Gerganov [Thu, 20 Apr 2023 20:20:38 +0000 (23:20 +0300)]
examples : add StableLM example (#96)

* ggml : there is a bug in ggml_cpy() F32 -> F32

Cannot see why, but multi-thread does not work

* stablelm : initial implementation, but QKV seems broken

* stablelm : make it work

* stablelm : use original merged QKV matrix

* stablelm : minor

* stablelm : instructions

* stablelm : update README.md

Georgi Gerganov [Thu, 20 Apr 2023 19:00:49 +0000 (22:00 +0300)]
ggml : sync llama.cpp (cuBLAS, Q4_3, bug fix, etc)

Georgi Gerganov [Wed, 19 Apr 2023 17:20:23 +0000 (20:20 +0300)]
ggml : sync llama.cpp

Georgi Gerganov [Sat, 15 Apr 2023 19:23:10 +0000 (22:23 +0300)]
examples : update huggingface links

Georgi Gerganov [Sat, 15 Apr 2023 16:50:54 +0000 (19:50 +0300)]
ggml : sync llama.cpp

Georgi Gerganov [Sat, 15 Apr 2023 11:25:34 +0000 (14:25 +0300)]
ggml : add ggml_type_name()

Georgi Gerganov [Sat, 15 Apr 2023 11:23:26 +0000 (14:23 +0300)]
ggml : use posix_memalign on non-Windows env
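
The commit title above only names the technique. As a rough sketch, and not the actual ggml code, an aligned allocator that uses posix_memalign() outside Windows and _aligned_malloc() on Windows might look like the following. The helper names aligned_alloc_sketch() and aligned_free_sketch() are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* exposes posix_memalign() under strict ISO C modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h> /* _aligned_malloc(), _aligned_free() */
#endif

/* Hypothetical helper: allocate `size` bytes aligned to `alignment`.
 * `alignment` must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure. */
static void *aligned_alloc_sketch(size_t alignment, size_t size) {
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    void *ptr = NULL;
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return NULL;
    }
    return ptr;
#endif
}

/* Hypothetical helper: release memory obtained from aligned_alloc_sketch(). */
static void aligned_free_sketch(void *ptr) {
#if defined(_WIN32)
    _aligned_free(ptr); /* must pair with _aligned_malloc() */
#else
    free(ptr);          /* posix_memalign() memory is released with free() */
#endif
}
```

Memory returned by posix_memalign() is released with plain free(), while _aligned_malloc() needs _aligned_free(), which is why the sketch wraps both sides.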

Georgi Gerganov [Fri, 14 Apr 2023 14:45:54 +0000 (17:45 +0300)]
ggml : add unary and binary map operations

Georgi Gerganov [Fri, 14 Apr 2023 10:32:27 +0000 (13:32 +0300)]
ggml : avoid powf() calls in ggml_rope()

Georgi Gerganov [Fri, 14 Apr 2023 10:32:12 +0000 (13:32 +0300)]
ggml : fix ARM NEON dot product types

Georgi Gerganov [Thu, 13 Apr 2023 21:02:31 +0000 (00:02 +0300)]
mnist : update README

Georgi Gerganov [Thu, 13 Apr 2023 21:00:42 +0000 (00:00 +0300)]
mnist : minor fixes and adjustments

Ray Cromwell [Thu, 13 Apr 2023 20:49:45 +0000 (13:49 -0700)]
examples : MNIST example for ggml (#84)

Georgi Gerganov [Thu, 13 Apr 2023 15:37:19 +0000 (18:37 +0300)]
ggml : sync latest changes from llama.cpp

Jakob Frick [Thu, 13 Apr 2023 12:41:53 +0000 (14:41 +0200)]
gpt-2 : typo fix for the Cerebras instructions (#57)

Georgi Gerganov [Thu, 13 Apr 2023 12:40:33 +0000 (15:40 +0300)]
ggml : add GGML_DEFAULT_N_THREADS

LostRuins [Thu, 13 Apr 2023 12:27:56 +0000 (20:27 +0800)]
gpt : fix pytorch converter text encodings (#78)

* Fixed quantization of f16 models, which was failing because the f16 tables were not initialized, so the f16 to f32 conversion did not work.

* In some situations, the script fails with the error: UnicodeDecodeError: 'charmap' codec can't decode byte (byte) in position (number) : character maps to <undefined>
This is probably because the encodings are incorrect.
Explicitly specifying them as UTF-8 seems to resolve the issue and allows for correct conversion.

---------

Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Wed, 12 Apr 2023 15:59:41 +0000 (18:59 +0300)]
readme : update roadmap

Georgi Gerganov [Tue, 11 Apr 2023 18:33:17 +0000 (21:33 +0300)]
gpt-j : update inference to match latest llama.cpp insights

- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy

Georgi Gerganov [Mon, 10 Apr 2023 20:21:11 +0000 (23:21 +0300)]
ggml : fix <windows.h> include

Georgi Gerganov [Mon, 10 Apr 2023 20:19:15 +0000 (23:19 +0300)]
ggml : fix WASM build

Georgi Gerganov [Mon, 10 Apr 2023 19:39:24 +0000 (22:39 +0300)]
whisper : sync with whisper.cpp

Georgi Gerganov [Mon, 10 Apr 2023 19:39:07 +0000 (22:39 +0300)]
ggml : optimize ggml_cpy() for contiguous dst

Georgi Gerganov [Mon, 10 Apr 2023 16:36:06 +0000 (19:36 +0300)]
ggml : sync with llama.cpp

- int64_t number of elements
- remove mlock
- expose quantization functions
- expose ggml_object
- add ggml_view_3d()
- multi-thread ggml_rope()
- fix ggml_cpy()
- add ggml_init_params.no_alloc
- fix ggml_mul_mat() backward

LostRuins [Mon, 10 Apr 2023 07:47:47 +0000 (15:47 +0800)]
gpt : initialize f16 tables during quantization (#77)

Georgi Gerganov [Fri, 7 Apr 2023 18:21:33 +0000 (21:21 +0300)]
readme : update Roadmap (add rwkv.cpp)

Georgi Gerganov [Thu, 30 Mar 2023 21:37:37 +0000 (00:37 +0300)]
gpt-2 : minor update readme

Georgi Gerganov [Thu, 30 Mar 2023 21:34:14 +0000 (00:34 +0300)]
gpt-2 : fix quantize tool to quantize the "lm_head" tensor

Georgi Gerganov [Thu, 30 Mar 2023 20:39:15 +0000 (23:39 +0300)]
gpt-2 : add Cerebras-GPT example

Supreet Sethi [Thu, 30 Mar 2023 17:25:29 +0000 (01:25 +0800)]
ggml : fix NEON sign types (#51)

Cordeiro [Wed, 29 Mar 2023 20:39:27 +0000 (15:39 -0500)]
gpt-2 : convert h5 to ggml (#35)

* Script to convert h5 to ggml adapted from gpt-j example

* Fix map tensors

* optimize

* rename headers to keep compatibility

* revert gpt-2/main.cpp

---------

Co-authored-by: Alan <redacted>
Co-authored-by: Alan <redacted>
Co-authored-by: ocordeiro <redacted>
Georgi Gerganov [Wed, 29 Mar 2023 19:23:14 +0000 (22:23 +0300)]
readme : update Roadmap

Georgi Gerganov [Wed, 29 Mar 2023 19:21:36 +0000 (22:21 +0300)]
ggml : 4-bit Integer quantisation + many llama.cpp improvements (#27)

* gq : attempt at n-bit quantization

* gq : add amax based method 3

* gq : progress on method 2

* gq : method 4 (AVX2)

* gq : method 4 (ARM)

* gq : method 4 (AVX2 attempt) + method 5 (no min)

* gq : method 5 (ARM)

* gpt-2 : model conversion for Q4_0 quantization

* ggml : Q4_0 quantization support (ggml_get_rows())

* gpt-2 : loading Q4_0 quantized model

* ggml : q4_0 quantization support

* ggml : q4_1 quantization support (seems to work for bigger models)

* gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models

* ggml : 4-bit quantization works (only scalar for now)

* gq : add method 6 (ARM)

* ggml : vectorized mad q4_0 (ARM)

* ggml : vectorized quantize_row_q4_0 (ARM)

* ggml : simplify mad q4_0 (ARM)

* ggml : minor indentations

* gpt-j : support for 4-bit quantized model inference

* ggml : GGML_ASSERT() instead of assert() where appropriate

* gpt : avoid ggml_transpose on model tensors (new models!)

* gpt-2 : minor

* gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)

* ggml : add ggml_compute_forward_rope_f16()

* gpt : fix memory usage computation

* ggml : fix ggml_is_contiguous() to take into account blck size

* whisper : add whisper-quantize tool

* whisper : add support for quantized models

* whisper : mem usage based on model format type

* gpt : using FP16 for the KV cache does not seem worth it

* gpt : support quantisation of f16 model files

* ggml : fixes for rpi4

* whisper : add Q4_1 model sizes

* ggml : add WASM SIMD for Q4_0

* utils : print quantization histograms

* ggml : sync all changes from llama.cpp and whisper.cpp

* ggml : finalize the Q4_1 quantization for ARM_NEON

MaiHD [Sat, 25 Mar 2023 20:43:24 +0000 (03:43 +0700)]
ggml : make it work on Windows (#46)

Georgi Gerganov [Sat, 25 Mar 2023 14:32:48 +0000 (16:32 +0200)]
tests : add test-blas0

Georgi Gerganov [Wed, 22 Mar 2023 19:52:32 +0000 (21:52 +0200)]
Fix CMake indentation

katsu560 [Wed, 22 Mar 2023 19:51:47 +0000 (04:51 +0900)]
add OpenBLAS detection and modify test code (#40)

* fix indents and commands for Haiku, and add OpenBLAS detection in src/CMakeLists.txt

* add system detection and add OpenBLAS detection

* set the loop count via the GGML_NLOOP environment variable or a command-line option

* change the fmadd code for systems without FMA support

* set n_threads via the GGML_NTHREADS environment variable or a command-line option

---------

Co-authored-by: Georgi Gerganov <redacted>
Alex von Gluck IV [Wed, 22 Mar 2023 19:43:58 +0000 (14:43 -0500)]
CMakeLists: Fix Haiku CPU detection (#39)

hidenorly [Wed, 22 Mar 2023 19:43:22 +0000 (04:43 +0900)]
Add pipe input for prompt on gpt examples (#38)

Enable prompt input through a pipe, instead of using the -p option.
This makes it easier to give longer, multi-line prompts.

Test:
 $ echo "This is an example" > prompt.txt
 $ cat prompt.txt | ./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin
 $ cat prompt.txt | ./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin

Note that the -p option and the case where no -p is specified still work as before:
 $ ./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
 $ ./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin
 $ ./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
 $ ./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin
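
The prompt selection described in this commit message can be sketched as follows. This is a minimal illustration in plain C, not the code from the gpt examples: the helpers read_all() and choose_prompt() are hypothetical, and the caller is assumed to report whether stdin is a terminal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read everything from `in` into a NUL-terminated heap buffer.
 * The trailing newline, if any, is kept. The caller frees the result. */
static char *read_all(FILE *in) {
    size_t cap = 4096;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        return NULL;
    }
    size_t n;
    while ((n = fread(buf + len, 1, cap - len - 1, in)) > 0) {
        len += n;
        if (cap - len <= 1) { /* keep room for data plus the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    return buf;
}

/* Hypothetical helper: pick the prompt source.
 * A -p value wins; otherwise piped input is read; otherwise NULL is
 * returned so the caller can fall back to its default prompt. */
static char *choose_prompt(const char *p_opt, FILE *in, int in_is_tty) {
    if (p_opt != NULL) {
        char *copy = malloc(strlen(p_opt) + 1);
        if (copy != NULL) {
            strcpy(copy, p_opt);
        }
        return copy;
    }
    if (!in_is_tty) {
        return read_all(in);
    }
    return NULL;
}
```

On a POSIX system the caller could pass isatty(fileno(stdin)) from <unistd.h> as in_is_tty, so that an interactive run without -p keeps its previous behavior.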

katsu560 [Mon, 6 Mar 2023 17:52:16 +0000 (02:52 +0900)]
cmake : update CMakeLists.txt to add correct flags (#26)

* modify src/CMakeLists.txt from whisper.cpp

* cmake : remove OpenBLAS stuff

---------

Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Mon, 6 Mar 2023 05:40:55 +0000 (07:40 +0200)]
readme : update Roadmap

Georgi Gerganov [Sun, 5 Mar 2023 16:02:27 +0000 (18:02 +0200)]
readme : add Roadmap section

Georgi Gerganov [Sun, 26 Feb 2023 19:10:50 +0000 (21:10 +0200)]
sync : latest whisper.cpp

Georgi Gerganov [Tue, 21 Feb 2023 20:16:56 +0000 (22:16 +0200)]
tests : fix cblas_sgemm call

Georgi Gerganov [Sat, 18 Feb 2023 14:05:31 +0000 (16:05 +0200)]
tests : add SVD experiments

Georgi Gerganov [Wed, 15 Feb 2023 18:59:36 +0000 (20:59 +0200)]
sync : latest whisper.cpp (scratch buffers in ggml)

Georgi Gerganov [Fri, 20 Jan 2023 06:45:45 +0000 (08:45 +0200)]
Update README.md

Takuya Takeuchi [Sun, 15 Jan 2023 14:30:13 +0000 (23:30 +0900)]
cmake : configure CMAKE_C_FLAGS and target_link_libraries for MSVC (#15)

Georgi Gerganov [Sun, 15 Jan 2023 13:53:08 +0000 (15:53 +0200)]
gpt : fix sampling to use the temperature (close #16)

Georgi Gerganov [Sun, 15 Jan 2023 13:09:36 +0000 (15:09 +0200)]
ggml : sync latest whisper.cpp

Georgi Gerganov [Sun, 8 Jan 2023 18:28:38 +0000 (20:28 +0200)]
gpt-2 : fix broken prompt due to recent experiments

No idea why I committed that!?

Georgi Gerganov [Sun, 8 Jan 2023 18:23:01 +0000 (20:23 +0200)]
ggml : sync latest whisper.cpp

Georgi Gerganov [Sat, 7 Jan 2023 19:05:33 +0000 (21:05 +0200)]
cmake : disable warnings about unused functions

Georgi Gerganov [Sat, 7 Jan 2023 19:04:24 +0000 (21:04 +0200)]
ggml : bugfix in new soft max computation

Georgi Gerganov [Sat, 7 Jan 2023 18:00:25 +0000 (20:00 +0200)]
tests : change test2 eps

Georgi Gerganov [Sat, 7 Jan 2023 17:53:05 +0000 (19:53 +0200)]
ggml : sync with latest whisper.cpp

Georgi Gerganov [Sat, 7 Jan 2023 10:17:34 +0000 (12:17 +0200)]
tests : some more quantization experiments

Georgi Gerganov [Sat, 7 Jan 2023 07:43:02 +0000 (09:43 +0200)]
sync : forgot to sync ggml.h

Georgi Gerganov [Sat, 7 Jan 2023 07:39:12 +0000 (09:39 +0200)]
sync : latest changes from whisper.cpp

Georgi Gerganov [Sat, 7 Jan 2023 07:36:32 +0000 (09:36 +0200)]
tests : wip quantized matrix multiplication method 2

Georgi Gerganov [Sat, 7 Jan 2023 07:31:42 +0000 (09:31 +0200)]
tests : minor fixes for x86

Georgi Gerganov [Thu, 5 Jan 2023 19:05:41 +0000 (21:05 +0200)]
tests : experiments with n-bit quantized matrix multiplication

Georgi Gerganov [Sat, 31 Dec 2022 10:32:04 +0000 (12:32 +0200)]
sync : latest changes from whisper.cpp

Georgi Gerganov [Sat, 31 Dec 2022 10:29:52 +0000 (12:29 +0200)]
gpt-2 : experimenting with attention mask

Georgi Gerganov [Sat, 31 Dec 2022 10:29:30 +0000 (12:29 +0200)]
gpt-2 : fix off-by-one error in batching logic

Georgi Gerganov [Mon, 12 Dec 2022 21:49:12 +0000 (23:49 +0200)]
examples : redirect download scripts to HF

Georgi Gerganov [Sun, 4 Dec 2022 16:33:14 +0000 (18:33 +0200)]
gpt : add support for gpt-jt + fix unicode support

Georgi Gerganov [Sun, 4 Dec 2022 09:06:13 +0000 (11:06 +0200)]
ggml : sync with latest code from whisper.cpp

Georgi Gerganov [Wed, 9 Nov 2022 19:43:03 +0000 (21:43 +0200)]
sync : latest changes from whisper.cpp

- Documentation
- whisper : token-level timestamps
- ggml : Windows build fixes
- etc.

Georgi Gerganov [Tue, 1 Nov 2022 20:15:22 +0000 (22:15 +0200)]
Update README.md

Georgi Gerganov [Tue, 1 Nov 2022 20:13:15 +0000 (22:13 +0200)]
sync : latest changes from whisper.cpp

Georgi Gerganov [Tue, 18 Oct 2022 18:14:27 +0000 (21:14 +0300)]
whisper : fix timestamp sampling

Georgi Gerganov [Tue, 18 Oct 2022 16:12:07 +0000 (19:12 +0300)]
sync : whisper.cpp

- Add MSVC header
- FP16 GELU
- C interface fixes (no unions)
- Minor CMake updates

Georgi Gerganov [Mon, 17 Oct 2022 20:54:35 +0000 (23:54 +0300)]
sync : whisper.cpp

Georgi Gerganov [Mon, 17 Oct 2022 18:31:23 +0000 (21:31 +0300)]
Minor fixes

Georgi Gerganov [Mon, 17 Oct 2022 18:20:33 +0000 (21:20 +0300)]
Improve mul_mat performance for big matrices using Accelerate framework

Also:

- Speedup GELU operator via F16 cast
- Multi-thread NORM operator
- Disable FLASH_FF in whisper example

Georgi Gerganov [Mon, 17 Oct 2022 18:17:13 +0000 (21:17 +0300)]
Performance tests - trying to optimize mul_mat

Georgi Gerganov [Thu, 13 Oct 2022 19:18:46 +0000 (22:18 +0300)]
sync : whisper.cpp

Georgi Gerganov [Sat, 8 Oct 2022 15:15:22 +0000 (18:15 +0300)]
whisper : sync with whisper.cpp

Georgi Gerganov [Wed, 5 Oct 2022 20:15:10 +0000 (23:15 +0300)]
whisper : various improvements

Georgi Gerganov [Tue, 4 Oct 2022 20:17:35 +0000 (23:17 +0300)]
whisper : add C-style API

Georgi Gerganov [Mon, 3 Oct 2022 16:31:17 +0000 (19:31 +0300)]
whisper : various fixes

Georgi Gerganov [Fri, 30 Sep 2022 16:16:07 +0000 (19:16 +0300)]
whisper : various updates and improvements

Georgi Gerganov [Wed, 28 Sep 2022 18:12:20 +0000 (21:12 +0300)]
Adding Whisper inference example

Georgi Gerganov [Mon, 19 Sep 2022 21:09:34 +0000 (00:09 +0300)]
Update README.md + minor stuff

- Changed default threads to 4
- Added GGML_PERF for enabling runtime performance timings

Georgi Gerganov [Sun, 18 Sep 2022 17:12:43 +0000 (20:12 +0300)]
Update README.md

Georgi Gerganov [Sun, 18 Sep 2022 17:11:11 +0000 (20:11 +0300)]
Initial release