git.djapps.eu Git - pkg/ggml/sources/ggml/log

Radoslav Gerganov [Thu, 25 May 2023 08:27:15 +0000 (11:27 +0300)]
mnist : add progress indicator on the web page (#194)

Prevent user actions before the model and the data set are loaded

Georgi Gerganov [Wed, 24 May 2023 08:49:53 +0000 (11:49 +0300)]
mnist : add WASM instructions + web-page link

Radoslav Gerganov [Wed, 24 May 2023 08:40:47 +0000 (11:40 +0300)]
mnist : add web page for the MNIST example (#190)

The web page is using WASM for model inference.
Users can draw digits on an HTML canvas and load random digits from the
MNIST dataset.

Georgi Gerganov [Wed, 24 May 2023 08:38:41 +0000 (11:38 +0300)]
mnist : cleanup main.cpp

Ettore Di Giacinto [Wed, 24 May 2023 08:01:31 +0000 (10:01 +0200)]
docs : add golang transformer bindings (#191)

This PR adds golang bindings to transformers in ggml

Georgi Gerganov [Wed, 24 May 2023 07:54:45 +0000 (10:54 +0300)]
mpt : fix n_ctx (close #165)

Georgi Gerganov [Wed, 24 May 2023 07:41:06 +0000 (10:41 +0300)]
examples : remove prompt pipe-in support

Need cross-platform solution, factored out in common

Georgi Gerganov [Wed, 24 May 2023 07:40:27 +0000 (10:40 +0300)]
common : add missing declarations

klosax [Wed, 24 May 2023 07:27:36 +0000 (09:27 +0200)]
mpt : utf-8 support, perplexity testing, repeat penalty sampling (#184)

* common: utf-8 decoder, reverted gpt_tokenize utf-8 convert

* Update common.h

* main: decode utf-8 tokens on load

* mpt import: bug fix

* common: style fixes

* common: style fix

* Update common.h

* common: revert gpt_tokenize utf-8 convert

* Update common.cpp

* Update common.cpp

* Update common.cpp

* Add perplexity to mpt

* Update CMakeLists: perplexity

* mpt-perplexity: fixes

* Update perplexity.cpp

* common: add sampling with repeat penalty

* mpt-main: add repeat penalty sampling, add commandline parameters

* Update common.h

* mpt-main: style fixes

* Update perplexity.cpp

* Delete perplexity.cpp

* mpt: move perplexity to main

* mpt: move perplexity to main

* common.cpp: Use codecvt utf-8 converter

* main.cpp: Use codecvt utf-8 converter

* mpt : code style changes

---------

Co-authored-by: Georgi Gerganov <redacted>

Georgi Gerganov [Mon, 22 May 2023 14:57:21 +0000 (17:57 +0300)]
readme : update Features

Ravindra Marella [Sun, 21 May 2023 12:32:05 +0000 (18:02 +0530)]
readme : add link to python bindings (#181)

klosax [Sun, 21 May 2023 08:21:51 +0000 (10:21 +0200)]
common : support utf-8 + fix gpt_tokenize + fix MPT model import (#179)

* Update convert-h5-to-ggml.py

* Import tokens correctly

* gpt_tokenize: Convert input to utf-8 + bug fix

* common : minor style fixes

---------

Co-authored-by: Georgi Gerganov <redacted>

Dan Forbes [Sat, 20 May 2023 18:25:25 +0000 (11:25 -0700)]
readme : add link to GGML format docs (#177)

Georgi Gerganov [Sat, 20 May 2023 17:56:35 +0000 (20:56 +0300)]
examples : use scratch buffers to reduce memory usage (#176)

* starcoder : example for using scratch buffers to reduce memory usage

* starcoder : bump scratch buffers to 256 MB

* examples : add scratch buffers to MPT and GPT-NeoX

Georgi Gerganov [Sat, 20 May 2023 17:00:27 +0000 (20:00 +0300)]
ggml : update WASM SIMD

Georgi Gerganov [Sat, 20 May 2023 15:59:04 +0000 (18:59 +0300)]
whisper : fix Hebrew lang id

Georgi Gerganov [Sat, 20 May 2023 15:01:40 +0000 (18:01 +0300)]
examples : add quantize version to MPT and Replit examples (ref #168)

Georgi Gerganov [Sat, 20 May 2023 14:45:49 +0000 (17:45 +0300)]
common : force --top_k to be at least 1

Georgi Gerganov [Sat, 20 May 2023 14:33:07 +0000 (17:33 +0300)]
examples : fix vocab loading (close #163)

Georgi Gerganov [Sat, 20 May 2023 14:22:58 +0000 (17:22 +0300)]
common : fix gpt_tokenize (ref #170)

Michael Verrilli [Sat, 20 May 2023 14:12:24 +0000 (10:12 -0400)]
dolly-v2 : par_res and neox changes (#167)

* dolly-v2 example: par_res and neox changes

* Update examples/dolly-v2/quantize.cpp

---------

Co-authored-by: Georgi Gerganov <redacted>

Georgi Gerganov [Sat, 20 May 2023 14:09:41 +0000 (17:09 +0300)]
examples : call ggml_time_init() (close #166)

Georgi Gerganov [Sat, 20 May 2023 13:48:03 +0000 (16:48 +0300)]
Update README.md

Georgi Gerganov [Sat, 20 May 2023 12:59:34 +0000 (15:59 +0300)]
ggml : sync llama.cpp - CUDA improvements + ggml minor fixes

Georgi Gerganov [Sat, 20 May 2023 11:56:14 +0000 (14:56 +0300)]
ggml : sync llama.cpp - new quantization formats Q4 + Q8

pikalover6 [Thu, 18 May 2023 06:52:22 +0000 (23:52 -0700)]
readme : update roadmap (#164)

+ MPT & Replit

Lukas Möller [Wed, 17 May 2023 19:58:21 +0000 (21:58 +0200)]
examples : sample replit + MPT inference (#145)

* Add replit model

* Add unigram tokenization support

* Remove debug log

* Port alibi attn bias fix

* Remove torch input

* Fix hardcoded path

* Remove unsupported hyperparams

* Add mpt

* Add replit quantization script

* Remove debug print

* Add quantization support to mpt

* Reformat

* Remove trailing return type

* Implement stylistic changes

* use f16 in k/v memory calculations for replit/mpt

* Update context size calculation

* Add clip_qkv and alibi_bias_max support

* fix clamping implementation, remove implicit conversions

* Fix qkv if condition

* Fix replit context size calculation

* Potentially fix gcc compilation error

* Fix warning

* Adjust object overhead

* Remove dead code

jaeminSon [Wed, 17 May 2023 15:49:37 +0000 (00:49 +0900)]
examples : fix a hyperparameter value in gpt-neox (#161) (#162)

Andrei [Wed, 17 May 2023 06:27:11 +0000 (02:27 -0400)]
ggml : fix typo in ggml_diag_mask_zero_inplace() (#159)

Georgi Gerganov [Mon, 15 May 2023 04:50:54 +0000 (07:50 +0300)]
readme : add link to training example

Georgi Gerganov [Sun, 14 May 2023 15:56:18 +0000 (18:56 +0300)]
ggml : add AVX dot products

Georgi Gerganov [Sun, 14 May 2023 15:55:29 +0000 (18:55 +0300)]
whisper : sync whisper.cpp

IGUILIZ Salah-Eddine [Sun, 14 May 2023 15:31:08 +0000 (17:31 +0200)]
starcoder : detect santacoder fix end of text token (#155)

Co-authored-by: IGUILIZ Salah-Eddine <redacted>

Georgi Gerganov [Sun, 14 May 2023 14:26:41 +0000 (17:26 +0300)]
readme : add re-quantization warning

Georgi Gerganov [Sun, 14 May 2023 12:10:32 +0000 (15:10 +0300)]
examples : use inplace calls explicitly

Georgi Gerganov [Sun, 14 May 2023 11:55:28 +0000 (14:55 +0300)]
tests : add tests from llama.cpp

Georgi Gerganov [Sun, 14 May 2023 11:45:13 +0000 (14:45 +0300)]
ggml : fix multi-threaded ggml_compute_forward_diag_mask_f32()

Georgi Gerganov [Sun, 14 May 2023 11:16:47 +0000 (14:16 +0300)]
ggml : fix rope calculation (!inplace + GPT-NeoX mode)

Georgi Gerganov [Sun, 14 May 2023 08:23:02 +0000 (11:23 +0300)]
ggml : new Q4 and Q5 quantization formats + backward ops

sync llama.cpp

- bump GGML_QNT_VERSION -> 1
- increase ggml object overhead size from 256 to 512 in examples
- drop Q4_2 support
- tensor backend support CUDA

Georgi Gerganov [Sun, 14 May 2023 07:07:27 +0000 (10:07 +0300)]
ggml : add GGML_QNT_VERSION for tracking changes to the quantization format

ref #150

Georgi Gerganov [Sun, 14 May 2023 07:06:19 +0000 (10:06 +0300)]
whisper : sync whisper.cpp minor changes

Ravindra Marella [Sat, 13 May 2023 13:47:02 +0000 (19:17 +0530)]
starcoder : update example to follow the naming convention of other examples (#153)

Georgi Gerganov [Sat, 13 May 2023 13:02:49 +0000 (16:02 +0300)]
readme : fix gpt-neox example link

Ravindra Marella [Sat, 13 May 2023 12:24:47 +0000 (17:54 +0530)]
examples : fix warnings (#152)

Nouamane Tazi [Sat, 13 May 2023 10:46:10 +0000 (12:46 +0200)]
readme : add BLOOM example (#151)

Georgi Gerganov [Sat, 13 May 2023 10:08:56 +0000 (13:08 +0300)]
examples : update readme with new quantization usage + remove bug alert

Georgi Gerganov [Sat, 13 May 2023 10:04:57 +0000 (13:04 +0300)]
readme : update example list (#146)

Nouamane Tazi [Sat, 13 May 2023 09:54:03 +0000 (11:54 +0200)]
examples : add StarCoder/SantaCoder sample inference (#146)

* init commit

* fix building starcoder

* gen work

* fix vocab

* santacoder mha

* .

* fix quantize

* offload_state_dict

* endoftext

* rename scripts

* fix main

* scripts

* update README

* quickfixes

Eldar Yusupov [Sat, 13 May 2023 09:41:45 +0000 (12:41 +0300)]
gpt-neox : add non-parallel residual support (#139)

* Add non-parallel residual support

* Rename stablelm to gpt-neox

* Fix stablelm model name

Nevin [Sat, 13 May 2023 08:41:43 +0000 (10:41 +0200)]
common : allow prompts to be loaded from file (#102)

* common: allow prompts to be loaded from file

* common : extra help for -f

---------

Co-authored-by: Georgi Gerganov <redacted>

yangyaofei [Thu, 11 May 2023 21:47:48 +0000 (05:47 +0800)]
ggml : fix bug in alibi (#143)

Georgi Gerganov [Mon, 8 May 2023 15:07:10 +0000 (18:07 +0300)]
dolly-v2 : ggml_cgraph init (#112)

Tanmay Sachan [Mon, 8 May 2023 15:06:36 +0000 (20:36 +0530)]
examples : make struct initialization more portable (#112)

Georgi Gerganov [Mon, 8 May 2023 15:03:47 +0000 (18:03 +0300)]
dolly-v2 : minor formatting

Michael Verrilli [Sat, 6 May 2023 05:51:45 +0000 (01:51 -0400)]
examples : add dolly-v2 sample inference (#132)

* Vocab support for special tokens

* Initial dolly-v2 commit

* update README

Georgi Gerganov [Thu, 4 May 2023 15:45:39 +0000 (18:45 +0300)]
stablelm : update README.md

Georgi Gerganov [Wed, 3 May 2023 20:22:14 +0000 (23:22 +0300)]
ggml : vectorize Q8_0 quantization (#127)

Georgi Gerganov [Tue, 2 May 2023 19:14:27 +0000 (22:14 +0300)]
ggml : fix 32-bit ARM

Georgi Gerganov [Tue, 2 May 2023 18:28:21 +0000 (21:28 +0300)]
whisper : sync with latest

Georgi Gerganov [Tue, 2 May 2023 18:27:02 +0000 (21:27 +0300)]
scripts : update sync scripts

Georgi Gerganov [Tue, 2 May 2023 17:23:16 +0000 (20:23 +0300)]
ggml : sync llama.cpp (clBLAST support + tensor names)

Georgi Gerganov [Mon, 1 May 2023 07:13:59 +0000 (10:13 +0300)]
ggml : temp comment

Georgi Gerganov [Sun, 30 Apr 2023 19:28:14 +0000 (22:28 +0300)]
ggml : fix UB (int << 31)

Georgi Gerganov [Sun, 30 Apr 2023 16:03:35 +0000 (19:03 +0300)]
ggml, whisper : sync whisper.cpp (GGML_FTYPE + Q5 WASM SIMD)

Georgi Gerganov [Sun, 30 Apr 2023 07:25:13 +0000 (10:25 +0300)]
ggml : fix labels for GGML_OP_ALIBI

Georgi Gerganov [Sat, 29 Apr 2023 18:33:59 +0000 (21:33 +0300)]
ggml : fix 32-bit ARM NEON

Georgi Gerganov [Sat, 29 Apr 2023 18:13:40 +0000 (21:13 +0300)]
ggml : use vzip instead of vuzp for consistency

Georgi Gerganov [Sat, 29 Apr 2023 16:13:53 +0000 (19:13 +0300)]
ggml : fix SHARED build

Georgi Gerganov [Sat, 29 Apr 2023 16:07:19 +0000 (19:07 +0300)]
ggml : sync llama.cpp (less memory for mul_mat f16 + asserts)

Georgi Gerganov [Sat, 29 Apr 2023 09:33:57 +0000 (12:33 +0300)]
scripts : add sync-whisper.sh

Georgi Gerganov [Sat, 29 Apr 2023 07:30:56 +0000 (10:30 +0300)]
common : forgot to remove Q4_3 references

Georgi Gerganov [Sat, 29 Apr 2023 07:03:59 +0000 (10:03 +0300)]
ggml : remove Q4_3

Georgi Gerganov [Fri, 28 Apr 2023 17:47:27 +0000 (20:47 +0300)]
ggml : ggml_alibi() fixes (#113)

Dan Forbes [Fri, 28 Apr 2023 17:37:07 +0000 (10:37 -0700)]
ggml : add ggml_alibi (positional embedding) (#113)

Co-authored-by: @hhamud <redacted>

Georgi Gerganov [Fri, 28 Apr 2023 17:34:38 +0000 (20:34 +0300)]
ggml : sync llama.cpp (CLBlast)

Georgi Gerganov [Fri, 28 Apr 2023 17:33:44 +0000 (20:33 +0300)]
gitignore : add python env folders

Santtu Keskinen [Fri, 28 Apr 2023 04:25:11 +0000 (07:25 +0300)]
readme : add bert.cpp link (#114)

Georgi Gerganov [Thu, 27 Apr 2023 16:07:40 +0000 (19:07 +0300)]
stablelm : put warning about bug in the implementation

Georgi Gerganov [Thu, 27 Apr 2023 15:31:53 +0000 (18:31 +0300)]
ggml : sync llama.cpp (Q5_0 + Q5_1) + refactor examples quantization

Georgi Gerganov [Mon, 24 Apr 2023 15:52:25 +0000 (18:52 +0300)]
ggml : sync llama.cpp (fix GCC 8 build, close #99)

Georgi Gerganov [Sun, 23 Apr 2023 17:04:03 +0000 (20:04 +0300)]
ggml : indentation

Georgi Gerganov [Sun, 23 Apr 2023 16:57:37 +0000 (19:57 +0300)]
ggml : add GGML_API for exporting shared symbols

Georgi Gerganov [Sun, 23 Apr 2023 16:45:39 +0000 (19:45 +0300)]
ggml : better PERF prints

le.chang [Sun, 23 Apr 2023 16:12:49 +0000 (00:12 +0800)]
tests : fix compile error (#98)

appvoid [Sun, 23 Apr 2023 16:11:33 +0000 (12:11 -0400)]
gpt-2 : remove GPT-J unnecessary import (#91)

AsukaMinato [Sun, 23 Apr 2023 15:03:52 +0000 (00:03 +0900)]
tests : remove type cast (#100)

Georgi Gerganov [Sun, 23 Apr 2023 13:38:00 +0000 (16:38 +0300)]
ggml : sync llama.cpp (AVX improvements)

Georgi Gerganov [Sat, 22 Apr 2023 13:34:39 +0000 (16:34 +0300)]
ggml : fix Q4_3 cuBLAS + fix quantize_row_q4_2()

Georgi Gerganov [Sat, 22 Apr 2023 12:49:15 +0000 (15:49 +0300)]
examples : refactor quantization tools

Georgi Gerganov [Sat, 22 Apr 2023 11:59:42 +0000 (14:59 +0300)]
examples : utils -> common

Georgi Gerganov [Sat, 22 Apr 2023 10:59:49 +0000 (13:59 +0300)]
ggml : fix ARM build

Georgi Gerganov [Sat, 22 Apr 2023 10:23:20 +0000 (13:23 +0300)]
cmake : add CMake support for cuBLAS (#101)

* cmake : add cuBLAS support

* cmake : fix cuBLAS build

Georgi Gerganov [Sat, 22 Apr 2023 09:52:25 +0000 (12:52 +0300)]
examples : add Q4_2 and Q4_3 quantization support

Georgi Gerganov [Sat, 22 Apr 2023 09:36:42 +0000 (12:36 +0300)]
ggml : sync llama.cpp (Q4_3 + CUDA)

Bart Pelle [Thu, 20 Apr 2023 21:15:45 +0000 (23:15 +0200)]
mnist : add missing header (#95)

Georgi Gerganov [Thu, 20 Apr 2023 20:35:52 +0000 (23:35 +0300)]
stablelm : update README.md

Georgi Gerganov [Thu, 20 Apr 2023 20:23:07 +0000 (23:23 +0300)]
minor : fix GPT-NeoX name

Georgi Gerganov [Thu, 20 Apr 2023 20:21:38 +0000 (23:21 +0300)]
readme : add StableLM reference

Georgi Gerganov [Thu, 20 Apr 2023 20:20:38 +0000 (23:20 +0300)]
examples : add StableLM example (#96)

* ggml : there is a bug in ggml_cpy() F32 -> F32

Cannot see why, but multi-thread does not work

* stablelm : initial implementation, but QKV seems broken

* stablelm : make it work

* stablelm : use original merged QKV matrix

* stablelm : minor

* stablelm : instructions

* stablelm : update README.md

Georgi Gerganov [Thu, 20 Apr 2023 19:00:49 +0000 (22:00 +0300)]
ggml : sync llama.cpp (cuBLAS, Q4_3, bug fix, etc)