git.djapps.eu Git - pkg/ggml/sources/ggml/log
Georgi Gerganov [Wed, 24 May 2023 07:41:06 +0000 (10:41 +0300)]
examples : remove prompt pipe-in support
Need cross-platform solution, factored out in common
Georgi Gerganov [Wed, 24 May 2023 07:40:27 +0000 (10:40 +0300)]
common : add missing declarations
klosax [Wed, 24 May 2023 07:27:36 +0000 (09:27 +0200)]
mpt : utf-8 support, perplexity testing, repeat penalty sampling (#184)
* common: utf-8 decoder, reverted gpt_tokenize utf-8 convert
* Update common.h
* main: decode utf-8 tokens on load
* mpt import: bug fix
* common: style fixes
* common: style fix
* Update common.h
* common: revert gpt_tokenize utf-8 convert
* Update common.cpp
* Update common.cpp
* Update common.cpp
* Add perplexity to mpt
* Update CMakeLists: perplexity
* mpt-perplexity: fixes
* Update perplexity.cpp
* common: add sampling with repeat penalty
* mpt-main: add repeat penalty sampling, add commandline parameters
* Update common.h
* mpt-main: style fixes
* Update perplexity.cpp
* Delete perplexity.cpp
* mpt: move perplexity to main
* mpt: move perplexity to main
* common.cpp: Use codecvt utf-8 converter
* main.cpp: Use codecvt utf-8 converter
* mpt : code style changes
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Mon, 22 May 2023 14:57:21 +0000 (17:57 +0300)]
readme : update Features
Ravindra Marella [Sun, 21 May 2023 12:32:05 +0000 (18:02 +0530)]
readme : add link to python bindings (#181)
klosax [Sun, 21 May 2023 08:21:51 +0000 (10:21 +0200)]
common : support utf-8 + fix gpt_tokenize + fix MPT model import (#179)
* Update convert-h5-to-ggml.py
* Import tokens correctly
* gpt_tokenize: Convert input to utf-8 + bug fix
* common : minor style fixes
---------
Co-authored-by: Georgi Gerganov <redacted>
Dan Forbes [Sat, 20 May 2023 18:25:25 +0000 (11:25 -0700)]
readme : add link to GGML format docs (#177)
Georgi Gerganov [Sat, 20 May 2023 17:56:35 +0000 (20:56 +0300)]
examples : use scratch buffers to reduce memory usage (#176)
* starcoder : example for using scratch buffers to reduce memory usage
* starcoder : bump scratch buffers to 256 MB
* examples : add scratch buffers to MPT and GPT-NeoX
Georgi Gerganov [Sat, 20 May 2023 17:00:27 +0000 (20:00 +0300)]
ggml : update WASM SIMD
Georgi Gerganov [Sat, 20 May 2023 15:59:04 +0000 (18:59 +0300)]
whisper : fix Hebrew lang id
Georgi Gerganov [Sat, 20 May 2023 15:01:40 +0000 (18:01 +0300)]
examples : add quantize version to MPT and Replit examples (ref #168)
Georgi Gerganov [Sat, 20 May 2023 14:45:49 +0000 (17:45 +0300)]
common : force --top_k to be at least 1
Georgi Gerganov [Sat, 20 May 2023 14:33:07 +0000 (17:33 +0300)]
examples : fix vocab loading (close #163)
Georgi Gerganov [Sat, 20 May 2023 14:22:58 +0000 (17:22 +0300)]
common : fix gpt_tokenize (ref #170)
Michael Verrilli [Sat, 20 May 2023 14:12:24 +0000 (10:12 -0400)]
dolly-v2 : par_res and neox changes (#167)
* dolly-v2 example: par_res and neox changes
* Update examples/dolly-v2/quantize.cpp
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Sat, 20 May 2023 14:09:41 +0000 (17:09 +0300)]
examples : call ggml_time_init() (close #166)
Georgi Gerganov [Sat, 20 May 2023 13:48:03 +0000 (16:48 +0300)]
Update README.md
Georgi Gerganov [Sat, 20 May 2023 12:59:34 +0000 (15:59 +0300)]
ggml : sync llama.cpp - CUDA improvements + ggml minor fixes
Georgi Gerganov [Sat, 20 May 2023 11:56:14 +0000 (14:56 +0300)]
ggml : sync llama.cpp - new quantization formats Q4 + Q8
pikalover6 [Thu, 18 May 2023 06:52:22 +0000 (23:52 -0700)]
readme : update roadmap (#164)
+ MPT & Replit
Lukas Möller [Wed, 17 May 2023 19:58:21 +0000 (21:58 +0200)]
examples : sample replit + MPT inference (#145)
* Add replit model
* Add unigram tokenization support
* Remove debug log
* Port alibi attn bias fix
* Remove torch input
* Fix hardcoded path
* Remove unsupported hyperparams
* Add mpt
* Add replit quantization script
* Remove debug print
* Add quantization support to mpt
* Reformat
* Remove trailing return type
* Implement stylistic changes
* use f16 in k/v memory calculations for replit/mpt
* Update context size calculation
* Add clip_qkv and alibi_bias_max support
* fix clamping implementation, remove implicit conversions
* Fix qkv if condition
* Fix replit context size calculation
* Potentially fix gcc compilation error
* Fix warning
* Adjust object overhead
* Remove dead code
jaeminSon [Wed, 17 May 2023 15:49:37 +0000 (00:49 +0900)]
examples : fix a hyperparameter value in gpt-neox (#161) (#162)
Andrei [Wed, 17 May 2023 06:27:11 +0000 (02:27 -0400)]
ggml : fix typo in ggml_diag_mask_zero_inplace() (#159)
Georgi Gerganov [Mon, 15 May 2023 04:50:54 +0000 (07:50 +0300)]
readme : add link to training example
Georgi Gerganov [Sun, 14 May 2023 15:56:18 +0000 (18:56 +0300)]
ggml : add AVX dot products
Georgi Gerganov [Sun, 14 May 2023 15:55:29 +0000 (18:55 +0300)]
whisper : sync whisper.cpp
IGUILIZ Salah-Eddine [Sun, 14 May 2023 15:31:08 +0000 (17:31 +0200)]
starcoder : detect santacoder fix end of text token (#155)
Co-authored-by: IGUILIZ Salah-Eddine <redacted>
Georgi Gerganov [Sun, 14 May 2023 14:26:41 +0000 (17:26 +0300)]
readme : add re-quantization warning
Georgi Gerganov [Sun, 14 May 2023 12:10:32 +0000 (15:10 +0300)]
examples : use inplace calls explicitly
Georgi Gerganov [Sun, 14 May 2023 11:55:28 +0000 (14:55 +0300)]
tests : add tests from llama.cpp
Georgi Gerganov [Sun, 14 May 2023 11:45:13 +0000 (14:45 +0300)]
ggml : fix multi-threaded ggml_compute_forward_diag_mask_f32()
Georgi Gerganov [Sun, 14 May 2023 11:16:47 +0000 (14:16 +0300)]
ggml : fix rope calculation (!inplace + GPT-NeoX mode)
Georgi Gerganov [Sun, 14 May 2023 08:23:02 +0000 (11:23 +0300)]
ggml : new Q4 and Q5 quantization formats + backward ops
sync llama.cpp
- bump GGML_QNT_VERSION -> 1
- increase ggml object overhead size from 256 to 512 in examples
- drop Q4_2 support
- tensor backend support CUDA
Georgi Gerganov [Sun, 14 May 2023 07:07:27 +0000 (10:07 +0300)]
ggml : add GGML_QNT_VERSION for tracking changes to the quantization format
ref #150
Georgi Gerganov [Sun, 14 May 2023 07:06:19 +0000 (10:06 +0300)]
whisper : sync whisper.cpp minor changes
Ravindra Marella [Sat, 13 May 2023 13:47:02 +0000 (19:17 +0530)]
starcoder : update example to follow the naming convention of other examples (#153)
Georgi Gerganov [Sat, 13 May 2023 13:02:49 +0000 (16:02 +0300)]
readme : fix gpt-neox example link
Ravindra Marella [Sat, 13 May 2023 12:24:47 +0000 (17:54 +0530)]
examples : fix warnings (#152)
Nouamane Tazi [Sat, 13 May 2023 10:46:10 +0000 (12:46 +0200)]
readme : add BLOOM example (#151)
Georgi Gerganov [Sat, 13 May 2023 10:08:56 +0000 (13:08 +0300)]
examples : update readme with new quantization usage + remove bug alert
Georgi Gerganov [Sat, 13 May 2023 10:04:57 +0000 (13:04 +0300)]
readme : update example list (#146)
Nouamane Tazi [Sat, 13 May 2023 09:54:03 +0000 (11:54 +0200)]
examples : add StarCoder/SantaCoder sample inference (#146)
* init commit
* fix building starcoder
* gen work
* fix vocab
* santacoder mha
* .
* fix quantize
* offload_state_dict
* endoftext
* rename scripts
* fix main
* scripts
* update README
* quickfixes
Eldar Yusupov [Sat, 13 May 2023 09:41:45 +0000 (12:41 +0300)]
gpt-neox : add non-parallel residual support (#139)
* Add non-parallel residual support
* Rename stablelm to gpt-neox
* Fix stablelm model name
Nevin [Sat, 13 May 2023 08:41:43 +0000 (10:41 +0200)]
common : allow prompts to be loaded from file (#102)
* common: allow prompts to be loaded from file
* common : extra help for -f
---------
Co-authored-by: Georgi Gerganov <redacted>
yangyaofei [Thu, 11 May 2023 21:47:48 +0000 (05:47 +0800)]
ggml : fix bug in alibi (#143)
Georgi Gerganov [Mon, 8 May 2023 15:07:10 +0000 (18:07 +0300)]
dolly-v2 : ggml_cgraph init (#112)
Tanmay Sachan [Mon, 8 May 2023 15:06:36 +0000 (20:36 +0530)]
examples : make struct initialization more portable (#112)
Georgi Gerganov [Mon, 8 May 2023 15:03:47 +0000 (18:03 +0300)]
dolly-v2 : minor formatting
Michael Verrilli [Sat, 6 May 2023 05:51:45 +0000 (01:51 -0400)]
examples : add dolly-v2 sample inference (#132)
* Vocab support for special tokens
* Initial dolly-v2 commit
* update README
Georgi Gerganov [Thu, 4 May 2023 15:45:39 +0000 (18:45 +0300)]
stablelm : update README.md
Georgi Gerganov [Wed, 3 May 2023 20:22:14 +0000 (23:22 +0300)]
ggml : vectorize Q8_0 quantization (#127)
Georgi Gerganov [Tue, 2 May 2023 19:14:27 +0000 (22:14 +0300)]
ggml : fix 32-bit ARM
Georgi Gerganov [Tue, 2 May 2023 18:28:21 +0000 (21:28 +0300)]
whisper : sync with latest
Georgi Gerganov [Tue, 2 May 2023 18:27:02 +0000 (21:27 +0300)]
scripts : update sync scripts
Georgi Gerganov [Tue, 2 May 2023 17:23:16 +0000 (20:23 +0300)]
ggml : sync llama.cpp (clBLAST support + tensor names)
Georgi Gerganov [Mon, 1 May 2023 07:13:59 +0000 (10:13 +0300)]
ggml : temp comment
Georgi Gerganov [Sun, 30 Apr 2023 19:28:14 +0000 (22:28 +0300)]
ggml : fix UB (int << 31)
Georgi Gerganov [Sun, 30 Apr 2023 16:03:35 +0000 (19:03 +0300)]
ggml, whisper : sync whisper.cpp (GGML_FTYPE + Q5 WASM SIMD)
Georgi Gerganov [Sun, 30 Apr 2023 07:25:13 +0000 (10:25 +0300)]
ggml : fix labels for GGML_OP_ALIBI
Georgi Gerganov [Sat, 29 Apr 2023 18:33:59 +0000 (21:33 +0300)]
ggml : fix 32-bit ARM NEON
Georgi Gerganov [Sat, 29 Apr 2023 18:13:40 +0000 (21:13 +0300)]
ggml : use vzip instead of vuzp for consistency
Georgi Gerganov [Sat, 29 Apr 2023 16:13:53 +0000 (19:13 +0300)]
ggml : fix SHARED build
Georgi Gerganov [Sat, 29 Apr 2023 16:07:19 +0000 (19:07 +0300)]
ggml : sync llama.cpp (less memory for mul_mat f16 + asserts)
Georgi Gerganov [Sat, 29 Apr 2023 09:33:57 +0000 (12:33 +0300)]
scripts : add sync-whisper.sh
Georgi Gerganov [Sat, 29 Apr 2023 07:30:56 +0000 (10:30 +0300)]
common : forgot to remove Q4_3 references
Georgi Gerganov [Sat, 29 Apr 2023 07:03:59 +0000 (10:03 +0300)]
ggml : remove Q4_3
Georgi Gerganov [Fri, 28 Apr 2023 17:47:27 +0000 (20:47 +0300)]
ggml : ggml_alibi() fixes (#113)
Dan Forbes [Fri, 28 Apr 2023 17:37:07 +0000 (10:37 -0700)]
ggml : add ggml_alibi (positional embedding) (#113)
Co-authored-by: @hhamud <redacted>
Georgi Gerganov [Fri, 28 Apr 2023 17:34:38 +0000 (20:34 +0300)]
ggml : sync llama.cpp (CLBlast)
Georgi Gerganov [Fri, 28 Apr 2023 17:33:44 +0000 (20:33 +0300)]
gitignore : add python env folders
Santtu Keskinen [Fri, 28 Apr 2023 04:25:11 +0000 (07:25 +0300)]
readme : add bert.cpp link (#114)
Georgi Gerganov [Thu, 27 Apr 2023 16:07:40 +0000 (19:07 +0300)]
stablelm : put warning about bug in the implementation
Georgi Gerganov [Thu, 27 Apr 2023 15:31:53 +0000 (18:31 +0300)]
ggml : sync llama.cpp (Q5_0 + Q5_1) + refactor examples quantization
Georgi Gerganov [Mon, 24 Apr 2023 15:52:25 +0000 (18:52 +0300)]
ggml : sync llama.cpp (fix GCC 8 build, close #99)
Georgi Gerganov [Sun, 23 Apr 2023 17:04:03 +0000 (20:04 +0300)]
ggml : indentation
Georgi Gerganov [Sun, 23 Apr 2023 16:57:37 +0000 (19:57 +0300)]
ggml : add GGML_API for exporting shared symbols
Georgi Gerganov [Sun, 23 Apr 2023 16:45:39 +0000 (19:45 +0300)]
ggml : better PERF prints
le.chang [Sun, 23 Apr 2023 16:12:49 +0000 (00:12 +0800)]
tests : fix compile error (#98)
appvoid [Sun, 23 Apr 2023 16:11:33 +0000 (12:11 -0400)]
gpt-2 : remove GPT-J unnecessary import (#91)
AsukaMinato [Sun, 23 Apr 2023 15:03:52 +0000 (00:03 +0900)]
tests : remove type cast (#100)
Georgi Gerganov [Sun, 23 Apr 2023 13:38:00 +0000 (16:38 +0300)]
ggml : sync llama.cpp (AVX improvements)
Georgi Gerganov [Sat, 22 Apr 2023 13:34:39 +0000 (16:34 +0300)]
ggml : fix Q4_3 cuBLAS + fix quantize_row_q4_2()
Georgi Gerganov [Sat, 22 Apr 2023 12:49:15 +0000 (15:49 +0300)]
examples : refactor quantization tools
Georgi Gerganov [Sat, 22 Apr 2023 11:59:42 +0000 (14:59 +0300)]
examples : utils -> common
Georgi Gerganov [Sat, 22 Apr 2023 10:59:49 +0000 (13:59 +0300)]
ggml : fix ARM build
Georgi Gerganov [Sat, 22 Apr 2023 10:23:20 +0000 (13:23 +0300)]
cmake : add CMake support for cuBLAS (#101)
* cmake : add cuBLAS support
* cmake : fix cuBLAS build
Georgi Gerganov [Sat, 22 Apr 2023 09:52:25 +0000 (12:52 +0300)]
examples : add Q4_2 and Q4_3 quantization support
Georgi Gerganov [Sat, 22 Apr 2023 09:36:42 +0000 (12:36 +0300)]
ggml : sync llama.cpp (Q4_3 + CUDA)
Bart Pelle [Thu, 20 Apr 2023 21:15:45 +0000 (23:15 +0200)]
mnist : add missing header (#95)
Georgi Gerganov [Thu, 20 Apr 2023 20:35:52 +0000 (23:35 +0300)]
stablelm : update README.md
Georgi Gerganov [Thu, 20 Apr 2023 20:23:07 +0000 (23:23 +0300)]
minor : fix GPT-NeoX name
Georgi Gerganov [Thu, 20 Apr 2023 20:21:38 +0000 (23:21 +0300)]
readme : add StableLM reference
Georgi Gerganov [Thu, 20 Apr 2023 20:20:38 +0000 (23:20 +0300)]
examples : add StableLM example (#96)
* ggml : there is a bug in ggml_cpy() F32 -> F32
Cannot see why, but multi-thread does not work
* stablelm : initial implementation, but QKV seems broken
* stablelm : make it work
* stablelm : use original merged QKV matrix
* stablelm : minor
* stablelm : instructions
* stablelm : update README.md
Georgi Gerganov [Thu, 20 Apr 2023 19:00:49 +0000 (22:00 +0300)]
ggml : sync llama.cpp (cuBLAS, Q4_3, bug fix, etc)
Georgi Gerganov [Wed, 19 Apr 2023 17:20:23 +0000 (20:20 +0300)]
ggml : sync llama.cpp
Georgi Gerganov [Sat, 15 Apr 2023 19:23:10 +0000 (22:23 +0300)]
examples : update huggingface links
Georgi Gerganov [Sat, 15 Apr 2023 16:50:54 +0000 (19:50 +0300)]
ggml : sync llama.cpp
Georgi Gerganov [Sat, 15 Apr 2023 11:25:34 +0000 (14:25 +0300)]
ggml : add ggml_type_name()
Georgi Gerganov [Sat, 15 Apr 2023 11:23:26 +0000 (14:23 +0300)]
ggml : use posix_memalign on non-Windows env
Georgi Gerganov [Fri, 14 Apr 2023 14:45:54 +0000 (17:45 +0300)]
ggml : add unary and binary map operations