git.djapps.eu Git - pkg/ggml/sources/ggml/log
Georgi Gerganov [Tue, 2 May 2023 17:23:16 +0000 (20:23 +0300)]
ggml : sync llama.cpp (clBLAST support + tensor names)
Georgi Gerganov [Mon, 1 May 2023 07:13:59 +0000 (10:13 +0300)]
ggml : temp comment
Georgi Gerganov [Sun, 30 Apr 2023 19:28:14 +0000 (22:28 +0300)]
ggml : fix UB (int << 31)
Georgi Gerganov [Sun, 30 Apr 2023 16:03:35 +0000 (19:03 +0300)]
ggml, whisper : sync whisper.cpp (GGML_FTYPE + Q5 WASM SIMD)
Georgi Gerganov [Sun, 30 Apr 2023 07:25:13 +0000 (10:25 +0300)]
ggml : fix labels for GGML_OP_ALIBI
Georgi Gerganov [Sat, 29 Apr 2023 18:33:59 +0000 (21:33 +0300)]
ggml : fix 32-bit ARM NEON
Georgi Gerganov [Sat, 29 Apr 2023 18:13:40 +0000 (21:13 +0300)]
ggml : use vzip instead of vuzp for consistency
Georgi Gerganov [Sat, 29 Apr 2023 16:13:53 +0000 (19:13 +0300)]
ggml : fix SHARED build
Georgi Gerganov [Sat, 29 Apr 2023 16:07:19 +0000 (19:07 +0300)]
ggml : sync llama.cpp (less memory for mul_mat f16 + asserts)
Georgi Gerganov [Sat, 29 Apr 2023 09:33:57 +0000 (12:33 +0300)]
scripts : add sync-whisper.sh
Georgi Gerganov [Sat, 29 Apr 2023 07:30:56 +0000 (10:30 +0300)]
common : forgot to remove Q4_3 references
Georgi Gerganov [Sat, 29 Apr 2023 07:03:59 +0000 (10:03 +0300)]
ggml : remove Q4_3
Georgi Gerganov [Fri, 28 Apr 2023 17:47:27 +0000 (20:47 +0300)]
ggml : ggml_alibi() fixes (#113)
Dan Forbes [Fri, 28 Apr 2023 17:37:07 +0000 (10:37 -0700)]
ggml : add ggml_alibi (positional embedding) (#113)
Co-authored-by: @hhamud <redacted>
Georgi Gerganov [Fri, 28 Apr 2023 17:34:38 +0000 (20:34 +0300)]
ggml : sync llama.cpp (CLBlast)
Georgi Gerganov [Fri, 28 Apr 2023 17:33:44 +0000 (20:33 +0300)]
gitignore : add python env folders
Santtu Keskinen [Fri, 28 Apr 2023 04:25:11 +0000 (07:25 +0300)]
readme : add bert.cpp link (#114)
Georgi Gerganov [Thu, 27 Apr 2023 16:07:40 +0000 (19:07 +0300)]
stablelm : put warning about bug in the implementation
Georgi Gerganov [Thu, 27 Apr 2023 15:31:53 +0000 (18:31 +0300)]
ggml : sync llama.cpp (Q5_0 + Q5_1) + refactor examples quantization
Georgi Gerganov [Mon, 24 Apr 2023 15:52:25 +0000 (18:52 +0300)]
ggml : sync llama.cpp (fix GCC 8 build, close #99)
Georgi Gerganov [Sun, 23 Apr 2023 17:04:03 +0000 (20:04 +0300)]
ggml : indentation
Georgi Gerganov [Sun, 23 Apr 2023 16:57:37 +0000 (19:57 +0300)]
ggml : add GGML_API for exporting shared symbols
Georgi Gerganov [Sun, 23 Apr 2023 16:45:39 +0000 (19:45 +0300)]
ggml : better PERF prints
le.chang [Sun, 23 Apr 2023 16:12:49 +0000 (00:12 +0800)]
tests : fix compile error (#98)
appvoid [Sun, 23 Apr 2023 16:11:33 +0000 (12:11 -0400)]
gpt-2 : remove GPT-J unnecessary import (#91)
AsukaMinato [Sun, 23 Apr 2023 15:03:52 +0000 (00:03 +0900)]
tests : remove type cast (#100)
Georgi Gerganov [Sun, 23 Apr 2023 13:38:00 +0000 (16:38 +0300)]
ggml : sync llama.cpp (AVX improvements)
Georgi Gerganov [Sat, 22 Apr 2023 13:34:39 +0000 (16:34 +0300)]
ggml : fix Q4_3 cuBLAS + fix quantize_row_q4_2()
Georgi Gerganov [Sat, 22 Apr 2023 12:49:15 +0000 (15:49 +0300)]
examples : refactor quantization tools
Georgi Gerganov [Sat, 22 Apr 2023 11:59:42 +0000 (14:59 +0300)]
examples : utils -> common
Georgi Gerganov [Sat, 22 Apr 2023 10:59:49 +0000 (13:59 +0300)]
ggml : fix ARM build
Georgi Gerganov [Sat, 22 Apr 2023 10:23:20 +0000 (13:23 +0300)]
cmake : add CMake support for cuBLAS (#101)
* cmake : add cuBLAS support
* cmake : fix cuBLAS build
Georgi Gerganov [Sat, 22 Apr 2023 09:52:25 +0000 (12:52 +0300)]
examples : add Q4_2 and Q4_3 quantization support
Georgi Gerganov [Sat, 22 Apr 2023 09:36:42 +0000 (12:36 +0300)]
ggml : sync llama.cpp (Q4_3 + CUDA)
Bart Pelle [Thu, 20 Apr 2023 21:15:45 +0000 (23:15 +0200)]
mnist : add missing header (#95)
Georgi Gerganov [Thu, 20 Apr 2023 20:35:52 +0000 (23:35 +0300)]
stablelm : update README.md
Georgi Gerganov [Thu, 20 Apr 2023 20:23:07 +0000 (23:23 +0300)]
minor : fix GPT-NeoX name
Georgi Gerganov [Thu, 20 Apr 2023 20:21:38 +0000 (23:21 +0300)]
readme : add StableLM reference
Georgi Gerganov [Thu, 20 Apr 2023 20:20:38 +0000 (23:20 +0300)]
examples : add StableLM example (#96)
* ggml : there is a bug in ggml_cpy() F32 -> F32
Cannot see why, but multi-threading does not work
* stablelm : initial implementation, but QKV seems broken
* stablelm : make it work
* stablelm : use original merged QKV matrix
* stablelm : minor
* stablelm : instructions
* stablelm : update README.md
Georgi Gerganov [Thu, 20 Apr 2023 19:00:49 +0000 (22:00 +0300)]
ggml : sync llama.cpp (cuBLAS, Q4_3, bug fix, etc)
Georgi Gerganov [Wed, 19 Apr 2023 17:20:23 +0000 (20:20 +0300)]
ggml : sync llama.cpp
Georgi Gerganov [Sat, 15 Apr 2023 19:23:10 +0000 (22:23 +0300)]
examples : update huggingface links
Georgi Gerganov [Sat, 15 Apr 2023 16:50:54 +0000 (19:50 +0300)]
ggml : sync llama.cpp
Georgi Gerganov [Sat, 15 Apr 2023 11:25:34 +0000 (14:25 +0300)]
ggml : add ggml_type_name()
Georgi Gerganov [Sat, 15 Apr 2023 11:23:26 +0000 (14:23 +0300)]
ggml : use posix_memalign on non-Windows env
Georgi Gerganov [Fri, 14 Apr 2023 14:45:54 +0000 (17:45 +0300)]
ggml : add unary and binary map operations
Georgi Gerganov [Fri, 14 Apr 2023 10:32:27 +0000 (13:32 +0300)]
ggml : avoid powf() calls in ggml_rope()
Georgi Gerganov [Fri, 14 Apr 2023 10:32:12 +0000 (13:32 +0300)]
ggml : fix ARM NEON dot product types
Georgi Gerganov [Thu, 13 Apr 2023 21:02:31 +0000 (00:02 +0300)]
mnist : update README
Georgi Gerganov [Thu, 13 Apr 2023 21:00:42 +0000 (00:00 +0300)]
mnist : minor fixes and adjustments
Ray Cromwell [Thu, 13 Apr 2023 20:49:45 +0000 (13:49 -0700)]
examples : MNIST example for ggml (#84)
Georgi Gerganov [Thu, 13 Apr 2023 15:37:19 +0000 (18:37 +0300)]
ggml : sync latest changes from llama.cpp
Jakob Frick [Thu, 13 Apr 2023 12:41:53 +0000 (14:41 +0200)]
gpt-2 : typo fix for the Cerebras instructions (#57)
Georgi Gerganov [Thu, 13 Apr 2023 12:40:33 +0000 (15:40 +0300)]
ggml : add GGML_DEFAULT_N_THREADS
LostRuins [Thu, 13 Apr 2023 12:27:56 +0000 (20:27 +0800)]
gpt : fix pytorch converter text encodings (#78)
* Fixed quantization for f16 models not working - this was because the f16 tables were not initialized, so the f16 to f32 conversion was failing.
* In some situations, the script fails with the error: UnicodeDecodeError: 'charmap' codec can't decode byte (byte) in position (number) : character maps to <undefined>
This is probably because the default encodings are incorrect.
Explicitly specifying them as UTF-8 seems to resolve the issue and allow for correct conversion.
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Wed, 12 Apr 2023 15:59:41 +0000 (18:59 +0300)]
readme : update roadmap
Georgi Gerganov [Tue, 11 Apr 2023 18:33:17 +0000 (21:33 +0300)]
gpt-j : update inference to match latest llama.cpp insights
- Use F16 KV cache
- Store transposed V in the cache
- Avoid unnecessary Q copy
Georgi Gerganov [Mon, 10 Apr 2023 20:21:11 +0000 (23:21 +0300)]
ggml : fix <windows.h> include
Georgi Gerganov [Mon, 10 Apr 2023 20:19:15 +0000 (23:19 +0300)]
ggml : fix WASM build
Georgi Gerganov [Mon, 10 Apr 2023 19:39:24 +0000 (22:39 +0300)]
whisper : sync with whisper.cpp
Georgi Gerganov [Mon, 10 Apr 2023 19:39:07 +0000 (22:39 +0300)]
ggml : optimize ggml_cpy() for contiguous dst
Georgi Gerganov [Mon, 10 Apr 2023 16:36:06 +0000 (19:36 +0300)]
ggml : sync with llama.cpp
- int64_t number of elements
- remove mlock
- expose quantization functions
- expose ggml_object
- add ggml_view_3d()
- multi-thread ggml_rope()
- fix ggml_cpy()
- add ggml_init_params.no_alloc
- fix ggml_mul_mat() backward
LostRuins [Mon, 10 Apr 2023 07:47:47 +0000 (15:47 +0800)]
gpt : initialize f16 tables during quantization (#77)
Georgi Gerganov [Fri, 7 Apr 2023 18:21:33 +0000 (21:21 +0300)]
readme : update Roadmap (add rwkv.cpp)
Georgi Gerganov [Thu, 30 Mar 2023 21:37:37 +0000 (00:37 +0300)]
gpt-2 : minor update readme
Georgi Gerganov [Thu, 30 Mar 2023 21:34:14 +0000 (00:34 +0300)]
gpt-2 : fix quantize tool to quantize the "lm_head" tensor
Georgi Gerganov [Thu, 30 Mar 2023 20:39:15 +0000 (23:39 +0300)]
gpt-2 : add Cerebras-GPT example
Supreet Sethi [Thu, 30 Mar 2023 17:25:29 +0000 (01:25 +0800)]
ggml : fix NEON sign types (#51)
Cordeiro [Wed, 29 Mar 2023 20:39:27 +0000 (15:39 -0500)]
gpt-2 : convert h5 to ggml (#35)
* Script to convert h5 to ggml adapted from gpt-j example
* Fix map tensors
* optimize
* rename headers to keep compatibility
* revert gpt-2/main.cpp
---------
Co-authored-by: Alan <redacted>
Co-authored-by: Alan <redacted>
Co-authored-by: ocordeiro <redacted>
Georgi Gerganov [Wed, 29 Mar 2023 19:23:14 +0000 (22:23 +0300)]
readme : update Roadmap
Georgi Gerganov [Wed, 29 Mar 2023 19:21:36 +0000 (22:21 +0300)]
ggml : 4-bit Integer quantisation + many llama.cpp improvements (#27)
* gq : attempt at n-bit quantization
* gq : add amax based method 3
* gq : progress on method 2
* gq : method 4 (AVX2)
* gq : method 4 (ARM)
* gq : method 4 (AVX2 attempt) + method 5 (no min)
* gq : method 5 (ARM)
* gpt-2 : model conversion for Q4_0 quantization
* ggml : Q4_0 quantization support (ggml_get_rows())
* gpt-2 : loading Q4_0 quantized model
* ggml : q4_0 quantization support
* ggml : q4_1 quantization support (seems to work for bigger models)
* gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models
* ggml : 4-bit quantization works (only scalar for now)
* gq : add method 6 (ARM)
* ggml : vectorized mad q4_0 (ARM)
* ggml : vectorized quantize_row_q4_0 (ARM)
* ggml : simplify mad q4_0 (ARM)
* ggml : minor indentations
* gpt-j : support for 4-bit quantized model inference
* ggml : GGML_ASSERT() instead of assert() where appropriate
* gpt : avoid ggml_transpose on model tensors (new models!)
* gpt-2 : minor
* gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)
* ggml : add ggml_compute_forward_rope_f16()
* gpt : fix memory usage computation
* ggml : fix ggml_is_contiguous() to take into account blck size
* whisper : add whisper-quantize tool
* whisper : add support for quantized models
* whisper : mem usage based on model format type
* gpt : seems not worth to use FP16 for KV cache
* gpt : support quantisation of f16 model files
* ggml : fixes for rpi4
* whisper : add Q4_1 model sizes
* ggml : add WASM SIMD for Q4_0
* utils : print quantization histograms
* ggml : sync all changes from llama.cpp and whisper.cpp
* ggml : finalize the Q4_1 quantization for ARM_NEON
MaiHD [Sat, 25 Mar 2023 20:43:24 +0000 (03:43 +0700)]
ggml : make it work on Windows (#46)
Georgi Gerganov [Sat, 25 Mar 2023 14:32:48 +0000 (16:32 +0200)]
tests : add test-blas0
Georgi Gerganov [Wed, 22 Mar 2023 19:52:32 +0000 (21:52 +0200)]
Fix CMake indentation
katsu560 [Wed, 22 Mar 2023 19:51:47 +0000 (04:51 +0900)]
add OpenBLAS detection and modify tests codes (#40)
* fix indents and commands for Haiku, and add OpenBLAS detection in src/CMakeLists.txt
* add system detection and add OpenBLAS detection
* change loop number by environment variable GGML_NLOOP or command line option
* change fmadd codes on no FMA support system
* change n_threads by environment variable GGML_NTHREADS or command line option
---------
Co-authored-by: Georgi Gerganov <redacted>
Alex von Gluck IV [Wed, 22 Mar 2023 19:43:58 +0000 (14:43 -0500)]
CMakeLists: Fix Haiku CPU detection (#39)
hidenorly [Wed, 22 Mar 2023 19:43:22 +0000 (04:43 +0900)]
Add pipe input for prompt on gpt examples (#38)
Enable prompt input through a pipe, instead of using the -p option.
This makes it easier to provide longer, multi-line prompts.
Test:
$ echo "This is an example" > prompt.txt
$ cat prompt.txt | ./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin
$ cat prompt.txt | ./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin
Note that the -p option and the case with no -p specified are kept:
$ ./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
$ ./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin
$ ./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
$ ./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin
katsu560 [Mon, 6 Mar 2023 17:52:16 +0000 (02:52 +0900)]
cmake : update CMakeLists.txt to add correct flags (#26)
* modify src/CMakeLists.txt from whisper.cpp
* cmake : remove OpenBLAS stuff
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Mon, 6 Mar 2023 05:40:55 +0000 (07:40 +0200)]
readme : update Roadmap
Georgi Gerganov [Sun, 5 Mar 2023 16:02:27 +0000 (18:02 +0200)]
readme : add Roadmap section
Georgi Gerganov [Sun, 26 Feb 2023 19:10:50 +0000 (21:10 +0200)]
sync : latest whisper.cpp
Georgi Gerganov [Tue, 21 Feb 2023 20:16:56 +0000 (22:16 +0200)]
tests : fix cblas_sgemm call
Georgi Gerganov [Sat, 18 Feb 2023 14:05:31 +0000 (16:05 +0200)]
tests : add SVD experiments
Georgi Gerganov [Wed, 15 Feb 2023 18:59:36 +0000 (20:59 +0200)]
sync : latest whisper.cpp (scratch buffers in ggml)
Georgi Gerganov [Fri, 20 Jan 2023 06:45:45 +0000 (08:45 +0200)]
Update README.md
Takuya Takeuchi [Sun, 15 Jan 2023 14:30:13 +0000 (23:30 +0900)]
cmake : configure CMAKE_C_FLAGS and target_link_libraries for MSVC (#15)
Georgi Gerganov [Sun, 15 Jan 2023 13:53:08 +0000 (15:53 +0200)]
gpt : fix sampling to use the temperature (close #16)
Georgi Gerganov [Sun, 15 Jan 2023 13:09:36 +0000 (15:09 +0200)]
ggml : sync latest whisper.cpp
Georgi Gerganov [Sun, 8 Jan 2023 18:28:38 +0000 (20:28 +0200)]
gpt-2 : fix broken prompt due to recent experiments
No idea why I committed that!?
Georgi Gerganov [Sun, 8 Jan 2023 18:23:01 +0000 (20:23 +0200)]
ggml : sync latest whisper.cpp
Georgi Gerganov [Sat, 7 Jan 2023 19:05:33 +0000 (21:05 +0200)]
cmake : disable warnings about unused functions
Georgi Gerganov [Sat, 7 Jan 2023 19:04:24 +0000 (21:04 +0200)]
ggml : bugfix in new soft max computation
Georgi Gerganov [Sat, 7 Jan 2023 18:00:25 +0000 (20:00 +0200)]
tests : change test2 eps
Georgi Gerganov [Sat, 7 Jan 2023 17:53:05 +0000 (19:53 +0200)]
ggml : sync with latest whisper.cpp
Georgi Gerganov [Sat, 7 Jan 2023 10:17:34 +0000 (12:17 +0200)]
tests : some more quantization experiments
Georgi Gerganov [Sat, 7 Jan 2023 07:43:02 +0000 (09:43 +0200)]
sync : forgot to sync ggml.h
Georgi Gerganov [Sat, 7 Jan 2023 07:39:12 +0000 (09:39 +0200)]
sync : latest changes from whisper.cpp
Georgi Gerganov [Sat, 7 Jan 2023 07:36:32 +0000 (09:36 +0200)]
tests : wip quantized matrix multiplication method 2
Georgi Gerganov [Sat, 7 Jan 2023 07:31:42 +0000 (09:31 +0200)]
tests : minor fixes for x86
Georgi Gerganov [Thu, 5 Jan 2023 19:05:41 +0000 (21:05 +0200)]
tests : experiments with n-bit quantized matrix multiplication