ggml : 4-bit Integer quantisation + many llama.cpp improvements (#27)

* gq : attempt at n-bit quantization

* gq : add amax based method 3

* gq : progress on method 2

* gq : method 4 (AVX2)

* gq : method 4 (ARM)

* gq : method 4 (AVX2 attempt) + method 5 (no min)

* gq : method 5 (ARM)

* gpt-2 : model conversion for Q4_0 quantization

* ggml : Q4_0 quantization support (ggml_get_rows())

* gpt-2 : loading Q4_0 quantized model

* ggml : q4_0 quantization support

* ggml : q4_1 quantization support (seems to work for bigger models)

* gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models

* ggml : 4-bit quantization works (only scalar for now)

* gq : add method 6 (ARM)

* ggml : vectorized mad q4_0 (ARM)

* ggml : vectorized quantize_row_q4_0 (ARM)

* ggml : simplify mad q4_0 (ARM)

* ggml : minor indentations

* gpt-j : support for 4-bit quantized model inference

* ggml : GGML_ASSERT() instead of assert() where appropriate

* gpt : avoid ggml_transpose on model tensors (new models!)

* gpt-2 : minor

* gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)

* ggml : add ggml_compute_forward_rope_f16()

* gpt : fix memory usage computation

* ggml : fix ggml_is_contiguous() to take into account blck size

* whisper : add whisper-quantize tool

* whisper : add support for quantized models

* whisper : mem usage based on model format type

* gpt : using FP16 for the KV cache does not seem worthwhile

* gpt : support quantisation of f16 model files

* ggml : fixes for rpi4

* whisper : add Q4_1 model sizes

* ggml : add WASM SIMD for Q4_0

* utils : print quantization histograms

* ggml : sync all changes from llama.cpp and whisper.cpp

* ggml : finalize the Q4_1 quantization for ARM_NEON
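
The bullets above mention an "amax based" method and a "no min" variant without spelling out the scheme. As an illustration only, here is a minimal scalar sketch of absmax-style 4-bit block quantization. It is not the code from this commit: the block length of 32, the struct layout, and the names `block4_t`, `quantize_block4` and `dequantize_block4` are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 32  /* assumed block length, chosen for this sketch */

/* Hypothetical block layout: one float scale plus 32 values, two per byte. */
typedef struct {
    float   scale;
    uint8_t nibbles[BLOCK_SIZE / 2];
} block4_t;

/* Round to nearest integer, halves away from zero (avoids needing libm). */
static int round_nearest(float v) {
    return (int) (v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Quantize BLOCK_SIZE floats using only the absolute maximum (no offset). */
static void quantize_block4(const float *x, block4_t *out) {
    assert(x != 0 && out != 0);

    float amax = 0.0f;
    for (int i = 0; i < BLOCK_SIZE; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }

    /* Map [-amax, amax] onto the integers [-7, 7]. */
    float scale = amax / 7.0f;
    float inv   = scale != 0.0f ? 1.0f / scale : 0.0f;
    out->scale  = scale;

    for (int i = 0; i < BLOCK_SIZE; i += 2) {
        /* Shift by 8 so each value fits in an unsigned nibble (1..15). */
        int q0 = round_nearest(x[i]     * inv) + 8;
        int q1 = round_nearest(x[i + 1] * inv) + 8;
        out->nibbles[i / 2] = (uint8_t) (q0 | (q1 << 4));
    }
}

/* Reverse the mapping: unpack each nibble, remove the shift, rescale. */
static void dequantize_block4(const block4_t *in, float *y) {
    assert(in != 0 && y != 0);

    for (int i = 0; i < BLOCK_SIZE; i += 2) {
        uint8_t b = in->nibbles[i / 2];
        y[i]     = (float) ((int) (b & 0x0F) - 8) * in->scale;
        y[i + 1] = (float) ((int) (b >> 4)   - 8) * in->scale;
    }
}
```

In this sketch each block costs 4 bytes of scale plus 16 bytes of packed values, and the round-trip error per element is bounded by half the scale. A variant that also stores a per-block minimum would spend extra bytes per block to cover asymmetric value ranges.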

author	Georgi Gerganov <redacted>
	Wed, 29 Mar 2023 19:21:36 +0000 (22:21 +0300)
committer	GitHub <redacted>
	Wed, 29 Mar 2023 19:21:36 +0000 (22:21 +0300)
commit	acd4aeee95cc47f06ab007962453b40e03af8e88
tree	305a25b834ee73402144c0f8a6f698b996ed87b7
parent	c55f50a18e401c612a7b7abd513aaf41935e8af9

CMakeLists.txt
examples/gpt-2/CMakeLists.txt
examples/gpt-2/README.md
examples/gpt-2/convert-ckpt-to-ggml.py
examples/gpt-2/main.cpp
examples/gpt-2/quantize.cpp	[new file with mode: 0644]
examples/gpt-j/CMakeLists.txt
examples/gpt-j/convert-h5-to-ggml.py
examples/gpt-j/main.cpp
examples/gpt-j/quantize.cpp	[new file with mode: 0644]
examples/utils.h
examples/whisper/CMakeLists.txt
examples/whisper/convert-pt-to-ggml.py
examples/whisper/main.cpp
examples/whisper/quantize.cpp	[new file with mode: 0644]
examples/whisper/whisper.cpp
examples/whisper/whisper.h
include/ggml/ggml.h
src/ggml.c
tests/test-mul-mat2.c