ggml : 4-bit Integer quantisation + many llama.cpp improvements (#27)
* gq : attempt at n-bit quantization
* gq : add amax based method 3
* gq : progress on method 2
* gq : method 4 (AVX2)
* gq : method 4 (ARM)
* gq : method 4 (AVX2 attempt) + method 5 (no min)
* gq : method 5 (ARM)
* gpt-2 : model conversion for Q4_0 quantization
* ggml : Q4_0 quantization support (ggml_get_rows())
* gpt-2 : loading Q4_0 quantized model
* ggml : q4_0 quantization support (see the Q4_0 sketch after this list)
* ggml : q4_1 quantization support (seems to work for bigger models; see the Q4_1 sketch after this list)
* gpt-2 : add gpt-2-quantize tool for quantizing f32 GPT-2 models
* ggml : 4-bit quantization works (only scalar for now)
* gq : add method 6 (ARM)
* ggml : vectorized mad q4_0 (ARM)
* ggml : vectorized quantize_row_q4_0 (ARM)
* ggml : simplify mad q4_0 (ARM)
* ggml : minor indentations
* gpt-j : support for 4-bit quantized model inference
* ggml : GGML_ASSERT() instead of assert() where appropriate
* gpt : avoid ggml_transpose on model tensors (new models!)
* gpt-2 : minor
* gpt-j : fix conversion for FP16 models (such as GPT-JT-6B)
* ggml : add ggml_compute_forward_rope_f16()
* gpt : fix memory usage computation
* ggml : fix ggml_is_contiguous() to take into account blck size
* whisper : add whisper-quantize tool
* whisper : add support for quantized models
* whisper : mem usage based on model format type
* gpt : using FP16 for the KV cache does not seem worthwhile
* gpt : support quantisation of f16 model files
* ggml : fixes for rpi4
* whisper : add Q4_1 model sizes
* ggml : add WASM SIMD for Q4_0
* utils : print quantization histograms
* ggml : sync all changes from llama.cpp and whisper.cpp
* ggml : finalize the Q4_1 quantization for ARM_NEON
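
For context on the Q4_0 entries above, here is a minimal scalar sketch of the blockwise 4-bit quantization: 32 weights per block, one scale derived from the block's absolute maximum, and two 4-bit quants packed per byte. The type and function names and the exact layout are illustrative assumptions, not the ggml definitions.

    #include <math.h>
    #include <stdint.h>

    // Assumed Q4_0-style block: 32 weights, one f32 scale,
    // 16 bytes of packed 4-bit quants (layout illustrative).
    #define QK 32

    typedef struct {
        float   d;          // per-block scale
        uint8_t qs[QK / 2]; // 32 quants, two 4-bit values per byte
    } block_q4_0_sketch;

    // Scalar quantization of k floats (k a multiple of QK):
    // per block, find the absolute maximum, set d = amax / 7,
    // then store each weight as round(x/d) + 8, i.e. in [1, 15],
    // so that zero stays exactly representable as quant value 8.
    static void quantize_row_q4_0_sketch(const float * x, block_q4_0_sketch * y, int k) {
        const int nb = k / QK;

        for (int i = 0; i < nb; i++) {
            float amax = 0.0f;
            for (int j = 0; j < QK; j++) {
                const float v = fabsf(x[i*QK + j]);
                if (v > amax) {
                    amax = v;
                }
            }

            const float d  = amax / 7.0f;
            const float id = d != 0.0f ? 1.0f/d : 0.0f;

            y[i].d = d;

            for (int j = 0; j < QK; j += 2) {
                const uint8_t v0 = (uint8_t)(roundf(x[i*QK + j + 0]*id) + 8.0f);
                const uint8_t v1 = (uint8_t)(roundf(x[i*QK + j + 1]*id) + 8.0f);
                y[i].qs[j/2] = v0 | (v1 << 4);
            }
        }
    }

The vectorized versions (the ARM, AVX2 and WASM SIMD items above) speed up the amax search, the rounding and the packing, but follow the same per-block logic.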
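Q4_1 extends this with a per-block minimum, so the 16 levels cover the block's actual value range rather than a symmetric range around zero (one reason it "seems to work for bigger models"). A sketch of the assumed block layout and the corresponding scalar dequantization, again with illustrative names:

    #include <stdint.h>

    #define QK 32

    // Assumed Q4_1-style block: scale d and minimum m per 32 weights,
    // so a dequantized weight is x = d*q + m with q in [0, 15].
    typedef struct {
        float   d;          // per-block scale
        float   m;          // per-block minimum
        uint8_t qs[QK / 2]; // packed 4-bit quants
    } block_q4_1_sketch;

    // Scalar dequantization of k values (k a multiple of QK).
    static void dequantize_row_q4_1_sketch(const block_q4_1_sketch * x, float * y, int k) {
        const int nb = k / QK;

        for (int i = 0; i < nb; i++) {
            const float d = x[i].d;
            const float m = x[i].m;

            for (int j = 0; j < QK; j += 2) {
                const uint8_t b = x[i].qs[j/2];
                y[i*QK + j + 0] = d*(b & 0x0F) + m;
                y[i*QK + j + 1] = d*(b >> 4)   + m;
            }
        }
    }

On the quantization side, this scheme takes d = (max - min)/15 and m = min per block, at the cost of storing two floats per 32 weights instead of one.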