git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	compilade <redacted>
	Sat, 11 May 2024 15:06:26 +0000 (11:06 -0400)
committer	GitHub <redacted>
	Sat, 11 May 2024 15:06:26 +0000 (11:06 -0400)
commit	5a419926b0c4efab0531401aea91522aaea9fd07
tree	fc04fa59a6588650a6fed70fedd8c1d4b39ec1d1
parent	fae9d234b6606693704eca62fe4aefbb6c6abb45

convert-hf : support bfloat16 conversion (#7158)

* convert-hf : support bfloat16 conversion

* gguf-py : flake8 fixes

* convert-hf : add missing space after comma

* convert-hf : get bit-exact output matching ./quantize

The quantization version was missing.

* convert-hf : don't round bf16 NaNs

* convert-hf : save some memory with np.int16 intermediate bf16 weights

* convert-hf : more closely match llama.cpp with which weights to keep in f32

* convert-hf : add --outtype auto-f16

One reason for this option is model quantizers who want an initial
GGUF that stays as faithful as possible to the original model while
still using a 16-bit float type instead of 32-bit floats.

* convert-hf : remove a semicolon because flake8 doesn't like it

It's a reflex from when programming in C/C++, I guess.

* convert-hf : support outtype templating in outfile name

* convert-hf : rename --outtype auto-f16 to --outtype auto
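
The bfloat16 items above (the conversion itself, not rounding NaNs, and the np.int16 intermediates) come down to turning float32 bit patterns into bfloat16 bit patterns. The following is a minimal NumPy sketch of that step, assuming round-to-nearest-even for finite values; `fp32_to_bf16_bits` and `bf16_bits_to_fp32` are hypothetical names, and this is not the code from the commit.

```python
import numpy as np


def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Convert float32 values to bfloat16 bit patterns (returned as uint16)."""
    # Widen to uint64 so the rounding addition below can never wrap around.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32).astype(np.uint64)

    # Round to nearest, ties to even: add 0x7FFF plus the lowest bit that
    # survives truncation, then keep the upper 16 bits.
    lsb = (bits >> np.uint64(16)) & np.uint64(1)
    rounded = (bits + np.uint64(0x7FFF) + lsb) >> np.uint64(16)

    # NaNs are not rounded: rounding could carry into the exponent or sign.
    # Truncate them instead and set the quiet bit, so a NaN whose payload
    # lives only in the low 16 bits does not collapse into infinity.
    is_nan = (bits & np.uint64(0x7FFFFFFF)) > np.uint64(0x7F800000)
    nan_bits = (bits >> np.uint64(16)) | np.uint64(0x0040)

    # 2 bytes per element instead of 4.
    return np.where(is_nan, nan_bits, rounded).astype(np.uint16)


def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
    """Expand bfloat16 bit patterns back to float32 (this direction is exact)."""
    return (np.asarray(b, dtype=np.uint16).astype(np.uint32) << np.uint32(16)).view(np.float32)
```

The sketch returns uint16 for readability; `.view(np.int16)` on the result gives the same 2-byte bit patterns that the commit keeps as np.int16.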

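The `--outtype` items above (the `auto` choice, keeping some weights in f32, and templating the output file name) can be sketched as three small helpers. All three are hypothetical and their rules are assumptions rather than the script's exact logic: `auto` is assumed to keep float16 sources as f16 and send everything else to bf16, only 1-D tensors (norms, biases) are assumed to stay in f32, and `{ftype}` is assumed to be the placeholder in the file name.

```python
from pathlib import Path


def pick_auto_outtype(source_dtype: str) -> str:
    """Hypothetical rule for --outtype auto: pick the most faithful 16-bit type."""
    # Assumption: float16 weights stay f16 (lossless); bfloat16 and float32
    # weights go to bf16, which keeps float32's 8-bit exponent range.
    return "f16" if source_dtype == "float16" else "bf16"


def pick_tensor_type(name: str, shape: tuple[int, ...], outtype: str) -> str:
    """Hypothetical per-tensor rule: 1-D tensors (norms, biases) stay in f32."""
    # `name` is unused by this simplified rule; a real rule may also look at it.
    if len(shape) <= 1:
        return "f32"
    return outtype


def resolve_outfile(template: Path, outtype: str) -> Path:
    """Fill an assumed "{ftype}" placeholder in the output file name."""
    # Only the file name is templated; the directory part is left alone.
    return template.parent / template.name.format(ftype=outtype)
```

For example, `resolve_outfile(Path("models/model-{ftype}.gguf"), pick_auto_outtype("bfloat16"))` returns a path equal to `Path("models/model-bf16.gguf")`.
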
convert-hf-to-gguf.py
gguf-py/gguf/__init__.py
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
gguf-py/gguf/lazy.py	[new file with mode: 0644]
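
The bit-exact item in the commit message notes that the quantization version was missing. In GGUF this is the `general.quantization_version` metadata key, stored as a uint32; leaving it out changes the header's key/value count and the length of the metadata section, so the file cannot match ./quantize byte for byte. Below is a minimal sketch of how one such pair is laid out on disk, following the GGUF spec's encoding (a string key as a uint64 length plus bytes, a uint32 type tag, then the value). `encode_u32_kv` is a hypothetical helper, not part of gguf-py, and the value 2 is assumed here as the current quantization version.

```python
import struct

GGUF_TYPE_UINT32 = 4  # value-type id for uint32 in the GGUF spec


def encode_u32_kv(key: str, value: int) -> bytes:
    """Encode one GGUF metadata key/value pair holding a uint32 (little-endian)."""
    key_bytes = key.encode("utf-8")
    return (
        struct.pack("<Q", len(key_bytes))       # string key: uint64 length...
        + key_bytes                             # ...followed by the raw bytes
        + struct.pack("<I", GGUF_TYPE_UINT32)   # value type tag
        + struct.pack("<I", value)              # the uint32 value itself
    )


# Assumed value: 2 as the quantization version.
kv = encode_u32_kv("general.quantization_version", 2)
```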