Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
author    Sigbjørn Skjæret <redacted>
Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)
committer Georgi Gerganov <redacted>
Thu, 8 Aug 2024 10:45:29 +0000 (13:45 +0300)
commit e59cf54176fff60343f0bf62691250cd9fdf928b
tree   f3caddf21e74d30b903cc518c1a7ef8602cfca0e
parent 0a19c02501ca009b2c832e5f0c71cd3e3f86ccd0
Fix conversion of unnormalized BF16->BF16 weights (llama/7843)

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16
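
For context, truncating fp32 to bf16 just keeps the upper 16 bits of the
bit pattern. A minimal C sketch of the idea (the name fp32_to_bf16_truncate
is illustrative, not the truncate_bf16 helper added in gguf-py, which
operates on numpy arrays):

    #include <stdint.h>
    #include <string.h>

    // Truncate fp32 to bf16 by dropping the low 16 mantissa bits.
    // A value that is already exactly representable in bf16 passes
    // through unchanged, which is the property the bf16 -> bf16
    // conversion path relies on.
    static uint16_t fp32_to_bf16_truncate(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof(bits)); // bit-cast without UB
        return (uint16_t)(bits >> 16);   // sign + exponent + top 7 mantissa bits
    }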

* fix masking in __compute_fp32_to_bf16
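
The masking referred to here belongs to the usual round-to-nearest-even
bit trick for fp32 -> bf16 with a NaN guard; a hedged C sketch of that
general technique (illustrative, not the numpy code in gguf-py):

    #include <stdint.h>
    #include <string.h>

    // Convert fp32 to bf16, rounding to nearest (ties to even).
    static uint16_t fp32_to_bf16_round(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof(bits));
        if ((bits & 0x7fffffff) > 0x7f800000) {
            // NaN: truncate the payload and set a mantissa bit
            // so the result stays a (quiet) NaN.
            return (uint16_t)((bits >> 16) | 64);
        }
        // Adding 0x7fff plus the lowest kept bit rounds halfway
        // cases to the nearest even bf16 value.
        bits += 0x7fff + ((bits >> 16) & 1);
        return (uint16_t)(bits >> 16);
    }

Note that the NaN test must mask with 0x7fffffff (all exponent and
mantissa bits); getting such a mask wrong is exactly the kind of bug
this item fixes.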

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.
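
bf16 subnormals have a zero biased exponent and a nonzero mantissa
(magnitudes below 2^-126). A sketch, under the assumption that the
removed step looked like the usual flush-to-zero check; hardware bf16
paths such as AVX512-BF16 still flush, which is the source of the
divergence noted above:

    #include <stdint.h>

    // Flush-to-zero step that the reference conversion no longer
    // applies (a sketch of the technique, not the exact removed code).
    static uint32_t flush_subnormal_to_zero(uint32_t bits) {
        if (!(bits & 0x7f800000)) {   // biased exponent == 0: subnormal (or zero)
            return bits & 0x80000000; // keep only the sign bit
        }
        return bits;
    }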

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.
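
Why rounding subsumes truncation in those cases: an fp32 value that is
already exactly representable in bf16 has zero low 16 bits, so the
rounding increment never carries into the kept bits and both
conversions produce the same bf16 value. A small check of that
arithmetic:

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        // 1.5f is exactly representable in bf16: bits 0x3FC00000.
        float f = 1.5f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof(bits));
        assert((bits & 0xffff) == 0); // no bits to round away
        uint32_t rounded = bits + (0x7fff + ((bits >> 16) & 1));
        // Rounding and truncation agree on the resulting bf16 bits.
        assert((uint16_t)(rounded >> 16) == (uint16_t)(bits >> 16));
        return 0;
    }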

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
include/ggml.h
src/ggml-impl.h
src/ggml.c