Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
author    Sigbjørn Skjæret <redacted>
Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)
committer Georgi Gerganov <redacted>
Thu, 8 Aug 2024 10:45:29 +0000 (13:45 +0300)
commit e59cf54176fff60343f0bf62691250cd9fdf928b
tree   f3caddf21e74d30b903cc518c1a7ef8602cfca0e
parent 0a19c02501ca009b2c832e5f0c71cd3e3f86ccd0
Fix conversion of unnormalized BF16->BF16 weights (llama/7843)

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16
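
For context, truncating fp32 to bf16 just keeps the upper 16 bits of the
bit pattern. A minimal C sketch of the idea (the name fp32_to_bf16_truncate
is illustrative, not the truncate_bf16 helper added in gguf-py, which
operates on numpy arrays):

    #include <stdint.h>
    #include <string.h>

    // Truncate fp32 to bf16 by dropping the low 16 mantissa bits.
    // A value that is already exactly representable in bf16 passes
    // through unchanged, which is the property the bf16 -> bf16
    // conversion path relies on.
    static uint16_t fp32_to_bf16_truncate(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof(bits)); // bit-cast without UB
        return (uint16_t)(bits >> 16);   // sign + exponent + top 7 mantissa bits
    }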

* fix masking in __compute_fp32_to_bf16
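
The masking referred to here belongs to the usual round-to-nearest-even
bit trick for fp32 -> bf16 with a NaN guard; a hedged C sketch of that
general technique (illustrative, not the numpy code in gguf-py):

    #include <stdint.h>
    #include <string.h>

    // Convert fp32 to bf16, rounding to nearest (ties to even).
    static uint16_t fp32_to_bf16_round(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof(bits));
        if ((bits & 0x7fffffff) > 0x7f800000) {
            // NaN: truncate the payload and set a mantissa bit
            // so the result stays a (quiet) NaN.
            return (uint16_t)((bits >> 16) | 64);
        }
        // Adding 0x7fff plus the lowest kept bit rounds halfway
        // cases to the nearest even bf16 value.
        bits += 0x7fff + ((bits >> 16) & 1);
        return (uint16_t)(bits >> 16);
    }

Note that the NaN test must mask with 0x7fffffff (all exponent and
mantissa bits); getting such a mask wrong is exactly the kind of bug
this item fixes.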

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.
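
bf16 subnormals have a zero biased exponent and a nonzero mantissa
(magnitudes below 2^-126). A sketch, under the assumption that the
removed step looked like the usual flush-to-zero check; hardware bf16
paths such as AVX512-BF16 still flush, which is the source of the
divergence noted above:

    #include <stdint.h>

    // Flush-to-zero step that the reference conversion no longer
    // applies (a sketch of the technique, not the exact removed code).
    static uint32_t flush_subnormal_to_zero(uint32_t bits) {
        if (!(bits & 0x7f800000)) {   // biased exponent == 0: subnormal (or zero)
            return bits & 0x80000000; // keep only the sign bit
        }
        return bits;
    }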

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.
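
Why rounding subsumes truncation in those cases: an fp32 value that is
already exactly representable in bf16 has zero low 16 bits, so the
rounding increment never carries into the kept bits and both
conversions produce the same bf16 value. A small check of that
arithmetic:

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        // 1.5f is exactly representable in bf16: bits 0x3FC00000.
        float f = 1.5f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof(bits));
        assert((bits & 0xffff) == 0); // no bits to round away
        uint32_t rounded = bits + (0x7fff + ((bits >> 16) & 1));
        // Rounding and truncation agree on the resulting bf16 bits.
        assert((uint16_t)(rounded >> 16) == (uint16_t)(bits >> 16));
        return 0;
    }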

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
include/ggml.h
src/ggml-impl.h
src/ggml.c