Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
author    Sigbjørn Skjæret <redacted>
          Fri, 2 Aug 2024 19:11:39 +0000 (21:11 +0200)
committer Georgi Gerganov <redacted>
          Thu, 8 Aug 2024 19:48:46 +0000 (22:48 +0300)
commit    6cb38c36734fa884717df9787dc63c8a9c5c2025
tree      2e0880edb8c194100ca1a29cebbfccf7ebd97066
parent    9cf14ebcbc80ad16e9ab2a69fdad26ce48d9dc84

* add truncate_bf16

* truncate intermediate fp32 if converting bf16 to bf16

* fix masking in __compute_fp32_to_bf16
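For reference, the masked round-to-nearest-even logic in `__compute_fp32_to_bf16` can be sketched in numpy as follows. This is a minimal illustrative sketch, not the exact gguf-py code; the explicit `np.uint32` casts are assumed here to sidestep numpy 2.x integer-promotion rules:

```python
import numpy as np

def compute_fp32_to_bf16(n: np.ndarray) -> np.ndarray:
    """Round float32 bit patterns (uint32) to bf16 bit patterns (uint16).

    Rounds the low 16 bits to nearest, ties to even; NaNs are quieted
    first so the rounding bias cannot turn a NaN into an infinity.
    """
    n = n.astype(np.uint32)
    # quiet NaNs: keep sign/exponent, force the mantissa MSB on
    is_nan = (n & np.uint32(0x7fffffff)) > np.uint32(0x7f800000)
    n = np.where(is_nan, (n & np.uint32(0xffff0000)) | np.uint32(64 << 16), n)
    # round to nearest even: bias by 0x7fff plus the lowest surviving bit
    n = n + (np.uint32(0x7fff) + ((n >> np.uint32(16)) & np.uint32(1)))
    return (n >> np.uint32(16)).astype(np.uint16)

weights = np.array([1.0, -2.5, 3.140625], dtype=np.float32)
bf16 = compute_fp32_to_bf16(weights.view(np.uint32))
```

Because the bias never carries past bit 16 when the low 16 bits are zero, converting bits that already came from bf16 through this path is an exact round trip.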

* np.int16 no longer used

* missing cast and additional numpy 2.x fix

* ggml-impl : do not flush bf16 subnormals to zero

* ggml : add reference fp32 to bf16 conversion

The fast version is no longer equivalent for all platforms
because of the handling of subnormal values.
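The divergence is easiest to see with a float32 subnormal. The two helpers below are illustrative Python stand-ins for the C conversions in ggml-impl.h, not the actual functions; the reference path keeps subnormals, while the old fast path flushed them to signed zero:

```python
import numpy as np

def bf16_ref(f: float) -> int:
    """Round-to-nearest-even fp32 -> bf16 bits, keeping subnormals."""
    n = int(np.float32(f).view(np.uint32))
    if (n & 0x7fffffff) > 0x7f800000:          # NaN: quiet it
        return ((n >> 16) | 64) & 0xffff
    return ((n + (0x7fff + ((n >> 16) & 1))) >> 16) & 0xffff

def bf16_fast_ftz(f: float) -> int:
    """Same rounding, but flushes fp32 subnormals to signed zero."""
    n = int(np.float32(f).view(np.uint32))
    if (n & 0x7f800000) == 0:                  # zero exponent: subnormal
        return (n & 0x80000000) >> 16
    if (n & 0x7fffffff) > 0x7f800000:
        return ((n >> 16) | 64) & 0xffff
    return ((n + (0x7fff + ((n >> 16) & 1))) >> 16) & 0xffff

tiny = 1e-40                                    # subnormal as float32
ref_bits, ftz_bits = bf16_ref(tiny), bf16_fast_ftz(tiny)
```

For `tiny`, the reference conversion yields nonzero bf16 bits while the flush-to-zero path yields zero, which is why the fast version is no longer bit-identical everywhere.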

* gguf-py : remove flush to zero for bf16 subnormals

* gguf-py : remove float32 truncation to bf16

Rounding achieves the same thing in the cases where this was used.
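A quick check of why rounding subsumes the removed truncation: when the low 16 float32 bits are already zero (i.e. the value is exactly representable in bf16, as with weights that started life as bf16), the rounding bias of at most 0x8000 cannot carry into the kept bits, so both paths produce identical output:

```python
import numpy as np

# 3.140625 is exactly representable in bf16: its low 16 float32 bits are zero
n = np.float32(3.140625).view(np.uint32)
truncated = np.uint16(n >> np.uint32(16))      # former gguf-py truncation
rounded = np.uint16(
    (n + (np.uint32(0x7fff) + ((n >> np.uint32(16)) & np.uint32(1))))
    >> np.uint32(16)
)
```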

* missed prototype update in merge

* merge cleanup

---------

Co-authored-by: Francis Couture-Harpin <redacted>
ggml/include/ggml.h
ggml/src/ggml-impl.h
ggml/src/ggml.c