CANN: Update several operators to support FP16 data format (llama/16251)
author hipudding <redacted>
Mon, 13 Oct 2025 00:52:22 +0000 (08:52 +0800)
committer Georgi Gerganov <redacted>
Wed, 15 Oct 2025 06:29:17 +0000 (09:29 +0300)
commit 53e21364a6950a7193a211d9b53331e626c3fe78
tree 43710007e7754495120e311f2915078b5f2d1e6b
parent 7f22fe5d8fe3b821f0f329bd786d3daa0a0f0181

Many Ascend operators use FP16 internally for computation. If the input
data is in FP32, it must first be cast to FP16 before the computation and
then cast back to FP32 afterwards, which introduces unnecessary cast
operations. Moreover, FP16 computation involves significantly less work
than FP32, leading to noticeable efficiency gains.

In this change, `get_rows`, `rms_norm`, and `flash_attn_ext` are extended
to support multiple data types. Validation on the Qwen2 0.5B model shows
correct accuracy and about a 10% performance gain in concurrent scenarios.
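
As a rough illustration of the dispatch idea described above, here is a
minimal, self-contained C++ sketch. `TensorType`, `Tensor`, the `rms_norm`
dispatcher, and the `kernel_*` stubs are hypothetical placeholders, not the
actual ggml-cann/aclnn code: the point is only that an FP16 input can reach
the FP16 kernel directly, while an FP32 input keeps the cast-compute-cast
round trip.

```cpp
// Illustrative sketch only; not the real aclnn_ops.cpp implementation.
#include <cstdint>
#include <cstdio>

enum class TensorType { F32, F16 };

struct Tensor {
    TensorType type;   // element type of the data buffer
    int64_t    n;      // number of elements
};

// Stand-ins for device kernels; on Ascend these would be aclnn operator calls.
static void kernel_rms_norm_f16(const Tensor & src, Tensor & dst) {
    std::printf("rms_norm on %lld FP16 elements\n", (long long) src.n);
    dst.type = TensorType::F16;
}

static void kernel_cast(const Tensor & src, Tensor & dst, TensorType to) {
    std::printf("cast %lld elements %s\n", (long long) src.n,
                to == TensorType::F16 ? "F32 -> F16" : "F16 -> F32");
    dst.type = to;
    dst.n    = src.n;
}

// Dispatch on the element type: FP16 skips both casts, FP32 keeps the old path.
static void rms_norm(const Tensor & src, Tensor & dst) {
    if (src.type == TensorType::F16) {
        kernel_rms_norm_f16(src, dst);               // new cast-free path
    } else {
        Tensor tmp_in  { TensorType::F16, src.n };
        Tensor tmp_out { TensorType::F16, src.n };
        kernel_cast(src, tmp_in, TensorType::F16);   // F32 -> F16
        kernel_rms_norm_f16(tmp_in, tmp_out);
        kernel_cast(tmp_out, dst, TensorType::F32);  // F16 -> F32
    }
}

int main() {
    Tensor src_f16 { TensorType::F16, 1024 };
    Tensor dst_f16 { TensorType::F16, 1024 };
    rms_norm(src_f16, dst_f16);                      // one kernel launch, no casts

    Tensor src_f32 { TensorType::F32, 1024 };
    Tensor dst_f32 { TensorType::F32, 1024 };
    rms_norm(src_f32, dst_f32);                      // two extra cast launches
}
```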

Co-authored-by: noemotiovon <redacted>
ggml/src/ggml-cann/aclnn_ops.cpp