author	hipudding <redacted>
	Mon, 13 Oct 2025 00:52:22 +0000 (08:52 +0800)
committer	Georgi Gerganov <redacted>
	Tue, 14 Oct 2025 19:07:44 +0000 (22:07 +0300)
commit	17932113ebadfe580271b9ddfa98c0875657c123
tree	4c86ce4539bb929d639712cbbbd60a33fd4b8ca1
parent	ce6e766c4cd29c48d246c7828f4245841d69d10b

CANN: Update several operators to support FP16 data format (llama/16251)

Many Ascend operators compute internally in FP16. When the input is
FP32, it must be cast to FP16 before the computation and the result
cast back to FP32 afterwards, which adds cast operations that FP16
input avoids. FP16 computation also involves significantly less work
than FP32, which noticeably improves efficiency.
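
The cast overhead described above can be illustrated with a minimal
sketch. It is not the CANN implementation: `fp16_kernel` is a
hypothetical stand-in for an operator that computes in FP16, and the
helper names are assumptions made for illustration only. The sketch
uses Python's `struct` module, whose `e` format is IEEE 754 half
precision.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a value to the nearest FP16 (IEEE 754 binary16) number."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a value to the nearest FP32 (IEEE 754 binary32) number."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fp16_kernel(values):
    """Hypothetical operator that computes in FP16: doubles each element."""
    return [to_fp16(v * 2.0) for v in values]

def run_with_fp32_io(values_fp32):
    """FP32 in, FP32 out: needs one cast pass before and one after."""
    cast_passes = 0
    half = [to_fp16(v) for v in values_fp32]   # cast FP32 -> FP16
    cast_passes += 1
    out = fp16_kernel(half)
    result = [to_fp32(v) for v in out]         # cast FP16 -> FP32
    cast_passes += 1
    return result, cast_passes

def run_with_fp16_io(values_fp16):
    """FP16 in, FP16 out: the kernel consumes the data directly."""
    return fp16_kernel(values_fp16), 0

# Storage per element: FP16 takes half the bytes of FP32.
FP16_BYTES = struct.calcsize("<e")
FP32_BYTES = struct.calcsize("<f")
```

With inputs that are exactly representable in FP16, both paths return
the same values, but the FP32 path performs two extra cast passes over
the data.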

In this change, `get_rows`, `rms_norm`, and `flash_attn_ext` are extended
to support multiple data types. Validation on the Qwen2 0.5B model shows
correct results and a performance gain of about 10% in concurrent
scenarios.

Co-authored-by: noemotiovon <redacted>