git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (llama/15451)
author Qeeweew <redacted>
Mon, 25 Aug 2025 21:21:22 +0000 (05:21 +0800)
committer Georgi Gerganov <redacted>
Sat, 20 Sep 2025 10:42:41 +0000 (13:42 +0300)
commit 2468074e914e5fb0f66c26374c812f908d069fda
tree fa9691db7ffa70292194f3310784139d14c0917b
parent 582ef379ab3c7f02305fd3ddf914ba68052c301d
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (llama/15451)

* CUDA: optimize get_int_from_table_16

* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs

* revise documentation

---------

Co-authored-by: xix <redacted>
Co-authored-by: Johannes Gäßler <redacted>
ggml/src/ggml-cuda/vecdotq.cuh
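
The idea named in the subject line can be illustrated on the host. The following is a minimal sketch, not the code from this commit: `byte_perm_emul`, `lookup16_perm`, `lookup16_ref`, `Int2`, and `kTable` are hypothetical names, and the table contents are illustrative 4-bit code values assumed for the example. `byte_perm_emul` imitates only the default mode of CUDA's `__byte_perm(x, y, s)`, where each of the four low selector nibbles picks one byte (low 3 bits) out of the 8 bytes of `{x, y}`. Because a 16-entry `int8_t` table is exactly four 32-bit words, two permutes cover the lower and upper table halves and a third permute picks between them using bit 3 of each index.

```cpp
#include <cassert>
#include <cstdint>

// Host-side emulation of the default mode of CUDA's __byte_perm(x, y, s):
// result byte i = byte number (nibble i of s, low 3 bits) of the pool {x, y}.
static uint32_t byte_perm_emul(uint32_t x, uint32_t y, uint32_t s) {
    const uint64_t pool = (uint64_t(y) << 32) | x;
    uint32_t r = 0;
    for (int i = 0; i < 4; ++i) {
        const uint32_t sel = (s >> (4 * i)) & 0x7;
        r |= uint32_t((pool >> (8 * sel)) & 0xFF) << (8 * i);
    }
    return r;
}

struct Int2 { uint32_t x, y; };  // stand-in for CUDA's int2 (hypothetical)

// Illustrative 16-entry table (assumed values, not taken from the commit).
static const int8_t kTable[16] = {0, 1, 2, 3, 4, 6, 8, 12,
                                  0, -1, -2, -3, -4, -6, -8, -12};

// Pack four table bytes into one word, independent of host endianness.
static uint32_t pack4(const int8_t * p) {
    uint32_t w = 0;
    for (int i = 0; i < 4; ++i) w |= uint32_t(uint8_t(p[i])) << (8 * i);
    return w;
}

// q4 holds eight 4-bit indices. Returns x = entries for the low nibble of
// each byte, y = entries for the high nibble of each byte.
static Int2 lookup16_perm(uint32_t q4, const int8_t * table) {
    uint32_t t[4];
    for (int k = 0; k < 4; ++k) t[k] = pack4(table + 4 * k);

    // Nibble i of mask is i (take lower-half result) or i + 4 (upper half),
    // depending on bit 3 of the corresponding index.
    const uint32_t mask = 0x32103210u | ((q4 & 0x88888888u) >> 1);

    uint32_t lo = byte_perm_emul(t[0], t[1], q4);        // table[idx & 7]
    uint32_t hi = byte_perm_emul(t[2], t[3], q4);        // table[8 + (idx & 7)]
    const uint32_t v3 = byte_perm_emul(lo, hi, mask);    // indices 0..3

    lo = byte_perm_emul(t[0], t[1], q4 >> 16);
    hi = byte_perm_emul(t[2], t[3], q4 >> 16);
    const uint32_t v4 = byte_perm_emul(lo, hi, mask >> 16);  // indices 4..7

    // De-interleave: even-numbered nibbles go to x, odd-numbered to y.
    return { byte_perm_emul(v3, v4, 0x6420), byte_perm_emul(v3, v4, 0x7531) };
}

// Straightforward per-nibble reference lookup for comparison.
static Int2 lookup16_ref(uint32_t q4, const int8_t * table) {
    Int2 r = {0, 0};
    for (int i = 0; i < 4; ++i) {
        const uint32_t lo_idx = (q4 >> (8 * i)) & 0xF;
        const uint32_t hi_idx = (q4 >> (8 * i + 4)) & 0xF;
        r.x |= uint32_t(uint8_t(table[lo_idx])) << (8 * i);
        r.y |= uint32_t(uint8_t(table[hi_idx])) << (8 * i);
    }
    return r;
}
```

Calling `lookup16_perm(q4, kTable)` and `lookup16_ref(q4, kTable)` with the same `q4` should give identical `x` and `y` words. On a GPU the emulation loop would be a single instruction per permute, so the whole lookup needs a handful of permutes instead of eight separate table reads.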