git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (llama/15451)
author    Qeeweew <redacted>
          Mon, 25 Aug 2025 21:21:22 +0000 (05:21 +0800)
committer Georgi Gerganov <redacted>
          Fri, 5 Sep 2025 09:54:04 +0000 (12:54 +0300)
commit    8e80c1d0aa9e2290681453f1460e896501eb1e22
tree      4f20b130c69bb6e29e81124203df7c9cad2d9e52
parent    d9f431b78f339e1950daf6b4908e50093b868fae

* CUDA: optimize get_int_from_table_16

* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs

* revise documentation
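The commit page does not show the diff to `src/ggml-cuda/vecdotq.cuh`, so what follows is only a minimal sketch of the general technique the title names: a 16-entry byte-table lookup for eight packed 4-bit indices built from byte permutes. It is plain host C++ so it runs without a GPU. `byte_perm` is a software model of CUDA's `__byte_perm(x, y, s)` assuming its documented default behavior (input bytes 0-3 from `x`, 4-7 from `y`, result byte n selected by the low 3 bits of nibble n of `s`). All other names (`Int2`, `pack_table`, `lookup16_byte_perm`, `lookup16_scalar`, `kExampleTable`, `self_check`) are hypothetical, and the example table assumes the FP4 (E2M1) values doubled to fit in `int8_t`. The AMD `v_perm_b32` path from the second bullet is not modeled.

```cpp
// Host-side C++ sketch of a 16-entry byte-table lookup built from a __byte_perm model.
// Hypothetical names throughout; this is not the code in src/ggml-cuda/vecdotq.cuh.
#include <cassert>
#include <cstdint>

// Software model of CUDA's __byte_perm(x, y, s): input bytes 0-3 come from x and
// bytes 4-7 from y; result byte n is input byte ((s >> 4n) & 7).
static uint32_t byte_perm(uint32_t x, uint32_t y, uint32_t s) {
    const uint64_t input = (uint64_t(y) << 32) | x;
    uint32_t r = 0;
    for (int n = 0; n < 4; ++n) {
        const uint32_t sel = (s >> (4 * n)) & 0x7;
        r |= uint32_t((input >> (8 * sel)) & 0xFF) << (8 * n);
    }
    return r;
}

struct Int2 { uint32_t x, y; };  // stand-in for CUDA's int2

// Pack 16 int8 entries into four 32-bit words: entry 4*w + k becomes byte k of word w.
static void pack_table(const int8_t table[16], uint32_t t[4]) {
    for (int w = 0; w < 4; ++w) {
        t[w] = 0;
        for (int k = 0; k < 4; ++k)
            t[w] |= uint32_t(uint8_t(table[4 * w + k])) << (8 * k);
    }
}

// Look up the eight 4-bit indices packed in q4 using only byte permutes.
// .x holds the entries for the low nibble of each byte of q4, .y for the high nibble.
static Int2 lookup16_byte_perm(uint32_t q4, const uint32_t t[4]) {
    uint32_t tmp[2];
    for (int i = 0; i < 2; ++i) {
        const uint32_t q = q4 >> (16 * i);  // indices 4i..4i+3 sit in the low 16 bits
        // Bits 0-2 of each index pick one of 8 bytes from each half of the table.
        const uint32_t lo = byte_perm(t[0], t[1], q & 0x7777);  // entries 0-7
        const uint32_t hi = byte_perm(t[2], t[3], q & 0x7777);  // entries 8-15
        // Bit 3 of each index chooses the half: selector n keeps lo, n + 4 takes hi.
        const uint32_t pick = 0x3210 | ((q & 0x8888) >> 1);
        tmp[i] = byte_perm(lo, hi, pick);
    }
    // tmp holds the 8 entries in nibble order; even nibbles go to x, odd nibbles to y.
    return { byte_perm(tmp[0], tmp[1], 0x6420), byte_perm(tmp[0], tmp[1], 0x7531) };
}

// Plain scalar reference: one table read per nibble.
static Int2 lookup16_scalar(uint32_t q4, const int8_t table[16]) {
    Int2 r{0, 0};
    for (int j = 0; j < 4; ++j) {
        const uint32_t b = (q4 >> (8 * j)) & 0xFF;
        r.x |= uint32_t(uint8_t(table[b & 0xF])) << (8 * j);
        r.y |= uint32_t(uint8_t(table[b >> 4])) << (8 * j);
    }
    return r;
}

// Hypothetical example table: E2M1 magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}
// doubled to fit in int8, followed by their negatives (sign in bit 3 of the index).
static const int8_t kExampleTable[16] = {0, 1, 2, 3, 4, 6, 8, 12,
                                         0, -1, -2, -3, -4, -6, -8, -12};

// Compare the permute-based lookup with the scalar reference on many index patterns.
static void self_check() {
    uint32_t t[4];
    pack_table(kExampleTable, t);
    uint32_t q4 = 0x12345678u;
    for (int iter = 0; iter < 100000; ++iter) {
        q4 = q4 * 1664525u + 1013904223u;  // simple LCG to vary q4
        const Int2 a = lookup16_byte_perm(q4, t);
        const Int2 b = lookup16_scalar(q4, kExampleTable);
        assert(a.x == b.x && a.y == b.y);
    }
}
```

The three-permute structure follows from the selector width: a `__byte_perm` selector addresses only 8 bytes, so each table half gets its own permute and a third permute keyed on bit 3 of every index chooses between them. Two final permutes regroup the results into the even/odd nibble layout that the scalar reference produces directly.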

---------

Co-authored-by: xix <redacted>
Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/vecdotq.cuh