git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)
author Qeeweew <redacted>
Mon, 25 Aug 2025 21:21:22 +0000 (05:21 +0800)
committer GitHub <redacted>
Mon, 25 Aug 2025 21:21:22 +0000 (23:21 +0200)
commit 74f52f77f28a5ad6d6075231afcb8d1ad763ca32
tree 2464897542358852e4ef773982e26aa2c6161440
parent f7207b0415986dd7f48447149da7de3a82338276

* CUDA: optimize get_int_from_table_16

* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs

* revise documentation

---------

Co-authored-by: xix <redacted>
Co-authored-by: Johannes Gäßler <redacted>
ggml/src/ggml-cuda/vecdotq.cuh
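
For readers unfamiliar with the technique named in the commit title, the following is a minimal host-side sketch of how a 16-entry byte table can be indexed with 4-bit indices using a byte-permute primitive that only understands 3-bit selectors. It is not the code from this commit. `byte_perm_ref` is a hypothetical plain C++ stand-in for the documented behaviour of CUDA's `__byte_perm(x, y, s)`, and `lookup16_perm`, `lookup16_scalar`, `Pair` and `kTable` are illustrative names. The table values are an assumption (E2M1 magnitudes scaled by 2), chosen only to have something concrete to look up.

```cpp
#include <cstdint>

// Host-side stand-in for CUDA's __byte_perm(x, y, s): the 8 bytes of {y, x}
// form a pool, and each of the 4 low nibbles of s picks one pool byte using
// its low 3 bits. Bit 3 of each nibble is ignored here.
static uint32_t byte_perm_ref(uint32_t x, uint32_t y, uint32_t s) {
    const uint64_t pool = (uint64_t(y) << 32) | x;
    uint32_t r = 0;
    for (int i = 0; i < 4; ++i) {
        const uint32_t sel = (s >> (4 * i)) & 0x7;
        r |= uint32_t((pool >> (8 * sel)) & 0xFF) << (8 * i);
    }
    return r;
}

// Illustrative 16-entry table (assumed values, not taken from the commit).
static const int8_t kTable[16] = {0, 1, 2, 3, 4, 6, 8, 12,
                                  0, -1, -2, -3, -4, -6, -8, -12};

struct Pair { uint32_t even; uint32_t odd; };

// Baseline: one table access per nibble. `even` holds the values for the low
// nibble of each byte of q4, `odd` the values for the high nibble.
static Pair lookup16_scalar(uint32_t q4, const int8_t* table) {
    Pair r{0, 0};
    for (int k = 0; k < 4; ++k) {
        const uint32_t b = (q4 >> (8 * k)) & 0xFF;
        r.even |= uint32_t(uint8_t(table[b & 0xF])) << (8 * k);
        r.odd  |= uint32_t(uint8_t(table[b >> 4]))  << (8 * k);
    }
    return r;
}

// Permute-based variant: the table is held in four 32-bit words and every
// lookup is expressed as a byte permutation.
static Pair lookup16_perm(uint32_t q4, const int8_t* table) {
    // Pack the table so that byte b of t[w] is table[4 * w + b].
    uint32_t t[4] = {0, 0, 0, 0};
    for (int w = 0; w < 4; ++w)
        for (int b = 0; b < 4; ++b)
            t[w] |= uint32_t(uint8_t(table[4 * w + b])) << (8 * b);

    uint32_t tmp[2];
    for (int i = 0; i < 2; ++i) {
        const uint32_t idx = q4 >> (16 * i);  // four nibbles in the low 16 bits
        // Look up in both table halves using only the low 3 bits of each index.
        const uint32_t lo = byte_perm_ref(t[0], t[1], idx & 0x7777);
        const uint32_t hi = byte_perm_ref(t[2], t[3], idx & 0x7777);
        // Byte j of the result comes from lo (pool byte j) when bit 3 of
        // nibble j is clear, otherwise from hi (pool byte j + 4).
        const uint32_t sel = 0x3210 | ((idx & 0x8888) >> 1);
        tmp[i] = byte_perm_ref(lo, hi, sel);
    }
    // tmp holds the values in nibble order; regroup into even and odd nibbles.
    return Pair{byte_perm_ref(tmp[0], tmp[1], 0x6420),
                byte_perm_ref(tmp[0], tmp[1], 0x7531)};
}
```

With this table, `lookup16_perm(0x76543210u, kTable)` gives `even == 0x08040200` and `odd == 0x0C060301`, the same as `lookup16_scalar`. On a GPU, `byte_perm_ref` would presumably be replaced by the hardware intrinsic, so each group of four lookups costs a few permute instructions instead of four byte loads. The second bullet of the commit message indicates that AMD GPUs use `v_perm_b32` in that role.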