git.djapps.eu Git - pkg/ggml/sources/ggml/commit
ggml-cuda: Add NVFP4 dp4a kernel (llama/20644)
author Michael Wand <redacted>
Thu, 26 Mar 2026 08:54:03 +0000 (01:54 -0700)
committer Georgi Gerganov <redacted>
Sat, 28 Mar 2026 11:39:09 +0000 (13:39 +0200)
commit c2c7a83d8402bf5032b53f7e97945cc3370764af
tree e0de705370f4c5538b218cb46b6d5abe836f441f
parent 83d1f963f395a31aecf2da8265b0a5d1d8cbcb85

Added check for dst_t to cuda_cast template for float
Restored ggml_cuda_ue4m3_to_fp32, changed vecdot ints to int32_t
Added CUDART/HIP check and HIP/fp8 include
Added NVFP4 to test-backend-ops
Added hip_fp8_e4m3 to __nv_fp8_e4m3 typedef

---------

Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/common.cuh
src/ggml-cuda/convert.cu
src/ggml-cuda/ggml-cuda.cu
src/ggml-cuda/mmvq.cu
src/ggml-cuda/vecdotq.cuh
src/ggml-cuda/vendors/cuda.h
src/ggml-cuda/vendors/hip.h
tests/test-backend-ops.cpp
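
For readers unfamiliar with the "dp4a" in the commit title: it names the CUDA `__dp4a` intrinsic, which multiplies four packed 8-bit integers lane by lane and adds the sum to a 32-bit accumulator. The following is a minimal host-side sketch of that arithmetic only, not the kernel added by this commit; `dp4a_emulated` and `pack_int8x4` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of ggml): pack four int8 values into one
// 32-bit word, in the same in-memory order that dp4a_emulated unpacks them.
static int32_t pack_int8x4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    const int8_t v[4] = {x0, x1, x2, x3};
    int32_t out;
    std::memcpy(&out, v, sizeof(out));
    return out;
}

// Host-side emulation of the signed form of CUDA's __dp4a(a, b, c):
// treat a and b as four int8 lanes each, multiply lane by lane,
// and add the four products to the 32-bit accumulator c.
static int32_t dp4a_emulated(int32_t a, int32_t b, int32_t c) {
    int8_t va[4];
    int8_t vb[4];
    std::memcpy(va, &a, sizeof(va));
    std::memcpy(vb, &b, sizeof(vb));
    for (int i = 0; i < 4; ++i) {
        c += static_cast<int32_t>(va[i]) * static_cast<int32_t>(vb[i]);
    }
    return c;
}
```

A quantized vector dot product built on this primitive typically accumulates such integer sums per block and then multiplies by the block's floating-point scale; how the NVFP4 kernel in this commit does that is not shown on this page.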