git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit

author	Aman Gupta <redacted>
	Wed, 24 Dec 2025 14:28:26 +0000 (22:28 +0800)
committer	Georgi Gerganov <redacted>
	Wed, 31 Dec 2025 15:52:09 +0000 (17:52 +0200)
commit	41e578ec8a46ee0b4c7aa908a534ba540b587091
tree	552559fb3e62d13c3dcdf650b9cea97d6e5b1597
parent	f863735caacaa4d03422de0b9b995ca83ca98eb9

CUDA: experimental native mxfp4 support for blackwell (llama/17906)

* CUDA: experimental native mxfp4 support for blackwell

* optimize load_tiles

* optimize quantize_mxfp4

* cleanup

* first pass review: formatting

* use interleaved layout for mma

* mmq: add assert for size

* use __nv_fp4x4_e2m1

* use iter_k as 512, cleanup

* Use 1200 as blackwell instead of 1000

* address review comments

* mmq: fix stride

* quantize.cu: use reference impl of e8m0 scale

* address review comments

* add 120f-virtual + minor fixes

---------

Co-authored-by: Aman Gupta <aman>
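The commit message names the pieces involved (quantize_mxfp4, the E8M0 scale, the __nv_fp4x4_e2m1 type) without describing the format itself. For background, here is a minimal Python sketch of MXFP4 block quantization as described in the OCP Microscaling (MX) specification. In that format, 32 FP4 E2M1 elements share one power-of-two E8M0 scale, chosen as 2^(floor(log2(amax)) - 2), where 2 is the largest E2M1 exponent. This is not the ggml or CUDA code from this commit. The function names are hypothetical, round-to-nearest with ties toward the smaller magnitude is a simplification, and each 4-bit code is kept as its own list entry rather than packed two per byte or interleaved as the mma path does.

```python
import math

MX_BLOCK = 32  # OCP MX block size: 32 elements share one E8M0 scale

# FP4 E2M1 magnitudes, indexed by the 3-bit code (2 exponent bits, 1 mantissa bit).
E2M1_MAG = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # largest E2M1 value is 6 = 1.5 * 2**2


def e8m0_scale(block):
    """Hypothetical helper: shared scale as a biased E8M0 exponent (bias 127)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 127  # any scale works for an all-zero block; pick 2**0
    e = math.floor(math.log2(amax)) - E2M1_EMAX + 127
    return min(max(e, 0), 254)  # 255 is reserved for NaN in E8M0


def e2m1_encode(v):
    """Round v to the nearest E2M1 value (ties -> smaller magnitude, >6 saturates)."""
    sign = 0x8 if v < 0 else 0x0  # bit 3 is the sign bit
    a = abs(v)
    code = min(range(8), key=lambda c: abs(a - E2M1_MAG[c]))
    return sign | code


def quantize_block(block):
    """Quantize 32 floats to (E8M0 exponent, list of 4-bit codes)."""
    assert len(block) == MX_BLOCK
    e = e8m0_scale(block)
    scale = math.ldexp(1.0, e - 127)  # 2**(e - 127)
    return e, [e2m1_encode(v / scale) for v in block]


def dequantize(e, code):
    """Recover a float from a shared E8M0 exponent and one 4-bit E2M1 code."""
    mag = math.ldexp(E2M1_MAG[code & 0x7], e - 127)
    return -mag if code & 0x8 else mag


# Usage: a ramp from -4.0 to 3.75 has amax = 4, so the shared scale is 2**0.
x = [(i - 16) * 0.25 for i in range(MX_BLOCK)]
e, q = quantize_block(x)
print(e - 127, [dequantize(e, c) for c in q[:4]])
```

Because the scale is a pure power of two, applying it only shifts exponents. This is one reason the format maps well onto hardware FP4 tensor-core paths such as the one this commit targets.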

ggml/src/ggml-cuda/CMakeLists.txt
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmq.cu
ggml/src/ggml-cuda/mmq.cuh
ggml/src/ggml-cuda/quantize.cu
ggml/src/ggml-cuda/quantize.cuh
ggml/src/ggml-cuda/vendors/cuda.h