ggml-cuda: Add generic NVFP4 MMQ kernel (llama/21074)
author Michael Wand <redacted>
Wed, 1 Apr 2026 10:04:58 +0000 (03:04 -0700)
committer Georgi Gerganov <redacted>
Thu, 2 Apr 2026 07:25:32 +0000 (10:25 +0300)
commit 2c5c5e103b8e2ab3e680bdc5de98852630d483b4
tree 3101d3a07ce3a94916a8d85525fa552d718bd9af
parent 50634c28837c24ac68b380b5750b41e701c87d73

* Introduced NVFP4 generic MMQ kernel

* Added extra FP8 guard, hoping to fix the CI HIP failure

* Rename tiles and use HIP_FP8_AVAILABLE

* Removed remaining FP8 straggler and added const int

* Const

* Removed DECL_MMQ_CASE artifact

* Removed newline

* Removed space after else

* Changed HIP FP8 NVFP4 conversion gate

* Added new line to bottom of mmq.cu 270

* Removed extra spaces

* Removed single space in front of else on line 814

* Added NVFP4 to generate cu script so HIP can see it, further tightened logic

* Include generated mmq-instance-nvfp4.cu

* Added NVFP4 mmq to HIP Check ignore list

* Update ggml/src/ggml-cuda/mmq.cuh

Changed to Q3_K tile to read MMQ_MMA_TILE_X_K_NVFP4

Co-authored-by: Johannes Gäßler <redacted>

* Update ggml/src/ggml-cuda/mmq.cuh

Changed to Q3_K tile to read MMQ_MMA_TILE_X_K_NVFP4 in tile assert

Co-authored-by: Johannes Gäßler <redacted>

* Update ggml/src/ggml-cuda/mmq.cuh

Added function name ending for end if

Co-authored-by: Johannes Gäßler <redacted>

* Added function names to closing endif

Co-authored-by: Johannes Gäßler <redacted>
---------

Co-authored-by: Johannes Gäßler <redacted>
src/ggml-cuda/common.cuh
src/ggml-cuda/ggml-cuda.cu
src/ggml-cuda/mmq.cu
src/ggml-cuda/mmq.cuh
src/ggml-cuda/template-instances/generate_cu_files.py
src/ggml-cuda/template-instances/mmq-instance-nvfp4.cu [new file with mode: 0644]