git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: use mmvq for mul-mat-id for small batch sizes (llama/18958)
author Aman Gupta <redacted>
Tue, 3 Feb 2026 15:31:23 +0000 (23:31 +0800)
committer Georgi Gerganov <redacted>
Sat, 7 Feb 2026 08:37:38 +0000 (10:37 +0200)
commit 0622f36a396d290967f48407f5dda31111798d03
tree 50c177fb8993dded9aac2c05ab805544cc8259a8
parent f6f23e63cbcf8a759b44dc1aa6ad72f786171a32

* CUDA: use mmvq for mul-mat-id for small batch sizes

* add mmvq too

* Fix perf issue on ampere. Use mmvf mm-id only for non-nvidia GPUs

* templatize multi_token_path
src/ggml-cuda/ggml-cuda.cu
src/ggml-cuda/mmvf.cu
src/ggml-cuda/mmvf.cuh
src/ggml-cuda/mmvq.cu
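
The commit message describes a kernel-selection rule: for mul-mat-id with small batch sizes, prefer mmvq, and use the mmvf mm-id path only on non-NVIDIA GPUs. The sketch below illustrates that rule in plain C++ and is not the code from this commit. All names (`MulMatIdKernel`, `DeviceInfo`, `choose_mul_mat_id_kernel`) and the threshold `kMaxSmallBatch` are hypothetical, and the assumption that mmvq applies to quantized weights while mmvf applies to floating-point weights is not stated in the commit message.

```cpp
#include <cstdint>

// Hypothetical labels for the code paths named in the commit message.
enum class MulMatIdKernel { MMVQ, MMVF, Fallback };

// Hypothetical device description; the real backend has its own device info.
struct DeviceInfo {
    bool is_nvidia;
};

// Hypothetical cutoff for what counts as a "small" batch. The value actually
// used by the commit is not given in the commit message.
constexpr std::int64_t kMaxSmallBatch = 8;

// Choose a mul-mat-id code path from the weight type, batch size, and vendor.
constexpr MulMatIdKernel choose_mul_mat_id_kernel(bool weights_quantized,
                                                  std::int64_t n_tokens,
                                                  const DeviceInfo & dev) {
    // Larger batches go to whatever general path the backend already uses.
    if (n_tokens > kMaxSmallBatch) {
        return MulMatIdKernel::Fallback;
    }
    // Small batch, quantized weights (assumed): use mmvq.
    if (weights_quantized) {
        return MulMatIdKernel::MMVQ;
    }
    // Small batch, floating-point weights (assumed): use mmvf mm-id only on
    // non-NVIDIA GPUs, following the Ampere perf note in the commit message.
    if (!dev.is_nvidia) {
        return MulMatIdKernel::MMVF;
    }
    return MulMatIdKernel::Fallback;
}
```

Under these assumptions, `choose_mul_mat_id_kernel(true, 4, DeviceInfo{true})` selects the mmvq path, while the same call with `weights_quantized == false` on an NVIDIA device falls back to the general path.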