git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: use mmvq for mul-mat-id for small batch sizes (llama/18958)
author     Aman Gupta <redacted>
           Tue, 3 Feb 2026 15:31:23 +0000 (23:31 +0800)
committer  Georgi Gerganov <redacted>
           Sun, 8 Feb 2026 07:29:10 +0000 (09:29 +0200)
commit     8eede801e3d1799a12f969bba044aaffe59bace7
tree       b41f7ce8281756605c7aa1b52cc4302746cbbd3d
parent     ce8a2da62004f47522839f1719907f785262e684
* CUDA: use mmvq for mul-mat-id for small batch sizes

* add mmvq too

* Fix perf issue on Ampere: use the mmvf mm-id path only for non-NVIDIA GPUs

* templatize multi_token_path
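A minimal CPU sketch of why matrix-vector kernels suit mul-mat-id at small batch sizes, assuming mul-mat-id is the indirect, per-expert matrix multiply used by mixture-of-experts layers and that mmvq/mmvf are the quantized and floating-point matrix-vector kernels. With only a few tokens, each (token, selected expert) pair reduces to one independent matrix-vector product. Everything here is hypothetical and for illustration only: the names `mul_mat_id_ref`, `pick_mul_mat_id_kernel`, and `Kernel`, the row-major memory layout, and the batch-size cutoff are not taken from the patch, and the dispatch function only paraphrases the commit message rather than reproducing ggml's actual selection logic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense reference for an indirect ("mul-mat-id") matrix multiply.
// Assumed layout (illustrative, not ggml's tensor layout):
//   W   : n_expert matrices of n_rows x n_cols, row-major, stored back to back
//   x   : n_tokens input vectors of length n_cols
//   ids : n_tokens x n_used expert indices chosen by the router
//   y   : n_tokens x n_used output vectors of length n_rows
std::vector<float> mul_mat_id_ref(const std::vector<float>& W, const std::vector<float>& x,
                                  const std::vector<int>& ids, int n_rows, int n_cols,
                                  int n_tokens, int n_used) {
    const size_t mat_size = static_cast<size_t>(n_rows) * n_cols;
    const int n_expert = static_cast<int>(W.size() / mat_size);
    std::vector<float> y(static_cast<size_t>(n_tokens) * n_used * n_rows, 0.0f);
    for (int t = 0; t < n_tokens; ++t) {
        for (int k = 0; k < n_used; ++k) {
            const int e = ids[static_cast<size_t>(t) * n_used + k];
            assert(e >= 0 && e < n_expert);  // router must pick a valid expert
            const float* We = W.data() + static_cast<size_t>(e) * mat_size;
            const float* xt = x.data() + static_cast<size_t>(t) * n_cols;
            float* out = y.data() + (static_cast<size_t>(t) * n_used + k) * n_rows;
            // One matrix-vector product per (token, expert slot): the shape of
            // work a mat-vec kernel handles well when n_tokens is small.
            for (int r = 0; r < n_rows; ++r) {
                float acc = 0.0f;
                for (int c = 0; c < n_cols; ++c) {
                    acc += We[static_cast<size_t>(r) * n_cols + c] * xt[c];
                }
                out[r] = acc;
            }
        }
    }
    return y;
}

enum class Kernel { MatVecQuant, MatVecFloat, GeneralMatMul };

// Hypothetical dispatch rule paraphrasing the commit message: mat-vec kernels
// for small batches, with the float mat-vec mm-id path skipped on NVIDIA GPUs.
// The cutoff value is an illustrative assumption, not taken from the patch.
Kernel pick_mul_mat_id_kernel(bool weights_quantized, bool is_nvidia, int n_tokens) {
    constexpr int kSmallBatchMax = 8;  // hypothetical small-batch cutoff
    if (n_tokens > kSmallBatchMax) return Kernel::GeneralMatMul;
    if (weights_quantized) return Kernel::MatVecQuant;   // mmvq-style path
    if (!is_nvidia) return Kernel::MatVecFloat;          // mmvf-style path
    return Kernel::GeneralMatMul;                        // NVIDIA float fallback
}
```

For example, with two 2x2 experts (identity and diag(2, 3)), one token `[1, 2]`, and router choices `{1, 0}`, the reference returns `[2, 6]` for the first slot and `[1, 2]` for the second. Keeping the vendor check separate from the batch-size check mirrors the commit's split between "small batches use mat-vec" and "float mm-id mat-vec only off NVIDIA".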
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mmvf.cu
ggml/src/ggml-cuda/mmvf.cuh
ggml/src/ggml-cuda/mmvq.cu