git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)
author Aman Gupta <redacted>
Tue, 3 Feb 2026 15:31:23 +0000 (23:31 +0800)
committer GitHub <redacted>
Tue, 3 Feb 2026 15:31:23 +0000 (23:31 +0800)
commit 8bece2eb20f0134632ae229849fbde6559882d36
tree 88ee4a12467b368ad62526d378dc825e806b487c
parent a6fd8ca1fee621addff1695165414c4822fb08bf

* CUDA: use mmvq for mul-mat-id for small batch sizes

* add mmvq too

* Fix perf issue on Ampere. Use mmvf mm-id only for non-NVIDIA GPUs

* templatize multi_token_path
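
The kernel selection these bullets describe can be pictured with a minimal host-side sketch. This is not the code from the commit: the enum, the struct, the function name, and the batch-size cutoff are all hypothetical, chosen only to illustrate the idea that small batches are routed to a matrix-vector kernel (quantized "mmvq" or float "mmvf"), with the float variant limited to non-NVIDIA devices as the third bullet states.

```cpp
#include <cstdint>

// Hypothetical labels for the kernel families named in the commit message.
enum class MulMatIdPath {
    MatVecQuantized, // "mmvq": matrix-vector kernel for quantized weights
    MatVecFloat,     // "mmvf": matrix-vector kernel for float weights
    General          // any other existing mul-mat-id path
};

// ASSUMPTION: the real cutoff is not stated in the commit message;
// 8 is a placeholder for illustration only.
constexpr int64_t kAssumedMaxSmallBatch = 8;

// Hypothetical summary of the inputs that drive the choice.
struct MulMatIdArgs {
    bool    weights_quantized; // expert weights use a quantized type
    bool    is_nvidia;         // device vendor is NVIDIA
    int64_t n_tokens;          // batch size (tokens routed to experts)
};

// Pick a path: mat-vec kernels only for small batches; the float
// mat-vec variant only on non-NVIDIA devices.
inline MulMatIdPath choose_mul_mat_id_path(const MulMatIdArgs & a) {
    if (a.n_tokens > kAssumedMaxSmallBatch) {
        return MulMatIdPath::General;
    }
    if (a.weights_quantized) {
        return MulMatIdPath::MatVecQuantized;
    }
    if (!a.is_nvidia) {
        return MulMatIdPath::MatVecFloat;
    }
    return MulMatIdPath::General;
}
```

Under these assumptions, a 4-token batch with quantized expert weights would take the quantized matrix-vector path on any vendor, while the same batch with float weights would take the float matrix-vector path only on a non-NVIDIA device.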
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mmvf.cu
ggml/src/ggml-cuda/mmvf.cuh
ggml/src/ggml-cuda/mmvq.cu