git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Aman Gupta <redacted>
	Tue, 3 Feb 2026 15:31:23 +0000 (23:31 +0800)
committer	GitHub <redacted>
	Tue, 3 Feb 2026 15:31:23 +0000 (23:31 +0800)
commit	8bece2eb20f0134632ae229849fbde6559882d36
tree	88ee4a12467b368ad62526d378dc825e806b487c
parent	a6fd8ca1fee621addff1695165414c4822fb08bf

CUDA: use mmvq for mul-mat-id for small batch sizes (#18958)

* CUDA: use mmvq for mul-mat-id for small batch sizes

* add mmvq too

* Fix perf issue on Ampere. Use mmvf mm-id only for non-NVIDIA GPUs

* templatize multi_token_path
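Below is a minimal host-side C++ sketch of the two ideas in this commit message. It is not the ggml-cuda code. The first function models the kernel choice: small batches with quantized weights go to an mmvq-style matrix-vector path, and the mmvf mul-mat-id path is used only off NVIDIA, per the Ampere fix. The second function is a simplified float reference for what mul-mat-id computes: each token is multiplied by the expert matrices its `ids` row selects. The names `MmIdPath`, `choose_mm_id_path`, `mul_mat_id_ref`, the cutoff `kSmallBatchMax`, and the row-major layout are hypothetical. The real thresholds, tensor layout, quantization handling, and the templated `multi_token_path` variant are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical labels for the kernel families; not ggml's actual enums.
enum class MmIdPath { VectorQuant, VectorFloat, Fallback };

// Hypothetical small-batch cutoff; the real value in ggml-cuda may differ.
constexpr int kSmallBatchMax = 8;

// Sketch of the dispatch described in the commit message (assumed, simplified).
MmIdPath choose_mm_id_path(int n_tokens, bool weights_quantized, bool is_nvidia) {
    if (n_tokens > kSmallBatchMax) {
        return MmIdPath::Fallback;            // large batches: some other kernel (not modeled)
    }
    if (weights_quantized) {
        return MmIdPath::VectorQuant;         // mmvq-style path for small batches
    }
    // Float weights: mmvf mul-mat-id only on non-NVIDIA GPUs (the Ampere perf fix).
    return is_nvidia ? MmIdPath::Fallback : MmIdPath::VectorFloat;
}

// Reference semantics of mul-mat-id, float only, row-major (simplified layout):
//   experts: [n_expert][rows][cols], x: [n_tokens][cols],
//   ids: [n_tokens][n_used], out: [n_tokens][n_used][rows]
std::vector<float> mul_mat_id_ref(const std::vector<float>& experts, int n_expert,
                                  int rows, int cols,
                                  const std::vector<float>& x, int n_tokens,
                                  const std::vector<int32_t>& ids, int n_used) {
    std::vector<float> out(static_cast<size_t>(n_tokens) * n_used * rows, 0.0f);
    for (int t = 0; t < n_tokens; ++t) {
        const float* v = x.data() + static_cast<size_t>(t) * cols;
        for (int e = 0; e < n_used; ++e) {
            const int expert = ids[static_cast<size_t>(t) * n_used + e];
            assert(expert >= 0 && expert < n_expert);  // ids must name a valid expert
            const float* w = experts.data() + static_cast<size_t>(expert) * rows * cols;
            for (int r = 0; r < rows; ++r) {
                float acc = 0.0f;  // dot product of one expert row with the token vector
                for (int k = 0; k < cols; ++k) {
                    acc += w[r * cols + k] * v[k];
                }
                out[(static_cast<size_t>(t) * n_used + e) * rows + r] = acc;
            }
        }
    }
    return out;
}
```

With one token and two selected experts, each output slot is an independent matrix-vector product. That is why a vector kernel such as mmvq fits small batches: there is too little per-expert work to fill a tiled matrix-matrix kernel.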

ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mmvf.cu
ggml/src/ggml-cuda/mmvf.cuh
ggml/src/ggml-cuda/mmvq.cu

Packaging of ggml-org/llama.cpp