git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: stream-k decomposition for MMQ (#8018)
author Johannes Gäßler <redacted>
Thu, 20 Jun 2024 12:39:21 +0000 (14:39 +0200)
committer GitHub <redacted>
Thu, 20 Jun 2024 12:39:21 +0000 (14:39 +0200)
commit d50f8897a797a5a03f31228d1b5a7b8130ee1bc2
tree 9ee91b29378e35ff8f7b5071308c12d429f316f0
parent 2075a66a96cc1b04eabec7cf4b3051193d6f719e
CUDA: stream-k decomposition for MMQ (#8018)

* CUDA: stream-k decomposition for MMQ

* fix undefined memory reads for small matrices
ggml-cuda.cu
ggml-cuda/common.cuh
ggml-cuda/mmq.cu
ggml-cuda/mmq.cuh