git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)
author Johannes Gäßler <redacted>
Fri, 14 Jun 2024 16:41:49 +0000 (18:41 +0200)
committer GitHub <redacted>
Fri, 14 Jun 2024 16:41:49 +0000 (18:41 +0200)
commit 76d66ee0be91e2bec93206e821ee1db8d023cff5
tree 9bf121667539f91b90b54b237e54bdbd9a16161c
parent 66ef1ceedf983773c8ceb4d925285d41d4e50e2a
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)

* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores

* try CI fix

* try CI fix

* try CI fix

* fix data race

* revert q2_K precision-related changes
ggml-cuda.cu
ggml-cuda/argsort.cu
ggml-cuda/common.cuh
ggml-cuda/mmq.cuh
ggml-cuda/softmax.cu
ggml-cuda/vecdotq.cuh
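
The files above include ggml-cuda/mmq.cuh and ggml-cuda/common.cuh, where the MMQ (quantized matrix multiplication) kernels gain an int8 tensor core path. The code below is not from this commit; it is a minimal standalone sketch of how an int8 tensor core MMA is issued from CUDA inline PTX, which is the kind of instruction the title refers to. The m16n8k16 shape, the sm_80 guard, and all names here are illustrative assumptions, not taken from the llama.cpp sources.

// Minimal sketch (not the commit's code): one int8 tensor core MMA via inline PTX.
// Assumes an Ampere-or-newer GPU; compile with e.g. `nvcc -arch=sm_80 mma_demo.cu`.
#include <cstdint>
#include <cstdio>

// One warp computes D(16x8, s32) += A(16x16, s8) * B(16x8, s8).
// Per-thread fragment sizes for this PTX shape: A = 2x b32, B = 1x b32, C/D = 4x s32.
__device__ void mma_s8_m16n8k16(int32_t d[4], const uint32_t a[2], const uint32_t b[1]) {
#if __CUDA_ARCH__ >= 800
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.s32.s8.s8.s32 "
        "{%0, %1, %2, %3}, {%4, %5}, {%6}, {%0, %1, %2, %3};"
        : "+r"(d[0]), "+r"(d[1]), "+r"(d[2]), "+r"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(b[0]));
#endif
}

__global__ void mma_demo(int32_t * out) {
    // All-ones int8 fragments: every output element equals the inner dimension
    // K = 16, independent of the per-thread fragment-to-element mapping.
    const uint32_t a[2] = {0x01010101u, 0x01010101u};
    const uint32_t b[1] = {0x01010101u};
    int32_t d[4] = {0, 0, 0, 0};

    mma_s8_m16n8k16(d, a, b);

    for (int i = 0; i < 4; ++i) {
        out[4*threadIdx.x + i] = d[i];
    }
}

int main() {
    int32_t * out = nullptr;
    cudaMallocManaged(&out, 32*4*sizeof(int32_t));
    mma_demo<<<1, 32>>>(out);
    cudaDeviceSynchronize();
    printf("d[0] of lane 0 = %d (expected 16 on sm_80+)\n", out[0]);
    cudaFree(out);
    return 0;
}

In the real kernels the A and B fragments are filled from the quantized q2_K/q3_K blocks after converting them to int8, and the int32 accumulators are rescaled with the block scales afterwards; the sketch only shows the instruction itself.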