git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (llama/7921)
author: Johannes Gäßler <redacted>
Fri, 14 Jun 2024 16:41:49 +0000 (18:41 +0200)
committer: Georgi Gerganov <redacted>
Sun, 16 Jun 2024 15:19:48 +0000 (18:19 +0300)
commit: b17ba2815b210dab8c610a20377e25f8254c5d41
tree: f54f326e4905ac0e678d6741b850ad738b7a8ff2
parent: 7a489af2f3c9eed983f6ba301db604f7dacee709

* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores

* try CI fix

* try CI fix

* try CI fix

* fix data race

* revert q2_K precision-related changes
ggml-cuda.cu
ggml-cuda/argsort.cu
ggml-cuda/common.cuh
ggml-cuda/mmq.cuh
ggml-cuda/softmax.cu
ggml-cuda/vecdotq.cuh
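The commit title pairs MMQ (the quantized matrix-multiplication path; see ggml-cuda/mmq.cuh in the file list) with int8 tensor cores. As a minimal sketch, and not the code from this commit, the scalar C++ below shows the arithmetic contract of an int8 tensor-core matrix-multiply-accumulate: int8 inputs are multiplied and summed in int32, and the float scales are applied only afterwards. The function names `mma_int8_reference` and `dequantize_tile`, the column-major layout of B, and the one-scale-per-row/per-column scheme are hypothetical; q2_K and q3_K use their own block formats with per-sub-block scales.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for one int8 MMA tile:
//   CD[m][n] += sum_k A[m][k] * B[k][n]
// A is M x K row-major; B is stored column-major (N columns of K values each);
// CD holds M x N int32 accumulators in row-major order.
void mma_int8_reference(int M, int N, int K,
                        const std::vector<std::int8_t>& A,
                        const std::vector<std::int8_t>& B_colmajor,
                        std::vector<std::int32_t>& CD) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B_colmajor.size() == static_cast<std::size_t>(N) * K);
    assert(CD.size() == static_cast<std::size_t>(M) * N);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            std::int32_t acc = CD[m * N + n];
            for (int k = 0; k < K; ++k) {
                // Widen to int32 before multiplying so each product is exact.
                acc += static_cast<std::int32_t>(A[m * K + k]) *
                       static_cast<std::int32_t>(B_colmajor[n * K + k]);
            }
            CD[m * N + n] = acc;
        }
    }
}

// Convert the int32 accumulators to float with one scale per row of A and one
// per column of B (a hypothetical scaling scheme chosen for illustration).
std::vector<float> dequantize_tile(int M, int N,
                                   const std::vector<std::int32_t>& acc,
                                   const std::vector<float>& scale_a,
                                   const std::vector<float>& scale_b) {
    std::vector<float> out(static_cast<std::size_t>(M) * N);
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n)
            out[m * N + n] = scale_a[m] * scale_b[n] *
                             static_cast<float>(acc[m * N + n]);
    return out;
}
```

Keeping the whole K-reduction in integer arithmetic and touching floats once per output element is the general idea that makes int8 tensor cores attractive for quantized matmuls; a real CUDA kernel would replace the inner loops with warp-level MMA instructions and shared-memory tiling, which this sketch does not attempt to model.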