git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: Quantized matrix matrix multiplication (#2160)
author Johannes Gäßler <redacted>
Sat, 29 Jul 2023 21:04:44 +0000 (23:04 +0200)
committer GitHub <redacted>
Sat, 29 Jul 2023 21:04:44 +0000 (23:04 +0200)
commit 11f3ca06b8c66b0427aab0a472479da22553b472
tree 8e934ff0d93a78447d996b00561f7ff826c3533f
parent 9baf9ef304f330009d5a93b7390280a0fd27c9a1

* mmq implementation for non k-quants

* q6_K

* q2_K

* q3_K

* q4_K

* vdr

* q5_K

* faster q8_1 loading

* loop unrolling

* add __restrict__

* q2_K sc_high

* GGML_CUDA_MMQ_Y

* Updated Makefile

* Update Makefile

* DMMV_F16 -> F16

* Updated README, CMakeLists

* Fix CMakeLists.txt

* Fix CMakeLists.txt

* Fix multi GPU out-of-bounds
CMakeLists.txt
Makefile
README.md
ggml-cuda.cu
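
For readers unfamiliar with the term in the commit title, the following is a minimal host-side C++ sketch of what multiplying two matrices in quantized form can look like. It is not the code from this commit. The block layout (`BlockQ8`, 32 signed 8-bit values sharing one float scale), the block size, and all function names are assumptions made for illustration; the actual kernels in ggml-cuda.cu and the real ggml quantization formats may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified block: 32 signed 8-bit values sharing one scale.
constexpr int kBlockSize = 32;

struct BlockQ8 {
    float scale;            // dequantized value = scale * q[i]
    int8_t q[kBlockSize];   // quantized values in [-127, 127]
};

using QuantRow = std::vector<BlockQ8>;

// Quantize a row whose length is a multiple of kBlockSize.
QuantRow quantize_row(const std::vector<float>& x) {
    QuantRow out(x.size() / kBlockSize);
    for (std::size_t b = 0; b < out.size(); ++b) {
        // The largest magnitude in the block determines the scale.
        float amax = 0.0f;
        for (int i = 0; i < kBlockSize; ++i) {
            amax = std::max(amax, std::fabs(x[b * kBlockSize + i]));
        }
        const float scale = amax / 127.0f;
        out[b].scale = scale;
        for (int i = 0; i < kBlockSize; ++i) {
            const float v = scale > 0.0f ? x[b * kBlockSize + i] / scale : 0.0f;
            out[b].q[i] = static_cast<int8_t>(std::lround(v));
        }
    }
    return out;
}

// Dot product of two quantized rows: integer accumulation inside each
// block, then a single floating-point multiply by the two block scales.
float dot_quantized(const QuantRow& a, const QuantRow& b) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) {
        int32_t acc = 0;
        for (int i = 0; i < kBlockSize; ++i) {
            acc += static_cast<int32_t>(a[k].q[i]) * static_cast<int32_t>(b[k].q[i]);
        }
        sum += a[k].scale * b[k].scale * static_cast<float>(acc);
    }
    return sum;
}

// C = A * B^T, stored row-major: C[i * B.size() + j] = dot(A[i], B[j]).
std::vector<float> matmul_quantized(const std::vector<QuantRow>& A,
                                    const std::vector<QuantRow>& B) {
    std::vector<float> C(A.size() * B.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        for (std::size_t j = 0; j < B.size(); ++j) {
            C[i * B.size() + j] = dot_quantized(A[i], B[j]);
        }
    }
    return C;
}
```

The point of the sketch is that both operands stay quantized during the inner loop, so the bulk of the arithmetic is integer multiply-accumulate and the floating-point scales are applied once per block rather than once per element.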