Some more Q4_K and Q5_K speedup on CUDA (#2346)
author Kawrakow <redacted>
Sun, 23 Jul 2023 21:19:47 +0000 (00:19 +0300)
committer GitHub <redacted>
Sun, 23 Jul 2023 21:19:47 +0000 (00:19 +0300)
commit 2f9cf974a066ac0e03fbb235d834b01b0164d743
tree 1c0c1b42ef5d1f8013d9641d778225e98b59d134
parent 4f06592cc6b83979e4b442e8cb97b3948c857188

* Faster Q5_K on CUDA

* Small Q5_K improvement on older GPUs

* Speed up Q4_K on CUDA

GTX1660: 29.5 ms/t -> 25.6 ms/t
RTX4080: 8.40 ms/t -> 8.25 ms/t

* Speed up Q4_K on CUDA

GTX1660: 36.7 ms/t -> 35.6 ms/t
RTX4080:  9.8 ms/t ->  9.5 ms/t

* Address PR comments

* Add some comments to satisfy PR reviewer
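
To put the timings quoted above in perspective, the following is a minimal sketch that converts each before/after pair into tokens per second and a relative speedup. It assumes that "ms/t" means milliseconds per generated token; the function names are hypothetical helpers for illustration and are not part of llama.cpp.

```python
# Timings copied from the commit message, in the order they appear:
# (GPU, ms/t before, ms/t after).
# Assumption: "ms/t" is milliseconds per generated token.
TIMINGS = [
    ("GTX1660", 29.5, 25.6),
    ("RTX4080", 8.40, 8.25),
    ("GTX1660", 36.7, 35.6),
    ("RTX4080", 9.8, 9.5),
]


def speedup(before_ms, after_ms):
    """Relative speedup: values above 1.0 mean the new code is faster."""
    return before_ms / after_ms


def tokens_per_second(ms_per_token):
    """Convert a per-token latency in milliseconds to throughput."""
    return 1000.0 / ms_per_token


def summarize(timings):
    """Return (gpu, tok/s before, tok/s after, speedup) for each entry."""
    return [
        (gpu, tokens_per_second(before), tokens_per_second(after),
         speedup(before, after))
        for gpu, before, after in timings
    ]


if __name__ == "__main__":
    for gpu, tps_before, tps_after, factor in summarize(TIMINGS):
        print(f"{gpu}: {tps_before:.1f} -> {tps_after:.1f} tok/s "
              f"(x{factor:.3f})")
```

On these figures the older GTX1660 gains more than the RTX4080 in the first pair of measurements, while the second pair shows a similar relative gain on both cards.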

---------

Co-authored-by: Iwan Kawrakow <redacted>
ggml-cuda.cu