git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
cuBLAS: refactor and optimize f16 mat mul performance (#1259)
author slaren <redacted>
Mon, 1 May 2023 16:11:07 +0000 (18:11 +0200)
committer GitHub <redacted>
Mon, 1 May 2023 16:11:07 +0000 (18:11 +0200)
commit 58b367c2d757c0ea12aec672382462b42204c724
tree b2fa89daf71c08788c44e3fb9abf1747ec8ee65d
parent ea3a0ad6b6b5ca4693b94acd4cb32e2803f66fae

* cuBLAS: refactor, convert fp16 to fp32 on device

* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16

* fix build

* cuBLAS: update block_q5_1
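
For readers unfamiliar with the idea in the first bullet, the following is a minimal sketch of converting fp16 data to fp32 on the device, so that only the smaller fp16 buffer has to cross the bus. It is an illustration under stated assumptions, not the code from this commit: the kernel name `convert_f16_to_f32`, the `CUDA_CHECK` macro, and the buffer sizes are all hypothetical, and it assumes a CUDA toolkit recent enough that `__float2half` can be called from host code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cuda_fp16.h>

// Hypothetical error-check helper (not taken from the commit).
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err_));                        \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: each thread widens one fp16 element to fp32.
__global__ void convert_f16_to_f32(const half * x, float * y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = __half2float(x[i]);
    }
}

int main() {
    const int n = 1024;

    // Host-side fp16 input; multiples of 0.5 below 512 are exact in fp16.
    std::vector<half>  h_x(n);
    std::vector<float> h_y(n);
    for (int i = 0; i < n; i++) {
        h_x[i] = __float2half(0.5f * i);
    }

    half  * d_x = nullptr;
    float * d_y = nullptr;
    CUDA_CHECK(cudaMalloc((void **) &d_x, n * sizeof(half)));
    CUDA_CHECK(cudaMalloc((void **) &d_y, n * sizeof(float)));

    // A dedicated stream; work on separate streams may overlap.
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    // Upload the fp16 data (half the bytes of an fp32 upload).
    CUDA_CHECK(cudaMemcpyAsync(d_x, h_x.data(), n * sizeof(half),
                               cudaMemcpyHostToDevice, stream));

    // Widen to fp32 on the device, in the same stream.
    const int block = 256;
    const int grid  = (n + block - 1) / block;
    convert_f16_to_f32<<<grid, block, 0, stream>>>(d_x, d_y, n);
    CUDA_CHECK(cudaGetLastError());

    // Copy back only to verify the result in this sketch.
    CUDA_CHECK(cudaMemcpyAsync(h_y.data(), d_y, n * sizeof(float),
                               cudaMemcpyDeviceToHost, stream));
    CUDA_CHECK(cudaStreamSynchronize(stream));

    int mismatches = 0;
    for (int i = 0; i < n; i++) {
        if (h_y[i] != 0.5f * i) {
            mismatches++;
        }
    }
    printf("mismatches: %d\n", mismatches);

    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
    return mismatches == 0 ? 0 : 1;
}
```

In a real matrix multiplication the fp32 buffer would stay on the device and be passed to cuBLAS rather than copied back; the copy here exists only so the sketch can check itself.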
ggml-cuda.cu
ggml-cuda.h
ggml.c
ggml.h