git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
cuBLAS: use host pinned memory and dequantize while copying (#1207)
author slaren <redacted>
Sat, 29 Apr 2023 00:04:18 +0000 (02:04 +0200)
committer GitHub <redacted>
Sat, 29 Apr 2023 00:04:18 +0000 (02:04 +0200)
commit 7fc50c051ae8a78e9643fdf172d12e20f2dd9b6c
tree cc017db2f3443a39221ad319ab51df0925012e84
parent b1ee8f59b4101b46999a0995d9a34506f7285466

* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase
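
The first two items can be illustrated with a minimal sketch. This is not the code from this commit. It assumes a hypothetical 4-bit block format (`block_q4`, `QK`) and hypothetical helper names (`dequantize_q4`, `upload_and_dequantize`), and uses only standard CUDA runtime calls. The quantized blocks are staged in pinned (page-locked) host memory, copied to the device asynchronously, and expanded to floats by a kernel queued on the same stream, so only the compact quantized data crosses the bus. Whether the actual change also overlaps work across several streams is not stated in the message above.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical block format for illustration only (not necessarily ggml's layout):
// one float scale followed by QK 4-bit values packed two per byte.
#define QK 32
typedef struct {
    float   d;           // per-block scale
    uint8_t qs[QK / 2];  // packed quants, low nibble first
} block_q4;

// Report a CUDA error and bail out. Cleanup on failure is omitted for brevity.
#define CUDA_TRY(call)                                                  \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            return 1;                                                   \
        }                                                               \
    } while (0)

// One thread expands one quantized block into QK floats.
__global__ void dequantize_q4(const block_q4 *x, float *y, int nblocks) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nblocks) return;
    const float d = x[i].d;
    for (int j = 0; j < QK / 2; ++j) {
        const uint8_t q = x[i].qs[j];
        y[i * QK + 2 * j + 0] = ((int)(q & 0x0F) - 8) * d;
        y[i * QK + 2 * j + 1] = ((int)(q >> 4) - 8) * d;
    }
}

// Hypothetical helper: stage src in pinned memory, copy it to the device
// asynchronously, dequantize there, and copy the floats back into h_out
// (which must hold nblocks * QK floats). Returns 0 on success.
int upload_and_dequantize(const block_q4 *src, int nblocks, float *h_out) {
    const size_t qbytes = (size_t)nblocks * sizeof(block_q4);
    const size_t fbytes = (size_t)nblocks * QK * sizeof(float);
    block_q4 *h_pinned = NULL;
    block_q4 *d_q = NULL;
    float *d_f = NULL;
    cudaStream_t stream;

    CUDA_TRY(cudaStreamCreate(&stream));
    // Page-locked host buffer: needed for the host-to-device copy to be truly asynchronous.
    CUDA_TRY(cudaMallocHost((void **)&h_pinned, qbytes));
    CUDA_TRY(cudaMalloc((void **)&d_q, qbytes));
    CUDA_TRY(cudaMalloc((void **)&d_f, fbytes));

    memcpy(h_pinned, src, qbytes);

    // Copy and kernel are queued on the same stream, so they run in order
    // without blocking the host thread in between.
    CUDA_TRY(cudaMemcpyAsync(d_q, h_pinned, qbytes, cudaMemcpyHostToDevice, stream));
    const int threads = 128;
    const int grid = (nblocks + threads - 1) / threads;
    dequantize_q4<<<grid, threads, 0, stream>>>(d_q, d_f, nblocks);
    CUDA_TRY(cudaGetLastError());
    CUDA_TRY(cudaMemcpyAsync(h_out, d_f, fbytes, cudaMemcpyDeviceToHost, stream));
    CUDA_TRY(cudaStreamSynchronize(stream));

    CUDA_TRY(cudaFree(d_f));
    CUDA_TRY(cudaFree(d_q));
    CUDA_TRY(cudaFreeHost(h_pinned));
    CUDA_TRY(cudaStreamDestroy(stream));
    return 0;
}
```

In a matrix multiplication path, the device buffer of floats would be passed on to cuBLAS rather than copied back. The copy back to the host is included here only so the result can be inspected.
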
Makefile
ggml-cuda.cu
ggml-cuda.h
ggml.c
llama.cpp
llama_util.h