CUDA full GPU acceleration, KV cache in VRAM (#1827)

author     Johannes Gäßler <redacted>
           Wed, 14 Jun 2023 17:47:19 +0000 (19:47 +0200)
committer  GitHub <redacted>
           Wed, 14 Jun 2023 17:47:19 +0000 (19:47 +0200)
commit     254a7a7a5ff4c874ff8488f1f5cbdd7e9c89d682
tree       65f35a2d189f3cf6f1f625b2acb343c2dd77790d
parent     92549202659fc23ba9fec5e688227d0da9b06b40

* Fixed CUDA RoPE

* Added ggml_cuda_mul_mat_vec_p021

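The p021 kernel computes a matrix-vector product for tensors laid out by a (0,2,1) permutation, as in the attention K*Q step. As a hedged illustration only (the real ggml_cuda_mul_mat_vec_p021 operates on f16 data and handles the permuted strides), here is a minimal f32 sketch of the one-warp-per-row reduction pattern such kernels use; all names are illustrative:

    // Minimal sketch, NOT the llama.cpp implementation: one warp computes one
    // output row of y = A * x; launch with blockDim.x == 32, gridDim.x == nrows.
    static __global__ void mul_mat_vec_f32(const float * A, const float * x,
                                           float * y, const int ncols) {
        const int row = blockIdx.x;
        float sum = 0.0f;
        for (int col = threadIdx.x; col < ncols; col += blockDim.x) {
            sum += A[row * ncols + col] * x[col];
        }
        // reduce the 32 partial sums within the warp
        for (int offset = 16; offset > 0; offset >>= 1) {
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        }
        if (threadIdx.x == 0) {
            y[row] = sum;
        }
    }
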
* Added ggml_cuda_scale

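Element-wise scaling is the simplest of the new ops. The sketch below is a plausible shape for what ggml_cuda_scale wraps; the function names and block size are illustrative, not the actual source:

    #include <cuda_runtime.h>

    // Hypothetical sketch: dst[i] = v * x[i] over k elements.
    static __global__ void scale_f32(const float * x, float * dst,
                                     const float v, const int k) {
        const int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i >= k) {
            return;
        }
        dst[i] = v * x[i];
    }

    static void scale_f32_cuda(const float * x, float * dst, const float v,
                               const int k, cudaStream_t stream) {
        const int block_size = 256; // illustrative choice
        const int num_blocks = (k + block_size - 1) / block_size;
        scale_f32<<<num_blocks, block_size, 0, stream>>>(x, dst, v, k);
    }
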
* Added ggml_cuda_diag_mask_inf

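diag_mask_inf implements the causal attention mask: scores for tokens that lie in the future relative to a query position are set to -infinity before the softmax. A hedged sketch of such a kernel, simplified to a single 2-D matrix with illustrative names:

    #include <math.h> // for INFINITY

    // Hypothetical sketch: mask out columns beyond n_past + row;
    // launch with a 2-D grid covering ncols x nrows.
    static __global__ void diag_mask_inf_f32(const float * x, float * dst,
                                             const int ncols, const int n_past) {
        const int col = blockDim.x * blockIdx.x + threadIdx.x;
        const int row = blockDim.y * blockIdx.y + threadIdx.y;
        if (col >= ncols) {
            return;
        }
        const int i = row * ncols + col;
        dst[i] = col > n_past + row ? -INFINITY : x[i];
    }
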
* Added ggml_is_permuted

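ggml tensors carry explicit byte strides nb[0..3], so whether a tensor is a permuted view can be read off the strides directly: in natural memory order they are non-decreasing. A sketch of that check under ggml's stride convention (the helper name here is illustrative):

    #include <stdbool.h>
    #include "ggml.h"

    // A tensor is permuted if its byte strides are out of natural order,
    // e.g. after ggml_permute(); sketch of the idea behind ggml_is_permuted.
    static bool is_permuted(const struct ggml_tensor * t) {
        return t->nb[0] > t->nb[1] || t->nb[1] > t->nb[2] || t->nb[2] > t->nb[3];
    }
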
* Added ggml_cuda_cpy

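A device-side copy that respects strides lets non-contiguous views (such as permuted KV cache slices) be moved without a CPU round trip. A simplified, hypothetical f32-to-f32 sketch of that pattern, reduced to two dimensions:

    // Hypothetical sketch: walk dst linearly, map each element back to the
    // (possibly non-contiguous) source through explicit byte strides.
    static __global__ void cpy_f32_f32(const char * src, char * dst,
                                       const int ne0, const int ne1,
                                       const size_t snb0, const size_t snb1,
                                       const size_t dnb0, const size_t dnb1) {
        const int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i >= ne0 * ne1) {
            return;
        }
        const int i0 = i % ne0;
        const int i1 = i / ne0;
        const float * xs = (const float *)(src + i0 * snb0 + i1 * snb1);
        float       * xd = (float       *)(dst + i0 * dnb0 + i1 * dnb1);
        *xd = *xs;
    }
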
* Flattened rows for ggml_cuda_op

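"Flattening rows" here means treating a 4-D ggml tensor as one long run of rows of ne[0] elements each, so a single kernel launch can cover all higher dimensions at once instead of looping over them on the host. The row count is just the product of the higher dimensions, as in this sketch (helper name illustrative):

    #include "ggml.h"

    // Sketch: the flattened row count of a 4-D tensor; this is what
    // ggml_nrows() computes.
    static int64_t flat_row_count(const struct ggml_tensor * t) {
        return t->ne[1] * t->ne[2] * t->ne[3];
    }
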
* Added a --low-vram option

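The flag trades speed for a smaller VRAM footprint. A self-contained sketch of the boolean-flag parsing pattern used in examples/common.cpp (gpt_params reduced here to the one field that matters; the real parser differs in detail):

    #include <cstring>

    struct gpt_params_sketch {
        bool low_vram = false; // reduce VRAM usage at some performance cost
    };

    static void parse_args(int argc, char ** argv, gpt_params_sketch & params) {
        for (int i = 1; i < argc; i++) {
            if (std::strcmp(argv[i], "--low-vram") == 0) {
                params.low_vram = true;
            }
        }
    }

In typical use the flag would be combined with GPU offload, e.g. ./main -m model.bin -ngl 32 --low-vram (the exact flag combination is illustrative).
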
* Fixed a performance issue on Windows

* Fixed LLAMA_CUDA_DMMV_Y > 1 for WizardLM

Files changed:

examples/common.cpp
examples/common.h
examples/main/README.md
examples/server/README.md
examples/server/server.cpp
ggml-cuda.cu
ggml-cuda.h
ggml.c
ggml.h
llama.cpp
llama.h