cuda: add q8_0->f32 cpy operation (#9571)
author    Ivan <redacted>    Tue, 24 Sep 2024 00:14:24 +0000 (03:14 +0300)
committer GitHub <redacted>  Tue, 24 Sep 2024 00:14:24 +0000 (02:14 +0200)
commit 116efee0eef09d8c3c4c60b52fa01b56ddeb432c
tree   cb5f9f85e27749fdf6559580546f6f08bf991aa3
parent 0b3bf966f47bf2ba88e5d4e3ed429602008c7e63

llama: enable K-shift for quantized KV cache
The K-shift will fail on unsupported backends or quant types.
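For context: ggml stores q8_0 data in 32-element blocks, each holding one fp16 scale d and 32 int8 quants, and dequantization is simply dst[j] = d * qs[j]. Below is a minimal, self-contained sketch of what a q8_0->f32 dequantize-copy kernel can look like; the kernel name and launch geometry are illustrative assumptions, not the exact code added to ggml/src/ggml-cuda/cpy.cu.

    // Sketch of a q8_0 -> f32 dequantize-copy kernel (illustrative; not the
    // exact implementation in ggml/src/ggml-cuda/cpy.cu).
    #include <cuda_fp16.h>
    #include <stdint.h>

    #define QK8_0 32                 // values per q8_0 block (ggml layout)

    typedef struct {
        half   d;                    // per-block fp16 scale
        int8_t qs[QK8_0];            // quantized values
    } block_q8_0;

    // One thread dequantizes one block: dst[j] = d * qs[j].
    __global__ void cpy_q8_0_f32(const block_q8_0 * __restrict__ src,
                                 float * __restrict__ dst, int nblocks) {
        const int ib = blockIdx.x*blockDim.x + threadIdx.x;
        if (ib >= nblocks) {
            return;
        }
        const float d = __half2float(src[ib].d);
        #pragma unroll
        for (int j = 0; j < QK8_0; ++j) {
            dst[ib*QK8_0 + j] = d * (float) src[ib].qs[j];
        }
    }

A launch would cover nblocks = n/QK8_0 blocks, e.g. cpy_q8_0_f32<<<(nblocks + 255)/256, 256>>>(src, dst, nblocks). The real cpy kernels additionally handle non-contiguous tensors via per-dimension byte strides.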
Files changed:
ggml/src/ggml-cuda.cu
ggml/src/ggml-cuda/cpy.cu
src/llama.cpp
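The src/llama.cpp side is what makes this copy useful: RoPE cannot be applied directly to a quantized K cache, so the K-shift dequantizes the cache view to f32, rotates it, and copies the result back (the f32->q8_0 direction already existed). A hedged sketch of that pattern against the public ggml API follows; the helper name and parameters are assumptions for illustration, and the actual graph-building code in src/llama.cpp differs in detail.

    #include "ggml.h"

    // Hypothetical helper showing the K-shift pattern for a quantized cache.
    static struct ggml_tensor * build_k_shift_quantized(
            struct ggml_context * ctx,
            struct ggml_tensor  * k_view,  // quantized (e.g. q8_0) K-cache view
            struct ggml_tensor  * pos,     // I32 shift deltas, one per cell
            int                   n_rot) {
        // 1) dequantize to f32 -- this needs a q8_0->f32 cpy on the backend,
        //    which is exactly what this commit adds for CUDA
        struct ggml_tensor * k_f32 = ggml_cpy(ctx, k_view,
            ggml_new_tensor_3d(ctx, GGML_TYPE_F32,
                               k_view->ne[0], k_view->ne[1], k_view->ne[2]));
        // 2) apply the rotary shift in f32
        k_f32 = ggml_rope(ctx, k_f32, pos, n_rot, 0 /* default rope mode */);
        // 3) quantize back into the cache (f32->q8_0 cpy already existed)
        return ggml_cpy(ctx, k_f32, k_view);
    }

On backends or quant types without the needed dequantize copy, building this graph fails, which matches the commit message's caveat.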