cuda: add q8_0->f32 cpy operation (llama/9571)
author    Ivan <redacted>
          Tue, 24 Sep 2024 00:14:24 +0000 (03:14 +0300)
committer Georgi Gerganov <redacted>
          Tue, 24 Sep 2024 10:04:37 +0000 (13:04 +0300)
commit c5d592357a334d78c7f8e1f446120d6c1c559fc7
tree   c0e7e533d4a6af828b0c0a14864977d4596a89c8
parent d2309ecd3a853c97f2aee65319e3fd0a80f07676
cuda: add q8_0->f32 cpy operation (llama/9571)

llama: enable K-shift for the quantized KV cache.
The K-shift will fail on unsupported backends or quantization types.
src/ggml-cuda.cu
src/ggml-cuda/cpy.cu
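For context, ggml's Q8_0 format stores a block of 32 int8 quants together with one fp16 scale, so a q8_0->f32 copy amounts to a per-element dequantize (value = scale * quant). The sketch below illustrates that idea as a standalone CUDA kernel; it is not the code added in src/ggml-cuda/cpy.cu, the kernel name is hypothetical, and it ignores the non-contiguous stride handling a general ggml copy op must support.

#include <cuda_fp16.h>
#include <cstdint>

#define QK8_0 32

// Q8_0 block layout: one fp16 scale followed by 32 int8 quants.
typedef struct {
    __half d;          // per-block scale
    int8_t qs[QK8_0];  // quantized values
} block_q8_0;

// Hypothetical kernel: one thread per output element.
// Element i is recovered as dst[i] = d * qs[i % 32] of block i / 32.
__global__ void cpy_q8_0_to_f32(const block_q8_0 * __restrict__ src,
                                float * __restrict__ dst,
                                const int64_t n) {
    const int64_t i = (int64_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) {
        return;
    }

    const block_q8_0 * b = &src[i / QK8_0];
    dst[i] = __half2float(b->d) * (float) b->qs[i % QK8_0];
}

// Usage sketch (n assumed to be a multiple of QK8_0, buffers already on device):
//   const int threads = 256;
//   const int blocks  = (int)((n + threads - 1) / threads);
//   cpy_q8_0_to_f32<<<blocks, threads>>>(d_src, d_dst, n);

Having this dequantizing copy path available is what lets the K-shift (which rotates the K cache with RoPE in f32) run even when the KV cache itself is stored as Q8_0.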