git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
cuda: add q8_0->f32 cpy operation (llama/9571)
author Ivan <redacted>
Tue, 24 Sep 2024 00:14:24 +0000 (03:14 +0300)
committer Georgi Gerganov <redacted>
Tue, 24 Sep 2024 16:45:08 +0000 (19:45 +0300)
commit 2fc1d20f9ee2e66c199ec104e73e8c3dd3e57312
tree fc4745ed57f7cbac3a823ec28770b9eb1ee6e8ee
parent 08e8414f277a1a559d52dd5a474f777353ec61fc

llama: enable K-shift for quantized KV cache
K-shift will fail on unsupported backends or quant types.
ggml/src/ggml-cuda.cu
ggml/src/ggml-cuda/cpy.cu
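
For readers unfamiliar with what a q8_0->f32 copy involves, here is a minimal host-side C++ sketch. It is not the CUDA kernel from this commit. It assumes the usual q8_0 idea of 32 signed 8-bit values sharing one per-block scale; ggml stores that scale as fp16, while this sketch uses a plain float to stay self-contained. The names `BlockQ8`, `quantize_q8`, `cpy_q8_to_f32`, and `kBlockSize` are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative block size: 32 values share one scale, as in q8_0.
constexpr int kBlockSize = 32;

// Hypothetical stand-in for a q8_0 block. The real format keeps the scale
// in fp16; a float is used here to avoid half-precision helpers.
struct BlockQ8 {
    float  d;               // per-block scale
    int8_t qs[kBlockSize];  // quantized values in [-127, 127]
};

// Quantize n floats into blocks (n must be a multiple of kBlockSize).
// Included only so the copy below has something to read.
std::vector<BlockQ8> quantize_q8(const float * x, size_t n) {
    assert(n % kBlockSize == 0);
    std::vector<BlockQ8> out(n / kBlockSize);
    for (size_t b = 0; b < out.size(); ++b) {
        const float * src = x + b * kBlockSize;
        float amax = 0.0f;  // largest magnitude in this block
        for (int i = 0; i < kBlockSize; ++i) {
            amax = std::fmax(amax, std::fabs(src[i]));
        }
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        out[b].d = d;
        for (int i = 0; i < kBlockSize; ++i) {
            out[b].qs[i] = static_cast<int8_t>(std::lround(src[i] * id));
        }
    }
    return out;
}

// The q8 -> f32 copy: every output element is scale * quantized value.
void cpy_q8_to_f32(const BlockQ8 * src, float * dst, size_t n) {
    assert(n % kBlockSize == 0);
    for (size_t b = 0; b < n / kBlockSize; ++b) {
        for (int i = 0; i < kBlockSize; ++i) {
            dst[b * kBlockSize + i] = src[b].d * static_cast<float>(src[b].qs[i]);
        }
    }
}
```

A GPU version would typically assign one thread (or a small group of threads) per block instead of looping, but the per-element arithmetic is the same. The round trip is lossy: each recovered value can differ from the original by up to half the block scale.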