git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: fix quantized KV cache + multiple sequences (llama/14822)
author Johannes Gäßler <redacted>
Wed, 23 Jul 2025 10:35:53 +0000 (12:35 +0200)
committer Georgi Gerganov <redacted>
Thu, 24 Jul 2025 17:57:40 +0000 (20:57 +0300)
commit bf82b8786519124d4d370a9176b82c14b58e22b1
tree 0ac4ed5f9a12bcdb1f0fefe11178f34e3aefbe29
parent 56c9cd2bab7f3c0befee70ad48672d0003fa6e91

* CUDA: fix quantized KV cache + multiple sequences

* Update src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
src/ggml-cuda/convert.cu
src/ggml-cuda/fattn-common.cuh