git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: fix quantized KV cache + multiple sequences (#14822)
author Johannes Gäßler <redacted>
Wed, 23 Jul 2025 10:35:53 +0000 (12:35 +0200)
committer Georgi Gerganov <redacted>
Wed, 23 Jul 2025 11:08:09 +0000 (14:08 +0300)
commit 07a19e27a26f76d34be62da53807f93131fb3cab
tree d614f8f83dc23d463d46517b04594a11c25e2097
parent 18f3b5ff9e5eda4e7d04bceff8ffdccb0a696ed8

* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <redacted>
ggml/src/ggml-cuda/convert.cu
ggml/src/ggml-cuda/fattn-common.cuh