git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: fix quantized KV cache + multiple sequences (llama/14822)
author Johannes Gäßler <redacted>
Wed, 23 Jul 2025 10:35:53 +0000 (12:35 +0200)
committer Georgi Gerganov <redacted>
Mon, 28 Jul 2025 10:02:32 +0000 (13:02 +0300)
commit a65976fc3cb3359d02a374aaaa7fb6855f2a7dbf
tree 4f4fb8d7d1cfbdacefcacb8fa5c5ce89f0b15701
parent 026d8a0c6e2da9f5f9079f7e99dd1df086715eb7

* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <redacted>
ggml/src/ggml-cuda/convert.cu
ggml/src/ggml-cuda/fattn-common.cuh